vBulletin Mods

The Official vBulletin Modifications Site
https://www.vbulletin.org/forum/showthread.php?t=127868

mute 05 Nov 2006 17:29

Quote:

Originally Posted by mute
Our search is working flawlessly, and seeing ~4000 searches per day, which isn't too shabby at all.

Question for you guys, as I can't seem to find much in the way of documentation regarding the searches.

Does sphinx support "OR"? If you were to search for "test one two", it searches for all three with an implicit "AND". If you search for "blah not bleh", it will search for "blah -bleh". If you search for "test or task or mask", it will search for that literally (and likely ignore "or" as one of my stopwords)

Anyone? A couple of my more picky users are complaining, and I don't really have an answer for them, as I typically just do keyword searches.

So.. anyone smarter than I am have an answer to this? :)

orban 05 Nov 2006 17:33

Ask in the sphinx forum maybe....:O

mute 05 Nov 2006 17:37

I think that | is OR and & is AND, but I haven't tested it just yet. I hope they make it a tad more user friendly in the 0.9.7 release, my users aren't all that tech savvy :)

ALanJay 05 Nov 2006 18:01

As orban says, there has been discussion of these kinds of things on the Sphinx forum: http://www.sphinxsearch.com/forum/

mute 06 Nov 2006 04:28

Right, that's where I found it. I'll probably end up doing search and replace additions to my search page to replace natural language operators with their character representation.
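
A minimal sketch of the search-and-replace idea described above. It assumes the operator characters mentioned in this thread (| for OR, & for AND, and a leading - for NOT, as in "blah -bleh") are what the search mode in use accepts; nobody in the thread had tested that yet. The function name naturalToOperators is hypothetical and not part of orban's files.

```php
<?php
// Hypothetical helper: rewrite the words "or", "and" and "not" as operator
// characters before the query is handed to Sphinx. The operator syntax is an
// assumption taken from the posts above, not verified against a Sphinx release.
function naturalToOperators(string $query): string
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $out = array();
    $negateNext = false;
    foreach ($words as $word) {
        $lower = strtolower($word);
        if ($lower === 'or') {
            $out[] = '|';
        } elseif ($lower === 'and') {
            $out[] = '&';
        } elseif ($lower === 'not') {
            $negateNext = true; // prefix the following term with "-"
        } else {
            $out[] = ($negateNext ? '-' : '') . $word;
            $negateNext = false;
        }
    }
    return implode(' ', $out);
}
```

With this sketch, naturalToOperators('blah not bleh') gives "blah -bleh". If "or" is in your stopword list, the replacement has to happen before the query reaches Sphinx, which is why it is done on the search page.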

ALanJay 06 Nov 2006 06:56

A quick question: has anyone successfully configured their search to work with 2-letter words?

I have set my system for max length of 2 but I still only seem to be able to find 3 letter words.

amcd 06 Nov 2006 07:00

Quote:

Originally Posted by ALanJay
A quick question: has anyone successfully configured their search to work with 2-letter words?

I have set my system for max length of 2 but I still only seem to be able to find 3 letter words.

first tell me how to configure that then i will tell u my results :)

mute 06 Nov 2006 15:46

Quote:

Originally Posted by amcd
first tell me how to configure that then i will tell u my results :)

min_word_len = 2

Rebuild your indexes :)

ALanJay 06 Nov 2006 15:52

I discovered that when you change the word length (or the stop words file) you have to fully stop "searchd" and restart it for the changes to be taken into account.

Once searchd was restarted it behaved as expected.

Another query / thought. I have been so impressed by Sphinx that over the last couple of days we have implemented a search of our non-forum content using it. It works well. But I then thought it might be nice to create a simple search for the forums.

If anyone is interested, I have some very basic code to do this. It started with the code from test.php, and once a valid result is found it does a lookup in the forum thread database for the thread $docinfo[group2]:

$article_query = "SELECT title, threadid FROM ???_forum.thread WHERE threadid='$docinfo[group2]'";

From there you can select the article thread and create a simple output page.
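
A minimal sketch of building that lookup query with the thread id cast to an integer first, so a non-numeric value can never end up in the SQL string. The helper name buildThreadQuery and the $dbName parameter are hypothetical; $dbName stands in for the database name, which is elided ("???_forum") in the post above.

```php
<?php
// Hypothetical helper around the query shown above. $docinfo is the per-document
// array returned by the Sphinx API; 'group2' is the key used in the post.
function buildThreadQuery(array $docinfo, string $dbName): string
{
    // Cast to int so only a number can reach the SQL string.
    $threadId = intval($docinfo['group2']);
    return "SELECT title, threadid FROM {$dbName}.thread WHERE threadid = {$threadId}";
}
```

The same cast also sidesteps the string-versus-integer issue that comes up later in this thread.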

If anyone is interested in more details, shout and I might clean up the code so it can be looked at by all you pros :)

DaiTengu 08 Nov 2006 18:52

Quote:

Originally Posted by orban
It's weird though it doesn't even get set...should at least be an empty array. And why is it even a separate global and not $vbulletin->coventry?

Just remove the line if you aren't using coventry then.

Such a mess ;)

I'm also having this problem, and the forum I'm using it on _is_ using Coventry. The results are returned & everything, but I'm assuming they'll search posts & return results of posts of users that are in Coventry if I turn this off.

Other than that, I'm completely amazed at how fast search is running :)

orban 08 Nov 2006 19:09

Are you guys using vB 3.6?

DaiTengu 08 Nov 2006 20:01

Quote:

Originally Posted by orban
Are you guys using vB 3.6?

It's a very clean 3.6.1 install on a very busy forum. There's no hacks or anything else installed.

amcd 09 Nov 2006 02:55

i dropped 3 fulltext indexes

title and title_2 on post
title on thread

hope i did the correct thing

what are the postindex and posthash tables for? i also have 3 tables called postindex_temp31480 and similar. what are those?

Neil Lock 12 Nov 2006 13:14

Would love a little bit of help, if available?

I have tried getting this to work on one of our test forums, which currently has about 1.5 million posts and is running vBulletin 3.5.4. First of all, I'm not sure whether my conf is correct. When I run the indexer, is it supposed to say:

skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
those are the last 2 results - the distributed indexes: is that error supposed to be there?

secondly when i run a search --config path qry it appears to work and give back results however upon turning on the searchd i dont seem to get any results the echos on my search.php
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'test' found 0 times in 0 documents

and there are no queries on my searchd.log

any starting point suggestions- i guess the searchd isnt being queried however when i switch it off the echo alerts me that searchd is not running? is one of the issues that im using 3.5?



Thanks
Nelly

orban 12 Nov 2006 13:20

Quote:

Originally Posted by Neil Lock
skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
those are the last 2 results - the distributed indexes: is that error supposed to be there?

Yes

Quote:

Originally Posted by Neil Lock
secondly when i run a search --config path qry it appears to work and give back results however upon turning on the searchd i dont seem to get any results the echos on my search.php
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'test' found 0 times in 0 documents

and there are no queries on my searchd.log

any starting point suggestions- i guess the searchd isnt being queried however when i switch it off the echo alerts me that searchd is not running? is one of the issues that im using 3.5?

Are you starting the searchd with --config path too?

Neil Lock 12 Nov 2006 13:30

Wow, thanks for the really quick reply, yup am starting with the conf file

and the only lines in the log read:
[Sun Nov 12 14:29:38 2006] [24295] creating server socket on 0.0.0.0:3312
[Sun Nov 12 14:29:38 2006] [24295] accepting connections


Nelly

orban 12 Nov 2006 13:32

There's two log files.

searchd.log and query.log

What does query.log say?

Neil Lock 12 Nov 2006 13:35

ahh didnt notice that there was a query log

ok so it must be hitting it
[Sun Nov 12 14:34:13 2006] 0.009 sec [all/1/attr- 0 (0,500)] [vbpostindex] test qry

hmmmm will go back to the search.php - is it likely to be something to do with using 3.5.4 and not 3.6?

Nelly

mute 12 Nov 2006 13:36

Quote:

Originally Posted by Neil Lock
Would love a little bit of help, if available?

skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
those are the last 2 results - the distributed indexes: is that error supposed to be there?

This is the correct behavior.

Quote:

Originally Posted by Neil Lock
secondly when i run a search --config path qry it appears to work and give back results however upon turning on the searchd i dont seem to get any results the echos on my search.php
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'test' found 0 times in 0 documents

Check your settings in sphinx.php, make sure your searchd command line has the proper --config, and.. upgrade to 3.6, as I don't believe you'll have much luck with Orbans search files w/ 3.5, as they were designed for 3.6.

Jeez, there were like 4 replies while I was replying :)

orban 12 Nov 2006 13:36

I don't really know... :(

you can try to work with the files in the /api/ folder of the archive you downloaded, and try to get that one work.

mute: !! :D

Neil Lock 12 Nov 2006 13:58

Thanx guys, will take a look at the api stuff but on initial inspection get error messages such as
Query failed: searchd error: index 'vbthreadindex': incompatible schemas: non-virtual attributes count mismatch: 4 in schema '/var/data/vbthreadindex', 5 in schema '/var/data/vbpost'.

if this means anything that can be relayed then please mention otherwise gonna spend the afternoon reading and playing...

as for 3.6 we hope to upgrade soon, but i havent ascertained what they have done to this upgrade (ie how much more 'beef' will our front ends need for this version!)


cheers

mute 12 Nov 2006 14:00

If i were you i'd verify that your sphinx.conf matches the example.

We upgraded to 3.6.2 about 2 weeks ago and it went pretty well, as far as our front end web traffic goes, I don't really notice a difference in terms of load :)

orban 12 Nov 2006 14:01

Are you combining wrong indexes together?

Neil Lock 12 Nov 2006 14:03

Quote:

Originally Posted by mute
If i were you i'd verify that your sphinx.conf matches the example.

We upgraded to 3.6.2 about 2 weeks ago and it went pretty well, as far as our front end web traffic goes, I don't really notice a difference in terms of load :)

well thats good news - the upgrade from 3 to 3.5 really hit us hard! we had our db box well tuned for 3 and then they up and move it to the front ends...v. annoying

ALanJay 12 Nov 2006 15:35

Neil,

The other thing to check is to use the test.php script to check that you can search the files correctly. If that works then you know you need to tweak the search.php code. It is possible, as I have tweaked it all the way back to 3.0.x :) Though I doubt there will be much to change, as the major changes occurred in the change from 3.0 to 3.5.

Neil Lock 12 Nov 2006 15:42

Hey all, well i have been playing around with no real success with test.php which i can only assume means that my conf file is messed up somewhere, if anyone could take a few secs to see whether I have made any glaring errors


[Code block not available in this archived copy.]



my tables use the prefix vb_ and i went through and modified all the table names as per the instructions. anything glaringly obvious?

Neil

amcd 13 Nov 2006 07:25

i have a problem


[Code block not available in this archived copy.]

and the gibberish continues for another page or two



to reproduce this error, go to http://www.xboard.us/bbb/forumdisplay.php?f=7 and click on the 'search this forum' link. in the dropdown which opens, type 'barnacles' in the search term field and select 'show results as posts' and click go

happens with other forums and other search terms also, but not always. internet explorer and firefox show only a blank page. to see the error, use opera.

orban 13 Nov 2006 13:25

Can you try to use the test.php and/or the "search" command tool?

amcd 13 Nov 2006 13:41

search works 90% of the time, so its not a total failure

search command line tool gives the following output

[Code block not available in this archived copy.]


Re: Sphinx Search
by Neil Lock
15 Nov 2006 11:44

Finally got it to work,

Now I have another question, I want to run this from a slave database ie grab the query data but the sphinx requires the REPLACE INTO which obv cant run on a slave instance so my question is this - is it possible to hook this up to run on the master for the replaces and the slave for the other queries. I intend on going and playing but wondered before hand whether anyone had a solution?

Thanks

Neil

orban 15 Nov 2006 11:48

Why can't you have the counter table you run REPLACE INTO on the slave server?

Master: vB
Slave: Replicated vB + counter table + sphinx

Should work fine :O

amcd 15 Nov 2006 11:54

orban, any suggestions for my problem?

orban 15 Nov 2006 12:01

If you can reproduce the error with "search" I'd try the sphinx forums....or does it only happen when using sphinxapi.php?

Neil Lock 15 Nov 2006 12:31

The slave server as far as i know(and im pretty sure) has only read permissions hence the replace will have to be run on the master (which is then obv replicated across) so i cannot write to the slave db at all.

Neil

kmike 15 Nov 2006 14:46

Nothing can prevent you from creating a new table in the replicated db on the slave - granted your db user has the create/update privileges on that db. So your assertion that the slave is read-only isn't true.

(Actually nothing prevents you from messing with replicated tables on the slave, too - which obviously will break the replication integrity)

amcd 16 Nov 2006 09:53

Quote:

Originally Posted by orban (Post 1117852)
If you can reproduce the error with "search" I'd try the sphinx forums....or does it only happen when using sphinxapi.php?

as far as i can tell, it happens only when using the API


[Code block not available in this archived copy.]

it does say that it got 11 results, so probably the error is not in the searchd portion

looks like i may have solved the problem by using the array_walk and intval approach as discussed by orban and alanjay. strange thing is that my forums are running vb 3.6.1, not 3.0.x for which the discussion was originally intended.

mute 20 Nov 2006 23:06

Hm. We're running into that "search results are out of order" bug again, even if a user has the sort set to by date, rather than relevancy.

Has anyone else using this run into it?

ALanJay 21 Nov 2006 07:29

Quote:

Originally Posted by amcd (Post 1118485)
looks like i may have solved the problem by using the array_walk and intval approach as discussed by orban and alanjay. strange thing is that my forums are running vb 3.6.1, not 3.0.x for which the discussion was originally intended.

I viewed it as a bit of a belt and braces issue :)

The values are numerals, but sometimes they seem to be stored as strings. No idea why, but array_walk and intval seem to resolve the issue simply enough.
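
A minimal sketch of the array_walk and intval approach mentioned above, written for a current PHP version (closures and type declarations did not exist in the PHP releases of the time). The function name castIdsToInt is hypothetical; the actual change in includes/sphinx.php is not shown in this thread.

```php
<?php
// Hypothetical helper: make sure every id in the list is an integer before it
// is passed to the Sphinx API, whose asserts reject numeric strings.
function castIdsToInt(array $ids): array
{
    array_walk($ids, function (&$value) {
        $value = intval($value); // "12" becomes 12; real integers are unchanged
    });
    return $ids;
}
```

Calling castIdsToInt(array('12', 7)) returns array(12, 7), so a forum id that arrives as a string no longer trips the integer check.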

By the way, for anyone interested, I have now used Sphinx to create a search interface to both our news and forums databases outside of vBulletin. Next step is to implement the sort order (which just needs to be made pretty).

You can see the tool at http://www.digitalspy.co.uk/search/ds-search.php

DaiTengu 25 Nov 2006 03:19

Quote:

Originally Posted by kmike (Post 1117934)
Nothing can prevent you from creating a new table in the replicated db on the slave - granted your db user has the create/update privileges on that db. So your assertion that the slave is read-only isn't true.

(Actually nothing prevents you from messing with replicated tables on the slave, too - which obviously will break the replication integrity)

Sure there is, you can put a read-only option in my.cnf to prevent users from writing data to the database (except for the replication user & the root user)


Anyway, I am also running into the results out of order bug. I haven't changed anything, and it just seemed to start cropping up one day.

mute 25 Nov 2006 04:03

Yeah, my users are complaining about the out of order results, but I haven't had the time lately to delve into it. I swear at one point it was working, but now.. not so much.

ALanJay 25 Nov 2006 08:24

Quote:

Originally Posted by Neil Lock (Post 1117846)
Finally got it to work,

Now I have another question, I want to run this from a slave database ie grab the query data but the sphinx requires the REPLACE INTO which obv cant run on a slave instance so my question is this - is it possible to hook this up to run on the master for the replaces and the slave for the other queries. I intend on going and playing but wondered before hand whether anyone had a solution?

Thanks

Neil

Can I ask why?

searchd can be on any computer and the database it looks into can be on any other one (that it can see). You obviously have to configure the front end to look at searchd on the correct computer and change the localhost references to the IP address of the machine that has searchd running on it.

But the load from indexing isn't that great. The way orban has implemented it, with a main index and deltas, means that even on a large board with lots of posts (we get from 10,000 to 40,000 a day), running the rebuild of the full index once a day at a quiet period will not put a load on the database; in our case, with a file with nearly 12 million posts, it takes under 5 minutes. The creation of the delta file, which I run every 5 minutes, takes just a few seconds.

Sphinx's overhead when indexing seems very small (as far as I can tell) on the mySQL database so I don't see the need to complicate things.

In my setup:

HTML front ends (x8 - 10.10.10.11 to 10.10.10.18) sphinx.php points to 10.10.10.19

10.10.10.19 - searchd when index created looks at 10.10.10.1

10.10.10.1 - mySQL master database

As I understand it all Sphinx leaves in the database is a marker to say where the line between the main and delta database is.

Good luck Neil :)

orban 25 Nov 2006 10:24

Quote:

Originally Posted by mute (Post 1124603)
Yeah, my users are complaining about the out of order results, but I haven't had the time lately to delve into it. I swear at one point it was working, but now.. not so much.

That's really weird :(

I never had this problem.

amcd 25 Nov 2006 10:32

Quote:

Originally Posted by orban (Post 1124725)
That's really weird :(

I never had this problem.

i have the same problem, though no one has complained yet

orban 25 Nov 2006 10:36

Can you reproduce this with "search" on the same input?

Might be worth asking in the sphinx forums.

DaiTengu 25 Nov 2006 10:39

For curiosity's sake, can I get rid of any of my indexes on the post table now? The table crashes periodically, and with the fulltext index it takes almost an hour to repair.

Neil Lock 25 Nov 2006 11:14

hey,

cheers for the help, probably against our server peoples wishes i indexed from the master database (our master db is fairly heavily loaded - we were trying to avoid adding anything new which may "tip it over") - so i now have an indexed db, the searchd daemon running now all i need to do is play around and write some scripts to manipulate the data the searchd returns - my first job is to build a standalone search which can be used to test before it goes live on our forums. I am actually quite excited about this product and looking forward to using it. Cheers guys. Will keep you posted on progress.

Neil

DaiTengu 01 Dec 2006 21:55

has anyone managed to fix the out-of-order results on their forum, yet?

kmike 02 Dec 2006 07:32

I'm taking a wild guess here, but maybe the returned results are sorted by relevance, that's why they seem out of order? (which order btw? date posted?)

DaiTengu 02 Dec 2006 08:02

Yeah, they're supposed to be sorted by date. Apparently I'm not the only person having the problem.

ALanJay 02 Dec 2006 08:14

Quote:

Originally Posted by DaiTengu (Post 1129864)
Yeah, they're supposed to be sorted by date. Apparently I'm not the only person having the problem.

When I wrote a search against another (non-forum) database and added sort by date, I discovered that the place where the sort type is set matters.

ie

////////////
// do query
////////////
$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( array ( 100, 1 ) );
// Number of results to display //
$cl->SetLimits ( intval(0), intval($limit) );
// $cl->SetMatchMode ( $any ? SPH_MATCH_ANY : SPH_MATCH_ALL );
$cl->SetMatchMode ( $sp_srch );
$cl->SetSortMode ( $sp_sort );
$cl->SetGroups ( $groups );
$cl->SetGroups2 ( $groups2 );
$cl->SetGroups3 ( $groups3 );
$cl->SetGroups4 ( $groups4 );
$cl->SetGroups5 ( $groups5 );
$res = $cl->Query ( $q, $index );

This works for me, but putting the "SetSortMode" call below the SetGroups5 call didn't work; not sure why :) It might be worth checking where it appears.

orban 02 Dec 2006 09:15

You realize that there is a difference between sorting in 0.9.6 and 0.9.7-RC1? They are NOT compatible.

mute 02 Dec 2006 23:25

Quote:

Originally Posted by DaiTengu (Post 1129864)
Yeah, they're supposed to be sorted by date. Apparently I'm not the only person having the problem.

You aren't alone. We too seem to get out of order results when searching, and not sorting by relevance.

mute 05 Dec 2006 01:46

Is there any hope of Sphinx handling "Find all posts by user" searches?

adalren 05 Dec 2006 06:31

I found a bug. When searching in title only, it doesn't honor datecut.

To fix, change:

[Code block not available in this archived copy.]

to:

[Code block not available in this archived copy.]

Another thing I found is that vb caches the search results and it tries to find the closest or exact match. There's no problem with exact matches, but when it saves the results sorted in ascending date order the subsequent searches in reverse order shows only old threads. To fix this, just comment out the $highScore = 1 & highScore = 2 lines in search.php. This disables using the stale cache for non-exact matches.

Thanks for the great hack orban!

orban 15 Dec 2006 15:42

http://www.sphinxsearch.com/index.html

RC2 released, I'll upgrade tomorrow and see if there are any changes for us.

-----------

Recreated index, copied over new sphinxapi.php seems to work okay. There's a new "extended" search mode but don't think I'll use that (too complicated for users anyway).

mute 19 Dec 2006 18:56

Hm. So all was going well, but out of the blue our subforum searches stopped working. If i don't specify a subforum, the searches work. If I do, I get an assertion failure in sphinxapi.php @ line 290 (with 0.9.7-rc2).

I ran into the problem on -rc1, and decided to upgrade to see if it had been fixed, but it has not. I'm a tad stumped.

Edit: I seem to have fixed it by adding a "$value = intval($value);" before the assert() in sphinxapi.php. Guess this is related to the assertion failures earlier, so much for not having to cast variables :)

orban 19 Dec 2006 19:07

Tried to do an intval() on the forumid?

amcd 19 Dec 2006 19:21

@mute, good that you solved the problem, but i would not edit sphinxapi.php

since more people are facing the same problem, let me post an easy to follow solution. this is what i did to solve the problem on my forums, after reading the conversation between alanjay and orban

in includes/sphinx.php, around line 44, change

from:
[Code block not available in this archived copy.]

to:
[Code block not available in this archived copy.]

around line 69, change

from:
[Code block not available in this archived copy.]

to:
[Code block not available in this archived copy.]

and lastly, around line 157, right at the end of the file, change

from:
[Code block not available in this archived copy.]

to:
[Code block not available in this archived copy.]


mute 19 Dec 2006 19:41

Thanks amcd, that is indeed a better solution. I've made the changes and all appears to be working well :)

ubuntu-geek 02 Jan 2007 15:21

Just out of curiosity.. Are people using the standard vb search or full text in conjunction with their sphinx implementations?

orban 02 Jan 2007 15:35

full text so vB doesn't populate its search tables (at least me, I just noticed that this is actually not mentioned in the guide)

ubuntu-geek 02 Jan 2007 15:40

great thanks.. :)

Orban have you updated to sphinx rc2 yet?

amcd 03 Jan 2007 02:43

orban, i think you should clearly mention this in the tutorial. if vb is not switched to fulltext, there will be hardly any benefit from sphinx. also, the fulltext indices should be dropped, otherwise mysql will keep updating them and waste time.

mute 03 Jan 2007 20:28

So, is there any hope to having sphinx handle searching for all of a users posts? I forget what we determined earlier in the thread. Despite sphinx being fast, I'm still seeing slowdowns related to doing the "find all posts by user" searches :(

orban 03 Jan 2007 22:32

I don't think Shodan (from sphinx) has implemented keyword-less queries yet but he plans to afaik.

kmike 04 Jan 2007 07:58

You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345.
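
A minimal sketch of the fake-keyword idea above. Both helper names are hypothetical. It assumes the token is appended to the post text that Sphinx indexes (in practice this would more likely be done in the indexer's SQL query) and that the index's character settings keep the underscore and digits as part of one word; neither is confirmed in this thread.

```php
<?php
// Hypothetical helper: build the per-member token, e.g. "_userid_12345".
function userKeyword($userid): string
{
    return '_userid_' . intval($userid);
}

// Hypothetical helper: append the token to the text that gets indexed, so a
// search for the token matches every post written by that member.
function textWithUserKeyword(string $pagetext, $userid): string
{
    return $pagetext . ' ' . userKeyword($userid);
}
```

A "find all posts by user" request would then become an ordinary keyword search for userKeyword($userid).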

ubuntu-geek 16 Jan 2007 13:51

I've noticed a few people talking about the sorting of searches being off. I am having the same issue, has anyone found a fix for it yet?

mute 16 Jan 2007 14:32

Quote:

Originally Posted by ubuntu-geek (Post 1160538)
I've noticed a few people talking about the sorting of searches being off. I am having the same issue, has anyone found a fix for it yet?

Not I. I can't say i've looked into it, but a magical fix would be awesome.

DigitalCrowd 18 Jan 2007 19:50

I think I figured out the problem, but not sure I know how to fix it at this very moment.

The sphinx.conf file that is being distributed in this thread builds the date_column as "dateline" for post index, and "lastpost" for thread index. I noticed the output from Sphinx is in proper order by dateline, but since the two indexes are not being given the same date source, then your threads will be out of order on search results when, I assume, that it groups the posts into the threads and the output displays the last post date of the thread and not the "post date" of the post that your search matched.

While you could fetch the lastpost date of a thread that is associated with the post and this way you use one date column unique across the indexes, my assumption is that unless you rebuild the full index (not deltas) constantly, that your searches will still be messed up.

We already know that when we say we want 1000 results and 7000 documents match, that we may only have 853 results, once all posts are grouped under specific threads.
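
A minimal sketch of why the count shrinks, assuming each match returned for a post carries the id of its thread under a 'threadid' key (the key name and the function name are hypothetical). Several matching posts in the same thread collapse into a single thread result.

```php
<?php
// $matches: one entry per matching post. Returns the distinct thread ids,
// in the order in which each thread was first seen.
function groupPostsIntoThreads(array $matches): array
{
    $threads = array();
    foreach ($matches as $match) {
        $threads[$match['threadid']] = true; // repeated thread ids collapse into one key
    }
    return array_keys($threads);
}
```

Three matching posts spread over two threads produce two thread results, which is how 1000 requested post matches can end up as fewer than 1000 threads.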

I will have to think about this one.

Wait, hold on, things are coming too quickly...

Example:

You do a search on "Trees", and it finds 10 posts with the word "Trees" in it. For the sake of this discussion, "Trees" is only in one post in each thread. The results come back and order in DESC order, all the post's dateline.

Well, this is great, except, additional posts to those threads may have happened, and as such the search results are all out of order, because the search returns the dateline order, not the lastpost order of the associated thread.

This ALSO explains why when each of us first installed this and indexed our boards, that everything worked perfectly, because it was a brand new index. But, once you start building onto that index that is when things go astray.

I believe this is what is causing it all, but I might be missing something.

amcd 18 Jan 2007 20:13

DigitalCrowd, I think you have hit the nail right on the head. I rebuild my full index every day, and my results are just slightly out of order. And the threads which are out of order are the ones which have been updated today, after the last rebuild.

So, how do we fix this problem? Once the search results have been received from sphinx, we then re-sort them by lastpost if the user has requested 'show results as threads'?
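
A minimal sketch of that re-sort step, assuming the current lastpost value of each matching thread has already been fetched from the database into a 'lastpost' key (the array shape and the function name are hypothetical). It only shows the sort itself; fetching the up-to-date lastpost values is an extra database query on every search.

```php
<?php
// Sort thread rows so the most recently updated thread comes first.
function sortThreadsByLastpost(array $threads): array
{
    usort($threads, function ($a, $b) {
        return $b['lastpost'] <=> $a['lastpost']; // descending by lastpost timestamp
    });
    return $threads;
}
```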

DigitalCrowd 18 Jan 2007 20:37

Well, then you get into having to fetch the current lastpost field of all matching threads and that would be, on larger boards, significant overhead. Now you move away from just searching Sphinx, to now search the database as well, then sorting your resulting arrays and pretty soon... you have a SLOW search again.

After further testing..

I rebuilt my index, search results in order. I did a reply to a post about the third the way down, not using the word "the" (my search word) in it. Now, when I do a search for "the", the third post down is out of order.

The ONLY way for a thread to get bumped up, is if the search word has been used again at a later time in that thread.

The Best way to do search, IMHO is to give more weight to recent threads, but get away from sort by date search results. Even Google doesn't offer this, but we are so accustomed to it in the forum world that getting people to break from it is hard. The best way would be to optimize how Sphinx (or the code that makes the call to Sphinx) weighs results.

I did notice on the Sphinx Forums that the sort mode "SPH_SORT_TIME_SEGMENTS" was made to address this and that if it doesn't work to our liking, it can be modified to perform better. I will have to look into it.

Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so and for some people, that index process might last that long or longer.

kontrabass 25 Jan 2007 21:35

Boy, if ever a hack was ripe for a commercial opportunity... ;) A no-fuss, easy-to-install sphinx search for VBulletin with full search functionality and smart results ordering - how many big boarders would pay big money for this? (I would... it'd be a lot cheaper than buying another new server... :cool: ). I certainly HOPE this free development continues, but I look forward to the time when bugs like this aren't an issue.

stinger2 29 Jan 2007 19:26

Quote:

Originally Posted by kontrabass (Post 1167265)
Boy, if ever a hack was ripe for a commercial opportunity... ;) A no-fuss, easy-to-install sphinx search for VBulletin with full search functionality and smart results ordering - how many big boarders would pay big money for this? (I would... it'd be a lot cheaper than buying another new server... :cool: ). I certainly HOPE this free development continues, but I look forward to the time when bugs like this aren't an issue.

bookmarking this

Nerudo 29 Jan 2007 20:32

It's amazing. I'm bookmarking too.

jason|xoxide 14 Feb 2007 18:34

I wouldn't classify this as a bug, just more of an oversight. The max_matches variable in the sphinx.conf file is ignored when using the PHP API script. It doesn't matter if it's left at the default 1000, the 1500 that orban's file has been modified to use, or 1000000 as the config file says not to do. If you want to change the number of results returned then you need to change line 15 of 'includes/sphinx.php'.

Old:

[Code block not available in this archived copy.]

New (replace '2500' with the number you want):

[Code block not available in this archived copy.]

Otherwise, great work! This has really sped up searching on the forums I have used for testing (6K posts, 820K posts, and 3.2M posts).

eoc_Jason 19 Feb 2007 19:22

Just installed on a forum with ~5.4 million posts... Before some searches would literally take forever (I had a query kill script that would kill the thread after a minute), now searches come back in less than a second! :)

Slow searches are a killer on a forum since it locks the post table. Also users seem to have a habit of clicking the search button multiple times if results aren't returned within a few seconds.

One thing, search results I've noticed don't always come back sorted by date properly. I skimmed over a few posts talking about this, I guess I need to go back and read it more in-depth. I'm sure the fix wouldn't be too hard.

Here's the data from the initial build, even with such a large index file it is still super fast.

Quote:

indexing index 'post_index'...
collected 5562411 docs, 1881.1 MB
sorted 189.9 Mhits, 100.0% done
total 5562411 docs, 1881073642 bytes
total 523.946 sec, 3590205.25 bytes/sec, 10616.38 docs/sec
indexing index 'post_index_delta'...
collected 75 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 75 docs, 18407 bytes
total 0.294 sec, 62659.52 bytes/sec, 255.31 docs/sec
indexing index 'thread_index'...
collected 357824 docs, 10.6 MB
sorted 1.2 Mhits, 100.0% done
total 357824 docs, 10635814 bytes
total 17.875 sec, 595012.62 bytes/sec, 20018.19 docs/sec
indexing index 'thread_index_delta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
skipping index 'fulltext_post_index' (distributed indexes can not be directly indexed)...
skipping index 'fulltext_thread_index' (distributed indexes can not be directly indexed)...

kontrabass 19 Feb 2007 21:58

Quote:

Originally Posted by eoc_Jason (Post 1186199)
Just installed on a forum with ~5.4 million posts... Before, some searches would literally take forever (I had a query kill script that would kill the thread after a minute); now searches come back in less than a second! :)

Slow searches are a killer on a forum since it locks the post table. Also users seem to have a habit of clicking the search button multiple times if results aren't returned within a few seconds.

One thing I've noticed: search results don't always come back sorted by date properly. I skimmed over a few posts talking about this; I guess I need to go back and read them more in depth. I'm sure the fix wouldn't be too hard.

Here's the data from the initial build, even with such a large index file it is still super fast.

Thanks for the report! Mind if I ask a couple questions (gathering info to see where I would stand): Are you running a slave DB server for searches? What kind of hardware is behind your database? Thanks :)

Mickie D 19 Feb 2007 22:39

This is fantastic, well done all involved :)

I'm having a small issue I need some help with:

1) If I run the search command from SSH, it gives me a lot of results, but if I do the same search in the forums it gives me 1 or 2 results, and I know there are a lot more.

2) I have a few 3-letter words that I like included in the vBulletin options. How do I enable them in Sphinx without enabling all 3-letter words?

Any ideas?

cheers

mute 20 Feb 2007 17:58

Quote:

Originally Posted by DigitalCrowd (Post 1162429)
Well, then you get into having to fetch the current lastpost field of all matching threads and that would be, on larger boards, significant overhead. Now you move away from just searching Sphinx, to now search the database as well, then sorting your resulting arrays and pretty soon... you have a SLOW search again.

After further testing..

I rebuilt my index, and the search results were in order. I then replied to a post about a third of the way down, not using the word "the" (my search word) in it. Now, when I do a search for "the", the third post down is out of order.

The ONLY way for a thread to get bumped up is if the search word has been used again at a later time in that thread.

The best way to do search, IMHO, is to give more weight to recent threads, but get away from sort-by-date search results. Even Google doesn't offer this, but we are so accustomed to it in the forum world that getting people to break from it is hard. The best way would be to optimize how Sphinx (or the code that makes the call to Sphinx) weighs results.

I did notice on the Sphinx Forums that the sort mode "SPH_SORT_TIME_SEGMENTS" was made to address this and that if it doesn't work to our liking, it can be modified to perform better. I will have to look into it.

Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so and for some people, that index process might last that long or longer.

Digital,

Have you managed to find a fix for this short of doing a full reindex? We're still just doing incremental updates, but I am still pretty annoyed about the "out of order" results.

kmike 20 Feb 2007 18:49

Quote:

Originally Posted by DigitalCrowd (Post 1162429)
Well, then you get into having to fetch the current lastpost field of all matching threads and that would be, on larger boards, significant overhead. Now you move away from just searching Sphinx, to now search the database as well, then sorting your resulting arrays and pretty soon... you have a SLOW search again.
....
Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so and for some people, that index process might last that long or longer.

Actually there is a function in vB which does exactly that (sorts the results by the specified field):
sort_search_items() in includes/functions_search.php
It's just one query, and it runs on a slave server if it's set up. Also its overhead depends only on the number of returned search results.
And you don't even need to run through it on every search, only when listing the results as threads, sorted by lastpost.
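
A rough sketch of the idea described above, not vBulletin's actual sort_search_items() code: take the ids the search engine returned and re-sort them with a single query against the current lastpost values. It uses an in-memory SQLite database as a stand-in for MySQL, and the table and column names (thread, threadid, lastpost) as well as the function name are assumptions for illustration.

```python
import sqlite3


def sort_ids_by_lastpost(conn, thread_ids):
    """Return thread_ids reordered by each thread's current lastpost, newest first."""
    if not thread_ids:
        return []
    # One placeholder per id, so the cost depends only on the number of search hits.
    placeholders = ",".join("?" for _ in thread_ids)
    rows = conn.execute(
        "SELECT threadid FROM thread "
        f"WHERE threadid IN ({placeholders}) "
        "ORDER BY lastpost DESC, threadid DESC",
        list(thread_ids),
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in for the forum database (in-memory SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread (threadid INTEGER PRIMARY KEY, lastpost INTEGER)")
conn.executemany(
    "INSERT INTO thread (threadid, lastpost) VALUES (?, ?)",
    [(1, 1000), (2, 3000), (3, 2000), (4, 500)],
)

# Order as the search index knew it when it was last built.
index_order = [2, 3, 1]

# Thread 1 gets a new reply after the index was built.
conn.execute("UPDATE thread SET lastpost = 4000 WHERE threadid = 1")

fresh_order = sort_ids_by_lastpost(conn, index_order)
print(fresh_order)
```

The point of the sketch is only that the stale order stored in the index can be corrected after the fact, and that the extra query touches just the returned ids rather than the whole table.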

Mickie D 21 Feb 2007 21:58

Anyone know why it's not getting the full number of results?

If I search from the command line it finds hundreds, and if I search from the forums it finds 1 or 2 things, though I know there are more.

raywjohnson 01 Mar 2007 21:38

Thank You, orban!

Quote:

Originally Posted by orban (Post 1086294)
if you're interested in an implementation
http://www.vbulletin.org/forum/showpost.php?p=1104866

Thank you for the great instructions. A few hours' work (I am very new to vBulletin), plenty of Google searching, and another successful installation of Sphinx! Forum "write-lock" issues have disappeared. Thanks again!

Forgot to ask, is there any kind of setting to block indexing before a certain date? One of my admins informed me that he cannot find any post prior to 2005.

Later, RayJ

raywjohnson 04 Mar 2007 02:58

Quote:

Originally Posted by jason|xoxide (Post 1182576)
I wouldn't classify this as a bug, just more of an oversight. The max_matches variable in the sphinx.conf file is ignored when using the PHP API script. It doesn't matter if it's left at the default 1000, the 1500 that orban's file has been modified to use, or 1000000 as the config file says not to do. If you want to change the number of results returned then you need to change line 15 of 'includes/sphinx.php'.

Old:

[code block not included in this copy of the thread]

New (replace '2500' with the number you want):

[code block not included in this copy of the thread]

Otherwise, great work! This has really sped up searching on the forums I have used for testing (6K posts, 820K posts, and 3.2M posts).

Little help!

Most likely I am just misunderstanding the results.
Here are the results from a command line search: (displaying matches: snipped)

# search test

Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

index 'mypostidx': query 'test': returned 1000 matches of 191296 total in 0.029 sec

words:
1. 'test': 191296 documents, 475585 hits
index 'mypostidxdelta': query 'test': returned 56 matches of 56 total in 0.000 sec

words:
1. 'test': 56 documents, 134 hits
index 'mythreadidx': query 'test': returned 1000 matches of 2847 total in 0.154 sec

words:
1. 'test': 2847 documents, 2879 hits
index 'mythreadidxdelta': query 'test': returned 0 matches of 0 total in 0.000 sec

But, searching in the forum: (show posts/search entire post)
Search: Key Word(s): test Showing results 1 to 40 of 392

(I created a huge test forum and it has many more posts with the word "test" than 392)

And using test.php
php test.php test
Query failed: searchd error: index 'mythreadidx': incompatible schemas: non-virtual attributes count mismatch: 4 in schema '/var/sphinx/mythreadidx', 5 in schema '/var/sphinx/mypostidx'.

On the forum, the search never returns any more than 400 results (i.e. Showing: 40 of 400). I cannot find a setting that cuts off the results at 400.

I have read through this thread (twice!) and made the suggested changes to sphinx.php [$cl->SetLimits()] and sphinx.conf (max_matches) with no change to the results.
(I also searched the "Common Forum" at sphinxsearch.com, no luck!)

Any insight into this would be most appreciated!

Later, RayJ

orban 04 Mar 2007 11:20

Have you restarted searchd?

UK Jimbo 04 Mar 2007 13:37

There were some problems with the assert function around post #280.

My solution to these was to turn assert warnings off using the single line of code:


[code block not included in this copy of the thread]

You can do this in the api file or in sphinx.php

I've just re-indexed with the post table at a min word length of 3. As you can see from the command line, the process was niced at 20 and there were 400 active users on the site. I'm hugely impressed by this implementation.


[code block not included in this copy of the thread]


UK Jimbo 04 Mar 2007 23:18

2 Attachment(s)
Another post from me (might get auto-merged)...

I wanted to easily see the query.log that searchd creates, I always think it's good to try to give something back to a project that you like too :)
  • Place the attached file sphinx_search_log.php in your admincp directory, and edit the reference to query.log if necessary
  • Place the attached file cpnav_sphinx_search_log.xml in your includes/xml directory

It's really that easy.

Now look in your AdminCP menu system under Statistics & Logs for Sphinx Search Log :)

raywjohnson 05 Mar 2007 21:45

Quote:

Originally Posted by orban (Post 1195392)
Have you restarted searchd?

Thanks for the help!

After I read your post I restarted searchd and performed the command line and forum searches again, but the results were the same. I re-ran the indexer with --all as well and tried again, no joy!

Do you know of any settings in vBulletin that would limit the search results?

Later, RayJ

eoc_Jason 08 Mar 2007 18:15

Quote:

Originally Posted by kontrabass (Post 1186299)
Thanks for the report! Mind if I ask a couple questions (gathering info to see where I would stand): Are you running a slave DB server for searches? What kind of hardware is behind your database? Thanks :)

No, just one single server. Dual Woodcrest (4 cores total) & 4GB RAM. A slave server was an initial consideration, but even with a slave server the MySQL fulltext searches would not really have run any faster. It would only have alleviated the locking on the master DB.

Depending on what people searched for, queries could have taken several minutes (and we all know nobody waits that long for a web page to load). Queries like that would cause the post table to be locked, and thus anyone trying to post would also have been sitting waiting until the search completed. Usually people got impatient too and would click the search button several times, only queuing up the searches even more.

Until I installed this sphinx search mod, the only course of action was to have a custom script that would kill any search queries that took over 60 seconds (to prevent the issues above).
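
For anyone wondering what such a kill script boils down to, here is a minimal sketch of the selection step only. It is not the actual script from this post: the RunningQuery record, the function name, and the "starts with SELECT and contains MATCH" test for spotting a fulltext search are all assumptions for illustration. A real script would read these rows from MySQL's process list and issue a KILL for each returned id.

```python
from dataclasses import dataclass


@dataclass
class RunningQuery:
    thread_id: int   # connection id, as a process list would report it
    seconds: int     # how long the statement has been running
    sql: str         # statement text


def searches_to_kill(process_list, max_seconds=60):
    """Return ids of fulltext search queries running longer than max_seconds."""
    victims = []
    for query in process_list:
        text = query.sql.lstrip().upper()
        # Crude heuristic: only SELECTs that use MATCH ... AGAINST count as searches.
        is_fulltext_search = text.startswith("SELECT") and "MATCH" in text
        if is_fulltext_search and query.seconds > max_seconds:
            victims.append(query.thread_id)
    return victims


# Made-up snapshot: one slow search, one fast search, one post waiting on the lock.
snapshot = [
    RunningQuery(11, 95, "SELECT postid FROM post WHERE MATCH(title, pagetext) AGAINST ('test')"),
    RunningQuery(12, 3, "SELECT postid FROM post WHERE MATCH(title, pagetext) AGAINST ('sphinx')"),
    RunningQuery(13, 120, "INSERT INTO post (pagetext) VALUES ('waiting on the lock')"),
]
print(searches_to_kill(snapshot))
```

Only the slow search is selected; the blocked INSERT is left alone, since killing it would throw away a user's post rather than free the table.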

Yes everything on the server was extremely optimized, and I even had to set the mysql fulltext min characters to 5, and max to like 12-15 I think it was.

Basically the only two options out there are the Sphinx & mnogosearch mods for vB. I chose this one because it was the most transparent. Now searches are usually done in way under 1 second, and even though the results could sometimes be out of (date) order, it still works a million times better than before.

I plan on installing this on my own forum too, as searches are starting to cause issues.

Searching is one of the last weak points of vBulletin and really needs to be addressed.

Quote:

Originally Posted by raywjohnson (Post 1196587)
Thanks for the help!

After I read your post I restarted searchd and performed the command line and forum searches again, but the results were the same. I re-ran the indexer with --all as well and tried again, no joy!

Do you know of any settings in vBulletin that would limit the search results?

Did you increase your max results in both the sphinx and the other php api file? Also I think vBulletin might have some limits in the control panel. I don't know which all are used for searching as I haven't had time to really look through all the underlying code with sphinx. I just got it up and running and have been letting it do its own thing.

vB also does a lot of weighting and will toss out low results (irritating as it can produce no results even when there are). I do not know if this is still used with sphinx though, if it is then that probably explains your issue.

raywjohnson 09 Mar 2007 02:31

Quote:

Originally Posted by eoc_Jason (Post 1198934)

Did you increase your max results in both the sphinx and the other php api file? Also I think vBulletin might have some limits in the control panel. I don't know which all are used for searching as I haven't had time to really look through all the underlying code with sphinx. I just got it up and running and have been letting it do its own thing.

vB also does a lot of weighting and will toss out low results (irritating as it can produce no results even when there are). I do not know if this is still used with sphinx though, if it is then that probably explains your issue.

I did make the changes to both files, still no change. But... your post prompted me to look deeper into a possible limit imposed by vBulletin.

Eureka! :D vBulletin Control Panel -> vBulletin Options -> Message Searching Options -> Maximum Search Results to Return (was set to 400 now set to 9000) Worked perfectly! :D

Thank you! And thanks to all who worked on helping get Sphinx Search working on vBulletin! Extra thanks to orban!

Later, RayJ

eoc_Jason 09 Mar 2007 17:19

Glad to hear you found it, I guess I should go back and check what I have it set to also.

Ah, post #301 was what I was referring to before. Anyhow, glad you figured out what it was. I guess all three of those settings need to be the same for the most optimal results.
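
To make the "three settings" point concrete, here is a simplified model, assuming each setting acts as an independent ceiling on the number of results. The thread does not confirm how searchd itself reacts when the API asks for more than sphinx.conf allows, so treat this as an illustration only. The function name is made up, and the numbers are just the ones mentioned in this thread (1000 default, 2500 in sphinx.php, 400 and then 9000 in the vBulletin option), not anyone's verified configuration.

```python
def binding_limit(limits):
    """Return (name, value) of the smallest ceiling in a {name: value} dict."""
    name = min(limits, key=limits.get)
    return name, limits[name]


# Illustrative values taken from numbers mentioned in the thread.
before = {
    "sphinx.conf max_matches": 1000,
    "SetLimits() in sphinx.php": 2500,
    "vBulletin Maximum Search Results": 400,
}
# Same settings after raising only the vBulletin option.
after = {**before, "vBulletin Maximum Search Results": 9000}

print(binding_limit(before))
print(binding_limit(after))
```

Under this model, raising one setting simply moves the cap to the next-lowest one, which is why it is worth checking all three together.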

This is a true must-have for any large forum: the MySQL fulltext index search gets painfully slow after you exceed a certain number of posts, and the other vB search feature never worked all that well for me.

This would really be the next big thing I would like to see vB integrate into new versions. They already support things like other datastore caches, why not other search engines?

Anyhow, I'm about to tackle another install, this time on my forum. Should go smoother than the first time now that I know all the ins & outs. The biggest thing is just making sure you rename everything properly in the config files.

Mb81 11 Mar 2007 15:08

Some Help needed

I have some large forums and I'm trying to use Sphinx now.

Problem 1.)

using config file '/usr/local/sphinx/etc/sphinx.conf'...
WARNING: index 'vbpost': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbpostindex': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbthreadindex': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbthreadindexdelta': failed to preload schema and docinfos - NOT SERVING
WARNING: index 'vbfulltext': no such local index 'vbpost' - NOT SERVING
WARNING: index 'vbfulltext': no such local index 'vbpostindex' - NOT SERVING
WARNING: index 'vbfulltext': no valid local/remote indexes in distributed index - NOT SERVING
WARNING: index 'vbfulltextthread': no such local index 'vbthreadindex' - NOT SERVING
WARNING: index 'vbfulltextthread': no such local index 'vbthreadindexdelta' - NOT SERVING
WARNING: index 'vbfulltextthread': no valid local/remote indexes in distributed index - NOT SERVING

The indexes were built, but I still get this. What should I do?

Problem 2.)
Can it be used for multiple forums on the same server?

eoc_Jason 12 Mar 2007 18:27

You probably have a configuration error, I would double-check the conf file and compare to the one in the post. One little mistake and the whole thing breaks (took me forever to find the one line I missed).

I don't see why you couldn't use it for multiple forums, just create more things in the config file with different names connecting to the different databases.

Mb81 12 Mar 2007 19:09

Here it is. I don't see any mistake.
It would be really nice if someone could confirm it. Thanks a lot.


[code block not included in this copy of the thread]


eoc_Jason 12 Mar 2007 19:21

Do you have sphinx as a separate DB? I just made it a table within my forum to keep everything consolidated (since the tables don't hold much data anyhow).

I would double check the sphinx table & field names. Like on mine they are 'sph_counter', not 'sphinx_counter', which is an inconsistency within the documentation and supplied example stuff. Other than that, the code looks okay to me.

Mb81 12 Mar 2007 19:31

Quote:

Originally Posted by eoc_Jason (Post 1201826)
Do you have sphinx as a separate DB? I just made it a table within my forum to keep everything consolidated (since the tables don't hold much data anyhow).

I would double check the sphinx table & field names. Like on mine they are 'sph_counter', not 'sphinx_counter', which is an inconsistency within the documentation and supplied example stuff. Other than that, the code looks okay to me.

I checked it; it all seems fine. I use sphinx.sphinx_counter; that shouldn't make a difference.

mute 13 Mar 2007 05:00

Did anyone else upgrade to php 5.2.1 and have their sphinx install break? I haven't had time to look into it yet, but mine fails to return results and I'm getting a:

Query '' retrieved -2114543231 of 1 matches in -2147483.222 sec.

Heh.

Hm. It is definitely something PHP 5.2.1 related. I went back to 5.2.0 and it is working just fine. I guess I'll have to look at the changelog in the morning to see if I can figure out what is wrong.

orban 13 Mar 2007 10:04

Yeah I had that, and recreated all my indices and restarted searchd and then it worked ;/ Really weird tho.

mute 13 Mar 2007 16:07

Quote:

Originally Posted by orban (Post 1202345)
Yeah I had that, and recreated all my indices and restarted searchd and then it worked ;/ Really weird tho.

I'm going to try it again, but I had already done that before I downgraded. I just got Andrew to send me a CVS snapshot so I'm going to try that as well, as I've found a segfault in the commandline client that is rather irritating as well.

Quote:

Originally Posted by mute (Post 1202572)
I'm going to try it again, but I had already done that before I downgraded. I just got Andrew to send me a CVS snapshot so I'm going to try that as well, as I've found a segfault in the commandline client that is rather irritating as well.

So I upgraded to the new sphinx CVS snapshot, stopped searchd, nuked all my indexes, rebuilt them all and tried to search with php 5.2.1, and it's still broken. php 5.2.0 works just fine, so there is something going on. I've read the changelogs and nothing really stood out so I'm stumped.

I made sure I upgraded my sphinxapi.php file when I upgraded too, and that didn't do it, so it is either that or something in your sphinx.php that is breaking, but I haven't been able to figure out what just yet.

Is anyone else running 5.2.1?

orban 13 Mar 2007 16:50

I'm running 5.2.1 :/

It didn't work but after recreating all indices and restarting searchd it suddenly did. I didn't have to change any other files.

mute 13 Mar 2007 17:23

Quote:

Originally Posted by orban (Post 1202605)
I'm running 5.2.1 :/

It didn't work but after recreating all indices and restarting searchd it suddenly did. I didn't have to change any other files.

Hmm. Boo. I've done that twice already, I wonder what else it could be? I have multiple webservers set up, and with the exact same settings, the 5.2.0 webservers work and the 5.2.1 webserver does not.

Update: I'm working with the Sphinx author on a fix. It's a 64-bit/PHP 5.2.1 + sphinxapi bug.


All times are GMT.
