vBulletin Mods

The Official vBulletin Modifications Site
https://www.vbulletin.org/forum/showthread.php?t=127868

Sphinx Search
by orban
29 Sep 2006 21:05

2 Attachment(s)
Sphinx Implementation for vBulletin:

Version 0.1 Hooray!

Just sharing as usual, let the discussions begin (in b4 TECK "MINE IS BETTER")

Only tested with Sphinx-0.9.8-rc2 (r1234; Mar 29, 2008).

If you are upgrading from my old tutorial, back up your search.php (you know, just in case you need the old hacked-up version again) and restore the original from the zip/tar. No more file modifications!

http://sphinxsearch.com/downloads.html

Tested on 3.6.10. It should work on 3.7 if you modify /*insert query*/ on line 522 (I removed the 'prefixchoice' field because it doesn't exist in 3.6).

No support for tags/thread prefixes yet, because I don't have access to a 3.7 installation at the moment.

Similar Threads is also being worked on.

Alpha release for some feedback, hopefully it will be production ready soon :p

I assume you already have Sphinx up and running; see the attached sphinx.conf.example for a minimal setup.

Installation notes inside search_sphinx.php

Well, yeah, enjoy. And PM me if you need help.

The old post is here: http://www.vbulletin.org/forum/showp...&postcount=387

The Good:
  • Search this forum
  • Search this thread
  • Find all posts by User
  • Find all threads started by User
  • "Search Entire Posts"/"Search Titles Only" and "Show Results as Threads"/"Show Results as Posts" in all four combinations supported
  • "Search Entire Posts" can be sorted by rank/post.dateline (sorting by postuserid or forumid sorts by the integer value)
  • "Search Titles Only" can be sorted by rank, last reply date, first post date, number of replies (views if you add that value to sphinx.conf)
  • Really fast

The Bad:
  • You can't sort posts by title, number of replies/views, thread start date or last reply date (Sphinx doesn't have this data).*
  • You could possibly add this to sphinx.conf but it will only be as good as your last full post index update
  • "Find Threads with At Least/Most X Replies" doesn't work when "Search Entire Posts"
  • Search results are delayed (depending on how often you run indexer)
  • "New Posts" not supported... too much logic in the query?!

The Ugly:
  • Sorting is kinda messed up (especially when "Search Entire Posts" and "Show Results as Threads" are combined)
  • search_sphinx.php is messy, with a lot of code duplicated from search.php

*The Infamous Post Sorting Quirk

The question is this: when you "Search Entire Posts" and "Show Results as Threads", do you want your threads sorted by:
  • First post dateline (vBulletin option)
  • Last post dateline (vBulletin default)
  • The matching post dateline (Sphinx)

Our Sphinx setup does not store the first post and last post datelines in its post index (and they would be pretty much useless there too), so the first two options are not available. vBulletin offers a function called "sort_search_items()" (search.php, line 633 in 3.7) which could, in theory, be used to sort the threads by last post dateline.

It does not fix the problem, though. Let's assume we set maxresults to 5 and search for threads containing "funny". We have 7 threads created today (each line lists the thread, then the post in it that matches "funny"):

1. Thread "Cows", Created 08:00, Last Post 17:00 | "Funny Cows", Created 09:00
2. Thread "Cats", Created 09:00, Last Post 14:00 | "Funny Cats", Created 14:00
3. Thread "Dogs", Created 10:00, Last Post 12:00 | "Funny Dogs", Created 11:00
4. Thread "Mice", Created 11:00, Last Post 15:00 | "Funny Mice", Created 13:00
5. Thread "Rats", Created 12:00, Last Post 13:00 | "Funny Rats", Created 12:00
6. Thread "Eels", Created 13:00, Last Post 19:00 | "Funny Eels", Created 18:00
7. Thread "Fish", Created 14:00, Last Post 18:00 | "Funny Fish", Created 17:00

Do we want to show threads 6, 7, 2, 4, 5 (Sphinx)? Or do we want to show threads 6, 7, 1, 4, 2 (vB)?

vBulletin finds all 7 posts, orders their threads by last post date descending, and grabs the top 5.
Sphinx finds the newest 5 matching posts and then returns the associated threads.

Reordering search results with "sort_search_items()" does not fix the problem because there might be older threads with very recent replies that Sphinx won't even consider. Let's consider an 8th thread:

8. Thread "Bees", Created 2002, Last Post 20:00 | "Funny Bees", Created 2002

vBulletin will list this one on top, Sphinx will not consider it. So even re-sorting the search items will not make this thread appear.
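A small simulation makes the two orderings concrete. This is a hypothetical Python sketch, not code from vBulletin or Sphinx: the Thread record, times written as plain hour numbers, and the function names are illustrative assumptions. It ranks the seven example threads both ways, then adds thread 8 to show that re-sorting the Sphinx window (roughly what sort_search_items() could do) still cannot bring it back.

```python
from typing import NamedTuple


class Thread(NamedTuple):
    number: int         # position in the example list
    title: str
    last_post: float    # hour of the thread's last post
    match_time: float   # hour the post containing "funny" was made


# The seven example threads; times are hours of the day.
THREADS = [
    Thread(1, "Cows", 17, 9),
    Thread(2, "Cats", 14, 14),
    Thread(3, "Dogs", 12, 11),
    Thread(4, "Mice", 15, 13),
    Thread(5, "Rats", 13, 12),
    Thread(6, "Eels", 19, 18),
    Thread(7, "Fish", 18, 17),
]


def vb_order(threads, maxresults):
    """vBulletin-style: take every match, sort threads by last post, cut."""
    ranked = sorted(threads, key=lambda t: t.last_post, reverse=True)
    return [t.number for t in ranked[:maxresults]]


def sphinx_order(threads, maxresults):
    """Sphinx-style: keep only the newest matching posts, then map to threads."""
    ranked = sorted(threads, key=lambda t: t.match_time, reverse=True)
    return [t.number for t in ranked[:maxresults]]


def resorted_sphinx(threads, maxresults):
    """Take the Sphinx window, then re-sort just that window by last post."""
    window = sorted(threads, key=lambda t: t.match_time, reverse=True)[:maxresults]
    window.sort(key=lambda t: t.last_post, reverse=True)
    return [t.number for t in window]


if __name__ == "__main__":
    print("Sphinx: ", sphinx_order(THREADS, 5))   # [6, 7, 2, 4, 5]
    print("vB:     ", vb_order(THREADS, 5))       # [6, 7, 1, 4, 2]
    # Thread 8: its matching post is from 2002, far older than today's posts.
    bees = Thread(8, "Bees", 20, -1_000_000)
    print("vB:     ", vb_order(THREADS + [bees], 5))         # [8, 6, 7, 1, 4]
    print("re-sort:", resorted_sphinx(THREADS + [bees], 5))  # [6, 7, 4, 2, 5]
```

The re-sorted list changes order but never contains thread 8, because the cut to 5 posts happens before any last-post information is consulted.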

Adrian Schneider 29 Sep 2006 21:30

Nice find! I'll play around with it once I get some time.

orban 29 Sep 2006 21:37

Obviously the only options you will have on the advanced search page are:

Key Words:
Search In: Thread Titles/Posts
Sort Results by: Relevancy, Date Asc, Date Desc
Search in Forums:

And I guess searching by username will still be the built in way. (As in, without a search term, just list his posts.)

Gonna try to hack that up, when I make it work I'll release it I hope :)

But the fact you can index 4k posts/second is absolutely insane, and that was with 800 users online... :D

Paul M 29 Sep 2006 21:39

Hmm, yes, that looks interesting, bookmarked for later. :)

orban 29 Sep 2006 21:50

It also means I can remove that 400 MB fulltext index from the post table, making MySQL even faster.

The right tool for the job. :)

Filtering by forumid already works, so does sorting by date.

And it still says 0.000003 seconds. Incredible.

forumdude 29 Sep 2006 22:20

Hmm, good timing. I came on here today to see if there were any other resources out there for vBulletin search, and this showed up in the results.

We've had soooo much trouble keeping our search up. We're using the fulltext search right now with the search on its own server on tables reduced in size. Huge pain and it still doesn't return some results.

Keep us updated please, this looks cool.

forumdude 29 Sep 2006 23:36

Awesome!

If I get some time tonight (probably not!) I will download Sphinx and give it a look.

What kind of data do you have to test this with?

We're looking at about 9 million records on our live post table (millions more archived). I'm very curious how well this would hold up to that amount of data.

mute 30 Sep 2006 00:26

Can I get a peek at your sphinx.conf?

mute 30 Sep 2006 00:33

wow, you are fast! thanks. I'm tossing it 24 million posts to see what it does :)

mute 30 Sep 2006 01:28

*waits for post index to build*

So far so good. It ripped through 1,652,726 thread titles in about 2 minutes, on a machine replicating a very active forum, and one running a test upgrade from 3.5.5 to 3.6.1 :)

So far, I'm happy! I think with a little work this could be amazing. The API is a little unfriendly when it comes to errors and whatnot, but with some polishing, and once we figure out targeting searches and searching by name, we're good to go.

Orban you are a hero among men!

Just FYI:

thread table:
collected 1658976 docs, 48.1 MB
sorted 5.1 Mhits, 100.0% done
total 1658976 docs, 48070959 bytes
total 148.426 sec, 323872.56 bytes/sec, 11177.16 docs/sec

post table:
collected 8860446 docs, 1416.9 MB
sorted 140.2 Mhits, 100.0% done
total 8860446 docs, 1416892676 bytes
total 3168.862 sec, 447129.84 bytes/sec, 2796.10 docs/sec

That is with a minimum word length of 4 and no stopwords.

mute 30 Sep 2006 14:03

Quote:

Originally Posted by orban
Wow, that's crazy. 1.4gb for 8.8million posts....?!

Actually, 1.4gb for 24 million posts. For some reason it gets 1:1 "documents" when indexing threads, but only 1:3 for posts. Not sure if that is a bug and it isn't indexing everything, or if it has something to do with our content?

I'm headed out fishing, but I'm going to play with your updated changes later :)

orban 30 Sep 2006 14:42

Weird....

mute 30 Sep 2006 14:55

Yeah, and I recreated it a few times (with stopwords, diff min word length, etc). Not exactly sure why yet.

orban 30 Sep 2006 20:02

Maybe some posts are too short? Like, no words of at least 4 characters?

But then again, that would never be two thirds of the posts. I really have no idea :(
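One way to test this theory on a sample of real posts is to count how many would contribute no words at all under the same settings (minimum word length 4, optional stopword list). This is a rough Python sketch that assumes a plain \w+ tokenizer; Sphinx's own tokenizer and charset rules differ, so treat the result as an estimate, not as what the indexer actually does.

```python
import re

WORD = re.compile(r"\w+")


def has_indexable_word(text, min_word_len=4, stopwords=frozenset()):
    """True if at least one word survives the length and stopword filters."""
    return any(
        len(word) >= min_word_len and word.lower() not in stopwords
        for word in WORD.findall(text)
    )


def count_empty_posts(posts, min_word_len=4, stopwords=frozenset()):
    """Count posts that would contribute no words to the index."""
    return sum(
        not has_indexable_word(post, min_word_len, stopwords) for post in posts
    )
```

If only a small fraction of a sample comes back empty, short posts alone cannot explain a 1:3 ratio.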

kmike 02 Oct 2006 09:54

Sphinx 0.9.7 will feature an arbitrary number of group id's, so it would be possible to handle "search this thread" and search by user in Sphinx.
Meanwhile, it's easy to hack Sphinx to support 3 groupid columns instead of one with some copy-pasting. Naturally, the index is larger with additional group ids: 5GB for a 6-million-post database. We've been running it for some months already with great success.

orban 02 Oct 2006 10:50

Mind sharing the patch and maybe your implementation in vB? Or at least outlining it?

Would be nice...!

mute 02 Oct 2006 17:32

orban, what kind of changes do I need to make to my search.php to have it search both the main and delta index?

Prior to setting up the delta index on my end, I noticed that I could search for words in post bodies and not return results, but if I look in my query.log I would see many many results.

orban 02 Oct 2006 17:38

mmmmm

You don't have to modify search.php; create a fake index that contains the two other indices.
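Conceptually, searching the fake index means asking both indexes and merging what comes back. The sketch below models that merge in Python; it is not Sphinx code, the (doc_id, weight) hit format is a simplification, and the rule that the delta copy wins when a document appears in both is an assumption chosen because the delta holds the newer data.

```python
def merge_main_delta(main_hits, delta_hits, limit):
    """Combine hits from a main and a delta index into one ranked list.

    Each hit is a (doc_id, weight) pair. Hypothetical rule: if a document
    appears in both indexes, the delta copy is kept because it is newer.
    """
    combined = {}
    for doc_id, weight in main_hits:
        combined[doc_id] = weight
    for doc_id, weight in delta_hits:   # later (delta) index overrides main
        combined[doc_id] = weight
    ranked = sorted(combined.items(), key=lambda hit: hit[1], reverse=True)
    return ranked[:limit]
```

For example, a main result of [(1, 10), (2, 5)] and a delta result of [(2, 7), (3, 1)] merge into [(1, 10), (2, 7), (3, 1)].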

mute 02 Oct 2006 17:40

hm ok, i think my config just might be a bit goofy. On my dev board, I created a new post after creating all 4 indexes. Anyway, my test post had a made up word in it, and after I posted I reran the delta updates, saw them pick up one doc, but I don't get any results returned if I use the "search" tool with sphinx.

I'm going to double check my config now.

orban 02 Oct 2006 18:07

weird....

make sure the indexes get created (check the data files)

ubuntu-geek 02 Oct 2006 18:59

Quote:

Originally Posted by orban
I know this is a bit ugly right now, but:

http://forums.mtgsalvation.com/search.php

Also the "Search This Forum" is using Sphinx now.

"Search This Thread" and all queries using userids have to be done the old way for the moment until the new Sphinx version is released.

But I'm happy users can search our 1.4 million posts in <1sec again. Without crashing the server, locking any tables or anything.

When new version is out I'll finish the implementation and release it :)

New version of sphinx or vb? ;) I really want to try this out.. searches are killing us.

orban 02 Oct 2006 19:03

sphinx

mute 02 Oct 2006 19:05

Hm. this is very strange. I have verified that my config is the same as yours (minus the names of the indexes), and have emptied my sphinx_counter table, nuked all my indexes, and rebuilt.

# /home/httpd/sphinx/bin/search -c /home/httpd/sphinx/etc/sphinx.conf purple
Sphinx 0.9.6
Copyright (c) 2001-2006, Andrew Aksyonoff

- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbpostidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec
- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbpostdeltaidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec
- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbthreadidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec
- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbthreaddeltaidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec

I broke something, but I don't know what :)

Ah, I found the problem I think.

For whatever reason, on my initial index, despite having used --rotate, it is leaving *new* index files in my var dir:

# ls -la *new*
-rw-r--r-- 1 root root 1356935444 Oct 2 13:39 vbpost.new.spd
-rw-r--r-- 1 root root 10644727 Oct 2 13:39 vbpost.new.spi
-rw-r--r-- 1 root root 54322284 Oct 2 13:42 vbthread.new.spd
-rw-r--r-- 1 root root 879893 Oct 2 13:42 vbthread.new.spi

Sphinx won't search against these, but I'm not sure why they didn't roll over.

orban 02 Oct 2006 19:30

Yeah I don't have .new. ones, just .old. ones.

Permissions?

mute 02 Oct 2006 20:07

It is, I believe, a bug in Sphinx. If you start searchd with no preexisting indexes, then index with --rotate, it won't rotate. The solution is to stop searchd, nuke everything, index, then start searchd.

It took me a while to figure it out, I'm not sure why it isn't smart enough to see that there aren't preexisting indexes when searchd tries to rotate.
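Leftover .new. files like the ones in the listing above are the symptom to look for. A small hypothetical Python helper can report which indexes still have them; it only inspects file names (based on the vbpost.new.spd style shown above) and does not talk to searchd.

```python
from pathlib import Path


def unrotated_indexes(data_dir):
    """List index names that still have '.new.' files in data_dir.

    After a successful rotation these files are renamed into place, so
    leftovers suggest searchd never picked up the freshly built index.
    """
    names = {
        path.name.split(".new.")[0]
        for path in Path(data_dir).glob("*.new.*")
        if path.is_file()
    }
    return sorted(names)
```

Point it at the directory holding the index files; an empty list means nothing is waiting to be rotated.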

orban 02 Oct 2006 20:11

Oh :(

Yeah I did the first index without searchd started.

Report it so it can be fixed :)

ubuntu-geek 02 Oct 2006 21:46

I was just curious: is the search_sphinx.php you posted a few posts back the most current one, or have you made more adjustments?

mute 02 Oct 2006 21:54

Just an FYI, make sure you limit access to that search on your dev boxes if you don't potentially want people searching for info in your private forums :)

I guess now we just get to wait patiently for 0.9.7 to come out...

orban 02 Oct 2006 21:59

What do you mean? Other users with SSH access?

Yeah or kmike can share his patch http://www.vbulletin.org/forum/showp...2&postcount=21

:(

ubuntu-geek 02 Oct 2006 22:14

Quote:

Originally Posted by mute
Just an FYI, make sure you limit access to that search on your dev boxes if you don't potentially want people searching for info in your private forums :)

I guess now we just get to wait patiently for 0.9.7 to come out...

Not sure what you mean. It seems the permission system works on private/non private when doing searches with sphinx.

orban 02 Oct 2006 22:17

search can be called by anyone with server access on the command line

so he gets access to all your indexes and thus to all your posts

so if you have a designer ssh access to upload stuff he can basically read your private forums

ubuntu-geek 02 Oct 2006 23:27

Quote:

Originally Posted by orban
search can be called by anyone with server access on the command line

so he gets access to all your indexes and thus to all your posts

so if you have a designer ssh access to upload stuff he can basically read your private forums

True.. Not an issue for us..

orban 02 Oct 2006 23:29

Neither here, I'm the only one with access.

mute 03 Oct 2006 01:35

Quote:

Originally Posted by mute
Just an FYI, make sure you limit access to that search on your dev boxes if you don't potentially want people searching for info in your private forums :)

I guess now we just get to wait patiently for 0.9.7 to come out...

Oh, I thought at this point the search wasn't excluding forums users don't have permissions to view :)

orban 03 Oct 2006 01:37

They aren't, but all posts/threads are filtered again on the results page.

kmike 03 Oct 2006 13:58

1 Attachment(s)
Attached is the patch for Sphinx 0.9.5 which adds two more group columns.
You'll have to have something like this in your sphinx.conf:

[sphinx.conf snippet not visible in this copy of the thread]

The part with IF(post.userid=0) is needed because Sphinx doesn't like zero column values (you'll have them if a board has some posts by the guests or deleted users), so we replace them with an arbitrary high number (99999999) which is guaranteed not to happen in the real data.

sphinxapi.php supports two more grouping functions: SetGroups2(array) and SetGroups3(array).
So search.php will have to call $sphinx->SetGroups2($userids) when searching by user(s), where $userids is an array containing their userid's.
And similarly, $sphinx->SetGroups3(array($searchthreadid)) will be called when searching in a thread.
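To make the three group columns concrete, the sketch below models the check a post has to pass: group 1 is the forumid, group 2 the userid (with 0 mapped to 99999999 as in the IF() trick above), and group 3 the threadid. It is a hypothetical Python model, not the patched sphinxapi.php; in particular, treating an empty list as "no filter" is an assumption (the unused _groups2/_groups3 lists in the SphinxClient dump later in the thread are empty arrays).

```python
GUEST_USERID = 99999999  # stand-in for userid 0, which Sphinx can't store


def group_value_for_userid(userid):
    """Mirror IF(post.userid=0, 99999999, post.userid) from the index query."""
    return GUEST_USERID if userid == 0 else userid


def passes_groups(post, forumids=(), userids=(), threadids=()):
    """Return True if a post matches every non-empty group filter.

    post is a dict with 'forumid', 'userid' and 'threadid'. An empty filter
    list means 'do not filter on this group' (assumed semantics).
    """
    checks = (
        (tuple(forumids), post["forumid"]),
        (tuple(group_value_for_userid(u) for u in userids),
         group_value_for_userid(post["userid"])),
        (tuple(threadids), post["threadid"]),
    )
    return all(not allowed or value in allowed for allowed, value in checks)
```

Searching by user maps onto userids, and "search this thread" onto threadids, in the same way kmike describes for SetGroups2 and SetGroups3.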

orban 03 Oct 2006 14:20

Thank you. Gonna try this out :)

ubuntu-geek 03 Oct 2006 15:10

Quote:

Originally Posted by orban
Thank you. Gonna try this out :)

Curious to see how this works out.. :)

TECK 03 Oct 2006 16:29

Thanks Orban (and others) for this solution.
0.9.6 is out, it fixes the following issues:
- added support for empty indexes (solves the previous issues we had with indexes)
- added support for multiple sql_query_pre/post/post_index
- fixed timestamp ranges filter in "match any" mode
- fixed configure issues with --without-mysql and --with-pgsql options
- fixed building on Solaris 9

orban 03 Oct 2006 16:32

Yes, but the patch for more than one group won't work for this...

I'm trying to get a snapshot of 0.9.7....

kmike 03 Oct 2006 17:00

Unfortunately, 0.9.7-dev is still too buggy to be used in production.

mute 03 Oct 2006 22:01

Quote:

Originally Posted by kmike
Unfortunately, 0.9.7-dev is still too buggy to be used in production.

What kind of bugs are you running into?

kmike 04 Oct 2006 07:39

Quote:

Originally Posted by orban
The groupid has to be <4096...?! I'm sure you have more than 4096 users...

Where did you get that number? We have much more than 4096 members and everything is working fine.
*edit* Ah, found it. You're mistaken: 4096 is the limit on the number of groupids listed in one request. A groupid is an unsigned 32-bit integer AFAIK, so the limit of about 4 billion should be enough for everybody (famous last words).
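kmike's two limits (at most 4096 group ids per request, each id an unsigned 32-bit integer) can be checked on the client before a request goes out. The helper below is hypothetical and not part of sphinxapi.php; its error text imitates the searchd message "invalid group5 count ... (should be in 0..4096 range)" that comes up later in the thread. In that later case the cause was apparently an outdated sphinxapi.php rather than a genuinely oversized list, so a huge count can also point to a client/server mismatch.

```python
MAX_GROUPS_PER_REQUEST = 4096   # limit on ids listed in one request
MAX_GROUP_ID = 2**32 - 1        # groupid is an unsigned 32-bit integer


def validate_groups(groups, name="group"):
    """Return the groups as a list, or raise ValueError if searchd would reject them."""
    groups = list(groups)
    if len(groups) > MAX_GROUPS_PER_REQUEST:
        raise ValueError(
            f"invalid {name} count {len(groups)} "
            f"(should be in 0..{MAX_GROUPS_PER_REQUEST} range)"
        )
    for gid in groups:
        if not 0 <= gid <= MAX_GROUP_ID:
            raise ValueError(f"{name} id {gid} outside 0..{MAX_GROUP_ID}")
    return groups
```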

Quote:

Originally Posted by mute
What kind of bugs are you running into?

Frequent crashes when searching.

TECK 05 Oct 2006 06:02

Go ahead and post it. :)
Thanks Orban.

mute 05 Oct 2006 06:08

Indeed, conf, patch and search would be fantastic :)

ubuntu-geek 05 Oct 2006 15:09

Cool, I'll give this a go this morning and see what happens..

Edit:
http://dragy.de/public/sphinx.api.diff the file is giving a 404 back :(

mute 05 Oct 2006 16:10

I'm getting a 404 on http://dragy.de/public/sphinx.api.diff, and am having some issues getting the src patch to apply, has anyone else managed to get it to apply?

orban, is there a reason you've removed the "Sort results by", "Find threads with", and "Find posts from" options from your search_forums template? They are still "doable" with multiple groups in sphinx, right?

Ideally I'm looking to replicate the existing vb search, minus the "find as posts and threads" option because I just think that is confusing.

mute 05 Oct 2006 23:42

I'm dumb, I didn't realize you didn't "make clean" prior to creating your diff, and didn't notice it was breaking on the lack of a Makefile as I was building off of a pristine src dir.

mute 05 Oct 2006 23:56

Yeah the patch is fine, if you've run configure before. I hadn't as I was using a fresh tarball so it won't apply cleanly. If you were to "make distclean" prior to generating your diff, it would apply cleanly for someone who had just untar'd the 0.9.6 source :)

I am rebuilding my indexes now, this is exciting! I think with date ranges this would probably be good enough to go live with!

orban 05 Oct 2006 23:59

Fixed the diffs now, yeah the configure was the problem. Sorry about this.

date ranges: I added them...(changed template search_forums, search.php and includes/sphinx.php, it's all edited in my howto post already)...I didn't realise this was built in because it's not used in api/test.php or "search". (It is though in sphinxapi.php).
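Turning vB's "Find Posts from" choice into a timestamp filter could look roughly like this. It is a hypothetical Python sketch, not orban's includes/sphinx.php: it assumes $datecut holds a Unix-timestamp cutoff (now minus the selected number of days), that beforeafter is "after" or "before", and that open ends use the 0..4294967295 defaults seen in the _min_ts/_max_ts fields of the SphinxClient dump later in the thread.

```python
import time

MIN_TS = 0
MAX_TS = 2**32 - 1  # 4294967295, the default upper bound in the client dump


def cutoff_from_days(days, now=None):
    """Unix timestamp for 'now minus N days', clamped at 0."""
    now = int(time.time()) if now is None else int(now)
    return max(MIN_TS, now - int(days) * 86400)


def timestamp_range(cutoff, beforeafter="after"):
    """Return (min_ts, max_ts) for a search limited by a date cutoff.

    cutoff: Unix timestamp, or None for "Any Date" (no restriction).
    beforeafter: "after" keeps newer posts, "before" keeps older ones.
    """
    if cutoff is None:
        return MIN_TS, MAX_TS
    if beforeafter == "before":
        return MIN_TS, cutoff
    return cutoff, MAX_TS
```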

Now I got a few users wanting the "Show as threads" "Show as posts" back, what did vB think when they added that >.<

I mean, what does the search show when you are searching thread titles and select "display as posts"? The first post in each thread?

And when searching posts and choosing "display as threads"? All the threads the found posts are in?

The latter is impossible to run on large forums because, let's say, you get 150.000 posts back; then you'd have to sort 150.000 threadids... I think those were the queries in my slow log with hundreds of thousands of threadids in them... that were killing the server... smart vB.

mute 06 Oct 2006 01:54

Quote:

Originally Posted by orban
Try to download the patches again....I'm really sorry about this but I never created patches before :(

diff -Naur sphinx-0.9.6/src sphinx-0.9.6-multigroup/src > /home/xxxxxxx/www/public/sphinx.src.diff
diff -Naur sphinx-0.9.6/api sphinx-0.9.6-multigroup/api > /home/xxxxxxx/www/public/sphinx.api.diff

This is what I used.

No need to be sorry! I got it to apply before you fixed it! :D I'm playing with it now. I appreciate you sharing your progress with the rest of us, it saves us a lot of headaches :)

gorman 06 Oct 2006 12:37

Could anybody create this as a standard plugin?

And... are others seeing the same extraordinary benefits?

orban 06 Oct 2006 12:43

It is not possible to make this a plugin unless they add a ton of hooks to search.php.

Not to speak of general *n*x knowledge you need to install this anyway.

Owwwww

I forgot a step

Copy the sphinxapi.php to..hmm..some folder. :)

kmike 06 Oct 2006 14:37

gorman: there is simply no comparison at all between MySQL embedded fulltext search and Sphinx-based search, both in terms of speed and relevance.

BTW, that's what I meant when I was replying to you at vb.com forums, about custom search solution.

ubuntu-geek 06 Oct 2006 14:38

Quote:

5. Uncomment "unset($datecut);" (-> "#unset($datecut);") so includes/sphinx.php can use it (for date range search).
Where exactly is this?

ubuntu-geek 06 Oct 2006 14:52

Quote:

Originally Posted by orban
Doh, I mean comment it. So it doesn't get unset. >.<


line 12xx

In the section

// ############################################################################
// check if we are searching for posts from a specific time period

Before

// #############################################################################
// check to see if there are conditions attached to number of thread replies

perfect.. Was racking my brain on that one.. So far the implementation has been smooth. Only thing I am not keen on is the screen after a search that says please wait ;)

gorman 06 Oct 2006 14:53

Quote:

Originally Posted by kmike
gorman: there is simply no comparison at all between MySQL embedded fulltext search and Sphinx-based search, both in terms of speed and relevance.

BTW, that's what I meant when I was replying to you at vb.com forums, about custom search solution.

Cool. Thanks. And... at least you replied. I'm kind of annoyed that a recognized problem of this magnitude is being left "on its own" by the development team.

ubuntu-geek 06 Oct 2006 14:55

Quote:

Originally Posted by orban
That's always been there :O

$vbulletin->url = 'search.php?' . $vbulletin->session->vars['sessionurl'] . "searchid=$searchid";
eval(print_standard_redirect('search'));

Modify these lines and add a straight header("Location:") maybe....

:0 I guess I am tired.. lol

gorman 06 Oct 2006 14:55

Quote:

Originally Posted by orban
It is not possible to make this a plugin unless they add a ton of hooks to search.php.

I'm mainly worried about upgrades... at the rate the vB team is churning them out, it could become a serious hassle to hand-modify templates each time.

ubuntu-geek 06 Oct 2006 14:59

Quote:

Originally Posted by gorman
I'm mainly worried about upgrades... at the rate the vB team is churning them out, it could become a serious hassle to hand-modify templates each time.

For me the speed increase is worth the few template edits..

ubuntu-geek 06 Oct 2006 15:14

Yeah, I have yet to find a better forum solution. And at this point, with Threads: 269,003, Posts: 1,588,154, Members: 175,576, I am not going to try and move it. We average about 4,000 users online at once during peaks, and a lot of that is search traffic.

So cheers orban for finding and sharing this search solution. I was about to implement a google search wrap in the forum template. (ghetto style)

orban 06 Oct 2006 15:17

I wish google offered a service to crawl your website in a closed environment for $ and then a search form. Like that google search appliance (?) but as an online service.

Oh well :)

ubuntu-geek 06 Oct 2006 15:24

Shrug yep.. Gee I knew people would complain about three letter searches... :)

orban 06 Oct 2006 15:26

Haha ;)

ubuntu-geek 06 Oct 2006 15:55

Any thoughts on adding the option to display results as threads/posts?

orban 06 Oct 2006 16:05

Well let's have a look:

Search posts and display as threads:

Let's say somebody searches for "book" and returns 150.000 posts. Those 150.000 posts are in 40.000 threads. If you find any way to fetch all 150.000 threadids, sort them and make a unique list of them, then let me know, but I really have no idea how to do that. I also think that this is a major problem of the vB search...(there are queries with several tens of thousands threadids in them).

Search threads and display as posts

I assume that "posts" mean "first posts in a thread"? You can probably add "firstpostid" as a new group for the thread index and then grab those...

Curse vB for adding those options :(

ubuntu-geek 06 Oct 2006 16:10

Quote:

Originally Posted by orban
Well let's have a look:

Search posts and display as threads:

Let's say somebody searches for "book" and returns 150.000 posts. Those 150.000 posts are in 40.000 threads. If you find any way to fetch all 150.000 threadids, sort them and make a unique list of them, then let me know, but I really have no idea how to do that. I also think that this is a major problem of the vB search...(there are queries with several tens of thousands threadids in them).

Search threads and display as posts

I assume that "posts" mean "first posts in a thread"? You can probably add "firstpostid" as a new group for the thread index and then grab those...

Curse vB for adding those options :(

Gotcha now I understand.. The users will just have to adjust.. Easy as that..

orban 06 Oct 2006 16:28

I don't understand anyway what exactly the problem is....

If you are searching in thread titles, then the search returns a list of threads.

If you are searching in posts, then the search returns a list of posts.

ubuntu-geek 06 Oct 2006 16:30

Quote:

Originally Posted by orban
I don't understand anyway what exactly the problem is....

If you are searching in thread titles, then the search returns a list of threads.

If you are searching in posts, then the search returns a list of posts.

Its all good :)

orban 06 Oct 2006 16:56

Quote:

Originally Posted by ubuntu-geek
Its all good :)

No, I meant why vB implemented this behaviour, not why you're asking for it. My English, ahah :confused:

ubuntu-geek 06 Oct 2006 17:10

Hmm got an interesting issue going on. When doing a search from forumdisplay i get this..

Warning: assert(): Assertion failed in /includes/sphinxapi.php on line 249
Query failed: searchd error: invalid or truncated request.

ubuntu-geek 06 Oct 2006 17:26

Looking at the html source of the forumdisplay it looks like its getting set..

<input type="hidden" name="forumchoice[]" value="73" />

ubuntu-geek 06 Oct 2006 17:32

Yeah.. exactly what I have.

orban 06 Oct 2006 17:36

Add a "if ($vbulletin->userinfo['userid'] == 1) echo $forumchoice;" somewhere....to check if the value gets set...dunno... :(

ubuntu-geek 06 Oct 2006 17:55

Yeah its getting set.. hrm... What version of php do you use?

137
Warning: assert(): Assertion failed in /includes/sphinxapi.php on line 249
Query failed: searchd error: invalid or truncated request.

ubuntu-geek 06 Oct 2006 19:36

Ok going to try this out now..

Ok that seemed to clean up the assertion issue. Last issue it seems is..

Query failed: searchd error: invalid group5 count 272485 (should be in 0..4096 range).

Hmm ok this seems to be related to how the groupid is counted in the searchd.cpp hrm..

mute 06 Oct 2006 20:30

Quote:

Originally Posted by ubuntu-geek
Hmm got an interesting issue going on. When doing a search from forumdisplay i get this..

Warning: assert(): Assertion failed in /includes/sphinxapi.php on line 249
Query failed: searchd error: invalid or truncated request.

I'm getting this too. My searches don't appear to be hitting searchd, I'm trying to debug it now as well.

Do only searches that HIT searchd get logged in query.log? My searches from the command line are working fine, but I can't seem to get them to hit searchd anymore via my test site.

Ok, i added that last bit of code but it doesn't seem to be fixed for me. Here's the output of a search targeted to a specific forum:

SphinxClient Object ( [_host] => db2 [_port] => 3312 [_offset] => 0 [_limit] => 250 [_mode] => 0 [_weights] => Array ( [0] => 100 [1] => 1 ) [_groups] => Array ( [0] => 394 ) [_groups2] => Array ( ) [_groups3] => Array ( ) [_groups4] => Array ( ) [_groups5] => Array ( ) [_sort] => 1 [_min_id] => 0 [_max_id] => 4294967295 [_min_ts] => 0 [_max_ts] => 4294967295 [_min_gid] => 0 [_max_gid] => 4294967295 [_error] => searchd error: invalid or truncated request [_warning] => ) Query failed: searchd error: invalid or truncated request.

ubuntu-geek 06 Oct 2006 20:41

Quote:

Originally Posted by mute
I'm getting this too. My searches don't appear to be hitting searchd, I'm trying to debug it now as well.

Do only searches that HIT searchd get logged in query.log? My searches from the command line are working fine, but I can't seem to get them to hit searchd anymore via my test site.

Ok, i added that last bit of code but it doesn't seem to be fixed for me. Here's the output of a search targeted to a specific forum:

SphinxClient Object ( [_host] => db2 [_port] => 3312 [_offset] => 0 [_limit] => 250 [_mode] => 0 [_weights] => Array ( [0] => 100 [1] => 1 ) [_groups] => Array ( [0] => 394 ) [_groups2] => Array ( ) [_groups3] => Array ( ) [_groups4] => Array ( ) [_groups5] => Array ( ) [_sort] => 1 [_min_id] => 0 [_max_id] => 4294967295 [_min_ts] => 0 [_max_ts] => 4294967295 [_min_gid] => 0 [_max_gid] => 4294967295 [_error] => searchd error: invalid or truncated request [_warning] => ) Query failed: searchd error: invalid or truncated request.

Same issue here..

mute 06 Oct 2006 20:42

Ok here's what is and isn't working for me:

1) Searching all open forums for keywords - Works
2) Searching in a specific forum by keyword - Does not work
3) Searching all open forums by username - Works
4) Searching in a specific forum by username - Works

ubuntu-geek 06 Oct 2006 20:46

Quote:

Originally Posted by mute
Ok here's what is and isn't working for me:

1) Searching all open forums for keywords - Works
2) Searching in a specific forum by username - Works
3) Searching all open forums by username - Works
4) Searching in a specific forum by keyword - Does not work

Exactly my issue.. what version of php do you run?

mute 06 Oct 2006 20:57

I'm using 5.1.5 at the moment. If I printout $forumchoice in sphinx.php, it is getting set, as well as making its way into the Sphinx request array.. I'm a bit puzzled.

ubuntu-geek 06 Oct 2006 21:32

Hmm nothing yet from me.. You making any progress?

mute 06 Oct 2006 21:41

nada

ubuntu-geek 07 Oct 2006 01:09

The one that puzzles me is..

Query failed: searchd error: invalid group5 count 271308 (should be in 0..4096 range).

ubuntu-geek 07 Oct 2006 01:29

No errors on the command line..

orban 07 Oct 2006 01:31

Try to use the sphinxapi.php from my tar.gz.....?

Download it again

I think I found the error...I fixed something in my sphinxapi.php and didn't copy it back to /api/

I'm so sorry :(

ubuntu-geek 07 Oct 2006 01:36

sphinxapi from the gz worked perfect... Orban you rock! Can i send you a donation for this effort? :)

ubuntu-geek 07 Oct 2006 01:38

<- has had a few beers already.. Yeah i'll send them a donation for sure! Will you be updating this when 0.9.7 is released?

orban 07 Oct 2006 01:40

Sure thing.

I'll also try to make "show as posts" "show as threads" happen, but right now I don't see how it's possible. But you never know what I come up with ;)

ubuntu-geek 07 Oct 2006 01:51

Right on.. :)

mute 07 Oct 2006 02:58

Yay! That fixed it for me too!

So, I'm thinking if you do plan on cleaning things up and releasing it at some point as a hack, that it would be best to gather up the "settings" into one file or at the top of the sphinx include. For example I have a multi server setup, so I specify the searchd server's ip rather than localhost, and I've renamed my indexes. To the average joe they might not notice or know how to make those changes to get things working. I'm going to do some more testing later on but things are looking very good :)

orban 07 Oct 2006 11:47

Yeah...to be honest I intend to do that when 0.9.7 comes out where more than one group is supported natively and things should be a lot cleaner (and prolly faster too).

I also hope I can figure out "show as posts" and "show as threads" by then (though I believe the best answer would be subscriptions [a member told me he was searching for his own posts with "show as threads" to track threads he posted in: that's what SUBSCRIPTIONS are for]).

kmike 07 Oct 2006 13:46

Quote:

Originally Posted by orban
Search posts and display as threads:

Let's say somebody searches for "book" and gets 150,000 posts. Those 150,000 posts are in 40,000 threads. If you find any way to fetch all 150,000 threadids, sort them and make a unique list of them, let me know, but I really have no idea how to do that. I also think this is a major problem of the vB search... (there are queries with several tens of thousands of threadids in them).

I assume you're storing threadid as a group attribute to support searching within threads, so you'll get it back in the search results for every post found.
Just collect all threadids in an array, throw out the duplicates using array_unique, and voila, you have your results as threads.
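kmike's suggestion amounts to an order-preserving de-duplication. A Python sketch, assuming the matches arrive as (postid, threadid) pairs already sorted by relevance (the pair shape is hypothetical). Like PHP's array_unique, it keeps the first occurrence of each value, so each thread is ranked by its best-matching post:

```python
def threads_from_post_matches(matches):
    """Collapse ranked post matches into a ranked list of unique thread ids.

    matches: list of (postid, threadid) pairs, best match first (assumed shape).
    The first, best-ranked occurrence of each thread wins.
    """
    seen = set()
    threads = []
    for _postid, threadid in matches:
        if threadid not in seen:
            seen.add(threadid)
            threads.append(threadid)
    return threads


print(threads_from_post_matches([(10, 3), (11, 7), (12, 3), (13, 5)]))  # [3, 7, 5]
```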
Quote:

Originally Posted by orban
Search threads and display as posts

There's no such option, looks like you mean "search titles only". But posts have titles too, you know?

orban 07 Oct 2006 14:07

Quote:

Originally Posted by kmike
I assume you're storing threadid as a group attribute to support searching within threads, so you'll get it back in the search results for every post found.
Just collect all threadids in an array, throw out the duplicates using array_unique, and voila, you have your results as threads.

Yeah, but when there are 120,000 posts found... you'd have to increase the limit in sphinx.conf to 200,000 or so, loop through ALL of them, throw out the duplicates, and then sort by lastpost...!?
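The loop orban describes can be sketched as one pass to collect the unique thread ids plus a sort of only those unique threads. Sphinx doesn't store the last reply date, so this Python sketch assumes a hypothetical dict of threadid to lastpost timestamp already looked up from the database:

```python
def sort_threads_by_last_post(post_thread_ids, thread_lastpost):
    """Return unique thread ids, newest last reply first.

    post_thread_ids: threadid of every matching post (duplicates expected).
    thread_lastpost: hypothetical dict of threadid -> lastpost timestamp,
    assumed to have been fetched from the thread table.
    """
    unique_threads = set(post_thread_ids)  # one pass, duplicates dropped
    # Sort only the unique threads; ties are broken by threadid for stable output.
    return sorted(unique_threads, key=lambda t: (-thread_lastpost[t], t))


print(sort_threads_by_last_post([3, 7, 3, 5], {3: 100, 5: 300, 7: 200}))  # [5, 7, 3]
```

The cost is one pass over the matches plus a sort of the distinct threads, so it grows with the match limit rather than with the size of the forum.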

Quote:

Originally Posted by kmike
There's no such option, looks like you mean "search titles only". But posts have titles too, you know?

Yes, there is... you can select "Search Titles Only" and then "Show as Posts".

I think it returns the first post of every thread found.

ubuntu-geek 07 Oct 2006 14:31

Orban, just curious: how often do you re-index the big index?

ubuntu-geek 07 Oct 2006 15:30

Right on, I'll give it a go. I do have a weird one, though. If I search for just a username and leave everything else at the defaults, it only pulls older threads, nothing new. Hrm...

orban 07 Oct 2006 15:32

If you don't enter any search terms, the default vB search should be used...
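The dispatch rule orban describes could look like the following hypothetical sketch (the function name and return values are made up for illustration). It shows why a username-only search never reaches Sphinx:

```python
def choose_backend(keywords: str) -> str:
    """Hypothetical dispatcher: only keyword searches go to Sphinx.

    A search with no keywords (for example, posts by a username only)
    falls through to the stock vBulletin search.
    """
    return "sphinx" if keywords.strip() else "vbulletin"


print(choose_backend("book"))  # sphinx
print(choose_backend("   "))   # vbulletin
```

Under that rule, the results ubuntu-geek saw for a username-only search would come from the default vB search, not from the Sphinx index.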

A post title no one will read
by kmike
07 Oct 2006 21:17

Quote:

Originally Posted by orban
Yeah, but when there are 120,000 posts found... you'd have to increase the limit in sphinx.conf to 200,000 or so, loop through ALL of them, throw out the duplicates, and then sort by lastpost...!?

What is your search results limit? Is it really that high (120,000)? I highly doubt it, because your current search implementation would choke on that number too: the part of the script responsible for displaying search results already goes through all the returned results.

So I guess you have a more reasonable limit on the number of returned search results (around 1,000?), at which point going through all of them suddenly doesn't look so bad.
Quote:

Originally Posted by orban
Yes there is...you can select "Search Titles Only" and then "Show as Posts"....

"Search titles only" combined with "show as posts" should search within the titles of the posts. They happen to be the same as the thread titles in the case of the first post in a thread (well, at least in most cases).
Now, the original vB search implementation (the non-fulltext one) follows this logic. But the vB fulltext implementation throws this concept away and searches within thread titles, displaying only the first posts of the threads found. I'll let you judge whether this is correct or not.

Personally, I, too, think it's too confusing, but it's the legacy of the decision to allow each post to have its own title. Most members don't bother to type anything in the post title field when replying, and even if they do, it's completely inconspicuous in the default vB layout (and in most vB layouts I've seen, for that matter).
But it's there, and it's there for good, so we should bear with it.

*edit*: cool, 100 posts! I'll let it sit there for some time ;-)

orban 07 Oct 2006 21:21

Quote:

Originally Posted by kmike
What is your search results limit? Is it really that high (120,000)? I highly doubt it, because your current search implementation would choke on that number too: the part of the script responsible for displaying search results already goes through all the returned results.

Well...

Let's assume you have

thread1: 100 times "word"
thread2: 50 times "word"
thread3: 10 times "word"
threads 4-50: 5 times "word" each

A search for "word" will return us 2500 posts. BUT there are only 50 different threads.

If your limit is 1000 (like mine), this will only return about 30 threads, so you're missing 20... I'm actually seeing this with very common words (when searching posts with "show as threads").
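A small Python simulation of the truncation, under a worst-case assumption that the ranked matches arrive grouped thread by thread in the order listed (real relevance ranking interleaves threads). The thread sizes from the example sum to 395 posts, so a hypothetical limit of 200 is used to make the effect visible:

```python
def distinct_threads_within_limit(posts_per_thread, limit):
    """Count how many distinct threads appear in the first `limit` matches.

    posts_per_thread lists the matching-post count of each thread, in the
    (assumed) order the matches come back from the search daemon.
    """
    taken = 0
    threads = 0
    for count in posts_per_thread:
        if taken >= limit:
            break          # the match limit is exhausted; later threads are lost
        threads += 1       # at least one post of this thread made it in
        taken += count
    return threads


# Thread sizes from the example: 100, 50, 10, then 47 threads with 5 matches each.
example = [100, 50, 10] + [5] * 47
print(sum(example))                                 # 395
print(distinct_threads_within_limit(example, 200))  # 11
```

Only 11 of the 50 matching threads survive the cut. Any approach that de-duplicates by threadid after the match limit has been applied has this blind spot.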

----------

1. Search Titles Only - Show as Threads = full text index on thread titles
2. Search Titles Only - Show as Posts = full text index on post titles
3. Search Entire Posts - Show as Threads = full text index on posts but grab threadids and display them, basically grouped by thread
4. Search Entire Posts - Show as Posts = full text index on posts

1, 3 and 4 are working already; 2 is not (yet). I'll need to fix that. (At the moment it searches thread titles only and displays the first post.)
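The four combinations can be read as a lookup table from the two form options to what the full-text search runs on and whether post matches get collapsed into threads. A hypothetical Python sketch of that mapping (names are illustrative, not the hack's code), with combination 2 pointing at post titles as the list describes rather than at the current thread-title behavior:

```python
from enum import Enum


class Target(Enum):
    THREAD_TITLES = "thread titles"
    POST_TITLES = "post titles"
    POST_BODIES = "posts"


# (titles_only, show_as_threads) -> (what the full-text search runs on,
#                                    collapse post matches into threads?)
SEARCH_PLAN = {
    (True, True): (Target.THREAD_TITLES, False),   # 1. results are threads already
    (True, False): (Target.POST_TITLES, False),    # 2. titles of individual posts
    (False, True): (Target.POST_BODIES, True),     # 3. group post matches by threadid
    (False, False): (Target.POST_BODIES, False),   # 4. plain post search
}


def plan_search(titles_only: bool, show_as_threads: bool):
    """Look up what to search and whether to group, for one form submission."""
    return SEARCH_PLAN[(titles_only, show_as_threads)]
```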

Also it's not weighting post titles/bodies yet (I think).
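Weighting means a hit in one field counts more than a hit in another. The sketch below is a deliberately simplified illustration of that idea (a weighted sum of per-field hit counts), not Sphinx's actual ranking formula; the field names and weights are assumptions:

```python
def weighted_score(field_hits, field_weights):
    """Sum of hits per field, each multiplied by that field's weight.

    Fields without an explicit weight count with weight 1.
    """
    return sum(field_weights.get(field, 1) * hits for field, hits in field_hits.items())


# Hypothetical weights: a title hit is worth ten body hits.
WEIGHTS = {"title": 10, "body": 1}
print(weighted_score({"title": 1, "body": 3}, WEIGHTS))   # 13
print(weighted_score({"title": 0, "body": 12}, WEIGHTS))  # 12
```

With these weights, a post that mentions the word once in its title outranks one that only repeats it in the body.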

TECK 08 Oct 2006 09:32

Guys, when you compiled Sphinx, did you specify the MySQL directory or simply use --with-mysql?
Thanks.


All times are GMT.

Powered by vBulletin® Version 3.8.14
Copyright © 2021, MH Sub I, LLC dba vBulletin. All Rights Reserved. vBulletin® is a registered trademark of MH Sub I, LLC
Copyright ©2001 - , vbulletin.org. All rights reserved.