vBulletin Mods

The Official vBulletin Modifications Site
https://www.vbulletin.org/forum/showthread.php?t=127868

DigitalCrowd 18 Jan 2007 20:37

Well, then you have to fetch the current lastpost field of all matching threads, and on larger boards that would be significant overhead. You move away from just searching Sphinx to searching the database as well, then sorting the resulting arrays, and pretty soon... you have a SLOW search again.

After further testing...

I rebuilt my index, and the search results came back in order. I then replied to a post about a third of the way down, without using the word "the" (my search word) in the reply. Now, when I do a search for "the", the third post down is out of order.

The ONLY way for a thread to get bumped up is if the search word has been used again at a later time in that thread.

The best way to do search, IMHO, is to give more weight to recent threads but get away from sort-by-date search results. Even Google doesn't offer sorting by date, but we are so accustomed to it in the forum world that getting people to break from it is hard. The best way would be to optimize how Sphinx (or the code that makes the call to Sphinx) weighs results.

I did notice on the Sphinx forums that the sort mode "SPH_SORT_TIME_SEGMENTS" was made to address this, and that if it doesn't work to our liking it can be modified to perform better. I will have to look into it.

Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so, and for some people that indexing process might take that long or longer.
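
For anyone wondering what sorting by time segments looks like in practice, here is a minimal Python sketch of the idea, not Sphinx's actual code. As I understand the Sphinx documentation, SPH_SORT_TIME_SEGMENTS groups matches into segments (last hour, day, week, month, 3 months, everything else) and orders by relevance within each segment. The Match class, the 30-day month and the sample data are my own assumptions for illustration.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR

# Segment upper bounds as age in seconds: hour, day, week, month, 3 months.
# Assumption: a month is approximated as 30 days. Anything older falls into
# the final "everything else" segment.
SEGMENT_BOUNDS = [HOUR, DAY, 7 * DAY, 30 * DAY, 90 * DAY]


@dataclass
class Match:
    """Hypothetical search hit; the field names are illustrative only."""
    doc_id: int
    weight: int     # relevance score from the search engine
    timestamp: int  # Unix time of the indexed post


def time_segment(age_seconds: int) -> int:
    """Return 0 for the newest segment, len(SEGMENT_BOUNDS) for the oldest."""
    for index, bound in enumerate(SEGMENT_BOUNDS):
        if age_seconds < bound:
            return index
    return len(SEGMENT_BOUNDS)


def sort_by_time_segments(matches, now):
    """Newest segment first; within a segment, highest relevance first."""
    return sorted(
        matches,
        key=lambda m: (time_segment(now - m.timestamp), -m.weight),
    )


now = 1_170_000_000
matches = [
    Match(1, weight=5, timestamp=now - 10 * DAY),   # within the last month
    Match(2, weight=1, timestamp=now - 2 * HOUR),   # within the last day, weak
    Match(3, weight=9, timestamp=now - 5 * HOUR),   # within the last day, strong
    Match(4, weight=7, timestamp=now - 400 * DAY),  # older than 3 months
]
print([m.doc_id for m in sort_by_time_segments(matches, now)])
```

Note that the timestamp used here is still whatever was indexed, so a thread bumped by a reply that does not contain the search word would stay in its old segment until the next reindex.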

kontrabass 25 Jan 2007 21:35

Boy, if ever a hack was ripe for a commercial opportunity... ;) A no-fuss, easy-to-install Sphinx search for vBulletin with full search functionality and smart results ordering - how many big boarders would pay big money for this? (I would... it'd be a lot cheaper than buying another new server... :cool: ). I certainly HOPE this free development continues, but I look forward to the time when bugs like this aren't an issue.

stinger2 29 Jan 2007 19:26

Quote:

Originally Posted by kontrabass (Post 1167265)
Boy, if ever a hack was ripe for a commercial opportunity... ;) A no-fuss, easy-to-install Sphinx search for vBulletin with full search functionality and smart results ordering - how many big boarders would pay big money for this? (I would... it'd be a lot cheaper than buying another new server... :cool: ). I certainly HOPE this free development continues, but I look forward to the time when bugs like this aren't an issue.

bookmarking this

Nerudo 29 Jan 2007 20:32

It's amazing. I'm bookmarking too.

jason|xoxide 14 Feb 2007 18:34

I wouldn't classify this as a bug, just more of an oversight. The max_matches variable in the sphinx.conf file is ignored when using the PHP API script. It doesn't matter if it's left at the default 1000, the 1500 that orban's file has been modified to use, or 1000000 as the config file says not to do. If you want to change the number of results returned then you need to change line 15 of 'includes/sphinx.php'.

Old:

[Code block hidden in the archived page.]

New (replace '2500' with the number you want):

[Code block hidden in the archived page.]

Otherwise, great work! This has really sped up searching on the forums I have used for testing (6K posts, 820K posts, and 3.2M posts).

eoc_Jason 19 Feb 2007 19:22

Just installed on a forum with ~5.4 million posts... Before, some searches would literally take forever (I had a query-kill script that would kill the thread after a minute); now searches come back in less than a second! :)

Slow searches are a killer on a forum since they lock the post table. Also, users seem to have a habit of clicking the search button multiple times if results aren't returned within a few seconds.

One thing: I've noticed that search results don't always come back sorted by date properly. I skimmed over a few posts talking about this; I guess I need to go back and read them more in depth. I'm sure the fix wouldn't be too hard.

Here's the data from the initial build. Even with such a large index file, it is still super fast.

Quote:

indexing index 'post_index'...
collected 5562411 docs, 1881.1 MB
sorted 189.9 Mhits, 100.0% done
total 5562411 docs, 1881073642 bytes
total 523.946 sec, 3590205.25 bytes/sec, 10616.38 docs/sec
indexing index 'post_index_delta'...
collected 75 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 75 docs, 18407 bytes
total 0.294 sec, 62659.52 bytes/sec, 255.31 docs/sec
indexing index 'thread_index'...
collected 357824 docs, 10.6 MB
sorted 1.2 Mhits, 100.0% done
total 357824 docs, 10635814 bytes
total 17.875 sec, 595012.62 bytes/sec, 20018.19 docs/sec
indexing index 'thread_index_delta'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
skipping index 'fulltext_post_index' (distributed indexes can not be directly indexed)...
skipping index 'fulltext_thread_index' (distributed indexes can not be directly indexed)...
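
To put those numbers next to the "rebuild your full index every 15 minutes" concern raised earlier in the thread, here is a small Python sketch that sums the per-index times from the summary lines above and re-derives the docs/sec figure. The helper names are my own; only the figures come from the indexer output.

```python
import re

# Summary lines copied from the indexer output above.
INDEXER_LOG = """\
total 5562411 docs, 1881073642 bytes
total 523.946 sec, 3590205.25 bytes/sec, 10616.38 docs/sec
total 75 docs, 18407 bytes
total 0.294 sec, 62659.52 bytes/sec, 255.31 docs/sec
total 357824 docs, 10635814 bytes
total 17.875 sec, 595012.62 bytes/sec, 20018.19 docs/sec
total 0 docs, 0 bytes
total 0.010 sec, 0.00 bytes/sec, 0.00 docs/sec
"""


def total_build_seconds(log: str) -> float:
    """Sum the per-index 'total N sec' figures (ignores the docs/bytes lines)."""
    times = re.findall(r"^total ([\d.]+) sec", log, flags=re.M)
    return sum(float(t) for t in times)


def docs_per_second(docs: int, seconds: float) -> float:
    """Throughput as reported by the indexer: documents divided by elapsed time."""
    return docs / seconds if seconds else 0.0


seconds = total_build_seconds(INDEXER_LOG)
print(f"full build: {seconds:.3f} sec ({seconds / 60:.1f} min)")
print(f"post_index: {docs_per_second(5562411, 523.946):.2f} docs/sec")
```

On this hardware the whole build fits inside a 15 minute window, though that obviously depends on the server and the size of the board.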

kontrabass 19 Feb 2007 21:58

Quote:

Originally Posted by eoc_Jason (Post 1186199)
Just installed on a forum with ~5.4 million posts... Before, some searches would literally take forever (I had a query-kill script that would kill the thread after a minute); now searches come back in less than a second! :)

Slow searches are a killer on a forum since they lock the post table. Also, users seem to have a habit of clicking the search button multiple times if results aren't returned within a few seconds.

One thing: I've noticed that search results don't always come back sorted by date properly. I skimmed over a few posts talking about this; I guess I need to go back and read them more in depth. I'm sure the fix wouldn't be too hard.

Here's the data from the initial build. Even with such a large index file, it is still super fast.

Thanks for the report! Mind if I ask a couple questions (gathering info to see where I would stand): Are you running a slave DB server for searches? What kind of hardware is behind your database? Thanks :)

Mickie D 19 Feb 2007 22:39

This is fantastic, well done all involved :)

I'm having a small issue I need some help with:

1) If I issue the search command from SSH, it gives me a lot of results for a word, but if I do the same search in the forums it gives me 1 or 2 results, and I know there are a lot more.

2) I have a few 3-letter words that I like to have included in the vboptions. How do I enable them in Sphinx without enabling all 3-letter words?

Any ideas?

Cheers

mute 20 Feb 2007 17:58

Quote:

Originally Posted by DigitalCrowd (Post 1162429)
Well, then you have to fetch the current lastpost field of all matching threads, and on larger boards that would be significant overhead. You move away from just searching Sphinx to searching the database as well, then sorting the resulting arrays, and pretty soon... you have a SLOW search again.

After further testing...

I rebuilt my index, and the search results came back in order. I then replied to a post about a third of the way down, without using the word "the" (my search word) in the reply. Now, when I do a search for "the", the third post down is out of order.

The ONLY way for a thread to get bumped up is if the search word has been used again at a later time in that thread.

The best way to do search, IMHO, is to give more weight to recent threads but get away from sort-by-date search results. Even Google doesn't offer sorting by date, but we are so accustomed to it in the forum world that getting people to break from it is hard. The best way would be to optimize how Sphinx (or the code that makes the call to Sphinx) weighs results.

I did notice on the Sphinx forums that the sort mode "SPH_SORT_TIME_SEGMENTS" was made to address this, and that if it doesn't work to our liking it can be modified to perform better. I will have to look into it.

Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so, and for some people that indexing process might take that long or longer.

Digital,

Have you managed to find a fix for this short of doing a full reindex? We're still just doing incremental updates, but I am still pretty annoyed about the "out of order" results.

kmike 20 Feb 2007 18:49

Quote:

Originally Posted by DigitalCrowd (Post 1162429)
Well, then you have to fetch the current lastpost field of all matching threads, and on larger boards that would be significant overhead. You move away from just searching Sphinx to searching the database as well, then sorting the resulting arrays, and pretty soon... you have a SLOW search again.
....
Without adding overhead to the search process, I think the days of instant lastpost sorting are gone, unless you can rebuild your full index every 15 minutes or so, and for some people that indexing process might take that long or longer.

Actually there is a function in vB which does exactly that (sorts the results by the specified field):
sort_search_items() in includes/functions_search.php
It's just one query, and it runs on a slave server if it's set up. Also its overhead depends only on the number of returned search results.
And you don't even need to run through it on every search, only when listing the results as threads, sorted by lastpost.
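
To illustrate the "just one query" point, here is a rough Python sketch of the general approach: take the thread IDs that came back from Sphinx and let the database order them by the live lastpost value. This is not the body of sort_search_items(). The table and column names (thread, threadid, lastpost) are assumed to mirror the vBulletin schema, and an in-memory SQLite database stands in for MySQL so the sketch runs on its own.

```python
import sqlite3


def sort_by_lastpost(conn, thread_ids):
    """Return thread_ids ordered by their current lastpost, newest first.

    One query, and its cost depends only on how many IDs the search returned.
    """
    if not thread_ids:
        return []
    placeholders = ",".join("?" for _ in thread_ids)
    rows = conn.execute(
        f"SELECT threadid FROM thread WHERE threadid IN ({placeholders}) "
        "ORDER BY lastpost DESC",
        list(thread_ids),
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in data: thread 13 exists but was not matched by the search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread (threadid INTEGER PRIMARY KEY, lastpost INTEGER)")
conn.executemany(
    "INSERT INTO thread VALUES (?, ?)",
    [(10, 1000), (11, 3000), (12, 2000), (13, 9000)],
)

sphinx_ids = [10, 11, 12]  # hypothetical IDs returned by the Sphinx query
print(sort_by_lastpost(conn, sphinx_ids))
```

Because the ordering comes from the thread table and not from the index, a reply that does not contain the search word still bumps the thread, which is the case described earlier in the thread.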

Mickie D 21 Feb 2007 21:58

Anyone know why it's not getting the full number of results?

If I search from the command line it gets hundreds, and if I search from the forums it finds 1 or 2 things, when I know there are more.

raywjohnson 01 Mar 2007 21:38

Thank You, orban!

Quote:

Originally Posted by orban (Post 1086294)
if you're interested in an implementation
http://www.vbulletin.org/forum/showpost.php?p=1104866

Thank you for the great instructions. A few hours' work (I am very new to vBulletin), plenty of Google searching, and another successful installation of Sphinx! Forum "write-lock" issues have disappeared. Thanks again!

Forgot to ask, is there any kind of setting to block indexing before a certain date? One of my admins informed me that he cannot find any post prior to 2005.

Later, RayJ

raywjohnson 04 Mar 2007 02:58

Quote:

Originally Posted by jason|xoxide (Post 1182576)
I wouldn't classify this as a bug, just more of an oversight. The max_matches variable in the sphinx.conf file is ignored when using the PHP API script. It doesn't matter if it's left at the default 1000, the 1500 that orban's file has been modified to use, or 1000000 as the config file says not to do. If you want to change the number of results returned then you need to change line 15 of 'includes/sphinx.php'.

Old:

[Code block hidden in the archived page.]

New (replace '2500' with the number you want):

[Code block hidden in the archived page.]

Otherwise, great work! This has really sped up searching on the forums I have used for testing (6K posts, 820K posts, and 3.2M posts).

Little help!

Most likely I am just misunderstanding the results.
Here are the results from a command line search: (displaying matches: snipped)

# search test

Sphinx 0.9.7-RC2
Copyright (c) 2001-2006, Andrew Aksyonoff

index 'mypostidx': query 'test': returned 1000 matches of 191296 total in 0.029 sec

words:
1. 'test': 191296 documents, 475585 hits
index 'mypostidxdelta': query 'test': returned 56 matches of 56 total in 0.000 sec

words:
1. 'test': 56 documents, 134 hits
index 'mythreadidx': query 'test': returned 1000 matches of 2847 total in 0.154 sec

words:
1. 'test': 2847 documents, 2879 hits
index 'mythreadidxdelta': query 'test': returned 0 matches of 0 total in 0.000 sec

But, searching in the forum: (show posts/search entire post)
Search: Key Word(s): test Showing results 1 to 40 of 392

(I created a huge test forum and it has many more posts with the word "test" than 392)

And using test.php
php test.php test
Query failed: searchd error: index 'mythreadidx': incompatible schemas: non-virtual attributes count mismatch: 4 in schema '/var/sphinx/mythreadidx', 5 in schema '/var/sphinx/mypostidx'.

On the forum, the search never returns any more than 400 results (i.e. Showing: 40 of 400). I cannot find a setting that cuts off the results at 400.

I have read through this thread (twice!) and made the suggested changes to sphinx.php [$cl->SetLimits()] and sphinx.conf (max_matches) with no change to the results.
(I also searched the "Common Forum" at sphinxsearch.com, no luck!)

Any insight into this would be most appreciated!

Later, RayJ

orban 04 Mar 2007 11:20

Have you restarted searchd?

UK Jimbo 04 Mar 2007 13:37

There were some problems with the assert function around post #280.

My solution to these was to turn assert warnings off using the single line of code:


[Code block hidden in the archived page.]

You can do this in the api file or in sphinx.php

I've just re-indexed with the post table at a min word length of 3. As you can see from the command line, the process was niced at 20 and there were 400 active users on the site. I'm hugely impressed by this implementation.


[Code block hidden in the archived page.]


