vBulletin Mods

The Official vBulletin Modifications Site
https://www.vbulletin.org/forum/showthread.php?t=127868

orban 08 Oct 2006 11:01

I didn't add anything; make sure you have the mysql-dev stuff installed. What error are you getting?

TECK 08 Oct 2006 23:13

Not getting any errors, just wanted to make sure before I compile it.
I'll let you know if anything comes up.

ubuntu-geek 09 Oct 2006 15:02

Quote:

I also added a "$vbulletin->GPC['nocache'] = true;" to the search_process_start hook, I had some queries that stuck and I think that's because vB cached some queries and did some very bad re-sorting on those....try it out.
Could you give me a hint on this one :)

orban 09 Oct 2006 15:33

Meh, I think it was because I deleted the full-text indices and ran a MATCH (...) AGAINST query... and MySQL kinda crashed...

Should be safe to enable it again.

ubuntu-geek 09 Oct 2006 15:39

gotcha ok

mute 09 Oct 2006 16:10

Quote:

Originally Posted by orban
I think I'll do it every 3 days but I don't know yet, we're not very busy right now so the delta indices are quite small.

--------------------------------------------------------------------------------------------------------

I fail to see why this works and I still think there's missing data in these results....

Updated sphinx.conf

Added "IF(firstpostid=0,99999999,firstpostid) as firstpostid" to fields list and "sql_group3_column = firstpostid"

You only need to rebuild the thread indices.

http://dragy.de/public/sphinx.conf

Updated includes/sphinx.php

http://dragy.de/public/sphinx.php.txt

Update search_forums template

Re-added the "show as threads" and "show as posts" options...

Rolled back navbar and FORUMDISPLAY templates....back to "show as thread" "show as posts"....

http://dragy.de/public/sphinx_search...s.template.txt

Updated search.php

Remove


[Code block not visible in this copy of the thread.]




I also added a "$vbulletin->GPC['nocache'] = true;" to the search_process_start hook, I had some queries that stuck and I think that's because vB cached some queries and did some very bad re-sorting on those....try it out.

Can someone summarize what's going on here? I got sorta lost. Are you guys trying to figure out how to do the "view as posts, view as threads" options using sphinx, or making it so those options fall back on the vb search?

orban 09 Oct 2006 16:13

"Are you guys trying to figure out how to do the "view as posts, view as threads" options using sphinx"

Yes and it seems to work, too.

mute 09 Oct 2006 16:15

hm, I suppose I will give it a shot then!

Mine seems to be working as intended! Do you guys think the "$vbulletin->GPC['nocache'] = true;" bit in the search hook is needed? I love how this hack seems to be getting simpler as time goes on :)

TECK 09 Oct 2006 18:47

I just made a script that makes compiling Sphinx easier.
It's for people who are not really comfortable with Unix.

1. Open your SSH utility and type vim installscript > Press Enter.

2. Press i (Insert).

3. Paste the following script:

[Code block not visible in this copy of the thread.]

4. Press ESC.

5. Type :wq (Write Quit) > Press Enter.

6. Type chmod +x installscript > Press Enter.

7. Type ./installscript > Press Enter.

Wait for install completion and read the messages.
Post any weird errors here. You are done. :)

mute 09 Oct 2006 18:52

I also made a diff against the hacked search.php for vBulletin 3.6.2. To apply, just "patch -p0 < sphinx_search_362.diff" in your src dir.

http://junglist.org/sphinx_search_362.diff

orban 09 Oct 2006 18:56

TECK, does that apply the multiple group patch?

Also gonna try to add basic sorting (date asc, date desc, relevance) later and fix the post title search.

TECK 09 Oct 2006 18:57

Orban and other guys, please feel free to edit the script, in order to include all extra patches needed for vBulletin.
Post the edits here and let us know.

Quote:

Originally Posted by orban
TECK, does that apply the multiple group patch?

Also gonna try to add basic sorting (date asc, date desc, relevance) later and fix the post title search.

Nope, just the basic install, with SQL validation in case the server does not find MySQL by default. It also gets rid of some weird messages the regular Sphinx install might spit out.
That's the reason I posted the script, so you can edit it and add the patches.
It's pretty straightforward: you can add the Unix commands there, following the same pattern.

I haven't looked into patches, because I'm not familiar with them yet.
I was hoping you would take care of it and post the edits. :)
Also, please explain in more detail what you did, so others will understand better.

Be aware of these locations:
DST_DIR=${HOME}/dist
SPH_DIR=${HOME}/sphinx
SRC_DIR=${HOME}/source
SQL_DIR=/usr

Type ${HOME} to see what it returns at your Unix prompt:
$ ${HOME}
bash: /home/user: is a directory

You still use mysqli for the forums and mysql for the search, right?
I have my forums set on mysqli.

ubuntu-geek 09 Oct 2006 19:50

Quote:

Originally Posted by orban
TECK, does that apply the multiple group patch?

Also gonna try to add basic sorting (date asc, date desc, relevance) later and fix the post title search.

Looking forward to those changes ;P

TECK 10 Oct 2006 02:38

Ya, I will work on it. :0
Pretty new at patching me also...
Question: in your config file, you don't have any table prefixes?
http://dragy.de/public/sphinx.conf

I'm probably missing something. Are you using a recent vB version, where it has table prefixes?
Thanks for clearing this up.

mute 10 Oct 2006 02:44

Quote:

Originally Posted by TECK
Ya, I will work on it. :0
Pretty new at patching me also...
Question: in your config file, you don't have any table prefixes?
http://dragy.de/public/sphinx.conf

I'm probably missing something. Are you using a recent vB version, where it has table prefixes?
Thanks for clearing this up.

Technically he doesn't need dbname.sphinx_counter either, just sphinx_counter would suffice.

TECK 10 Oct 2006 02:57

Editing the sphinx.conf file as we speak.

mtgsalvation is a new database where all sphinx tables were created, I believe?
Let me know why you didn't create the sphinx table in the vBulletin database. Thanks.

This part:

[Code block not visible in this copy of the thread.]

Does not have any table prefixes???

mute 10 Oct 2006 03:13

We're not storing the sphinx data IN mysql, so only one table needs to be created, and that is the sphinx_counter table. mtgsalvation is the name of his vbulletin installation.

TECK 10 Oct 2006 04:14

Thanks mute... however, I'm not clear with the query above.
It does not make sense. The sql_query posted in his .conf file will not work, if the database tables have prefixes.
Please explain in more detail why you don't need table prefixes.

Also, from his .conf file:
sql_db = mtgsforums

That's why I'm confused...

orban 10 Oct 2006 09:27

Okay, in my example:

mtgsforums: my vbulletin database
mtgsalvation: the database with the counter table in it

I do NOT have table prefixes. You have to add those.

TECK 10 Oct 2006 11:48

That makes a lot of sense... I was expecting this answer from my previous post above, Orban.
Thanks for the explanation. :)

mute 10 Oct 2006 17:19

OOC, why'd you put the counter table in a different database than your vb forum, just standard practice?

orban 10 Oct 2006 17:22

Yes, and I'll probably have more sphinx indices for other things in future. It has nothing to do with vB, so I want it separate.

mute 12 Oct 2006 17:27

Hm, I've got your latest changes running, but sorting results by date doesn't seem to work; I get the same results if I choose relevancy or date.

orban 12 Oct 2006 17:41

Sorry, I once again forgot to update the downloadable includes/sphinx.php file >.< Try now. Relevancy asc/desc doesn't work; it's always descending (most relevant at top, obviously).

mute 12 Oct 2006 17:56

I just wget'd the sphinx.php (to make sure I wasn't caching) and edited it to my liking. My search results when selecting "Show as threads", "one month ago", "search all forums", and "sort by last post, descending" aren't sorted in any order I can figure out. Am I missing something?

orban 12 Oct 2006 18:05

It's working fine for me :(

Are you sure the search isn't getting cached?

mute 12 Oct 2006 18:23

This is why I shouldn't be working on an empty stomach! It is working as intended :)

Brains 12 Oct 2006 20:06

HOE LEE SHIYYYTEE!!!! This absolutely rocks. I was a little skeptical, but I went ahead and built the "worst case" index with Sphinx (no stopwords, 4.7M posts, min word length of 1) and tried some typically VERY difficult searches (from the command line). WOW... This sucker is unbelievably fast...

Time to stitch it into my forums. This is amazing, GREAT find, and THANK YOU for sharing!

ubuntu-geek 12 Oct 2006 21:02

updated and working awesome!

ALanJay 13 Oct 2006 17:30

Quote:

Originally Posted by mute
Ah, I found the problem I think.

For whatever reason, on my initial index, despite having used --rotate, it is leaving *new* index files in my var dir:

# ls -la *new*
-rw-r--r-- 1 root root 1356935444 Oct 2 13:39 vbpost.new.spd
-rw-r--r-- 1 root root 10644727 Oct 2 13:39 vbpost.new.spi
-rw-r--r-- 1 root root 54322284 Oct 2 13:42 vbthread.new.spd
-rw-r--r-- 1 root root 879893 Oct 2 13:42 vbthread.new.spi

Sphinx won't search against these, but I'm not sure why they didn't roll over.

I found the same thing but just renamed the files without the ".new" and all was fine. :)

The suggestion to nuke everything and start again without searchd running didn't seem to work either.

This is a case where the first time you create stuff there is an issue but after that it all works fine - when I update the DELTA files there doesn't seem to be an issue.
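For anyone hitting the same leftover files, the manual rename can be scripted. This is a minimal sketch and not part of the hack. `promoteNewIndexFiles()` is a hypothetical helper. It assumes the leftovers follow the `name.new.ext` pattern shown in the listing above, and it should only be run with searchd stopped so that nothing has the old files open.

```php
<?php
// Hypothetical helper (not part of the hack): promote index files that
// indexer left behind with a ".new" infix, e.g. vbpost.new.spd -> vbpost.spd.
// Run it only while searchd is stopped.
function promoteNewIndexFiles(string $dir): array
{
    $renamed = [];
    // glob() may return false on error; treat that as "nothing found".
    $candidates = glob(rtrim($dir, '/') . '/*.new.*') ?: [];
    foreach ($candidates as $path) {
        // Drop the ".new" just before the final extension.
        $target = preg_replace('/\.new(\.[^.\/]+)$/', '$1', $path);
        if ($target === null || $target === $path) {
            continue; // name did not have the expected "name.new.ext" shape
        }
        // rename() replaces an existing file of the same name on POSIX systems.
        if (!rename($path, $target)) {
            throw new RuntimeException("Could not rename $path");
        }
        $renamed[basename($path)] = basename($target);
    }
    return $renamed;
}
```

The return value maps each old name to its new name, so it can be logged before searchd is started again.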

orban 13 Oct 2006 17:32

Yeah maybe --rotate doesn't work when you create them first time.

ALanJay 13 Oct 2006 22:00

Curiously, when I added another database to the config file, that sphinx database also refused to rotate and just created .new files. It looks like some kind of bug.

On another track has anyone tried accessing the sphinx searchd from another host?

I tried using the PHP API. In a first test it refused to connect from a remote host, but it worked on the same machine (referencing an IP address rather than localhost).

orban 13 Oct 2006 22:05

Firewall?

That rotate seems to be bugged mm...yeah..just if you have a new config file entry just create that one alone...without --rotate, first time.

ALanJay 13 Oct 2006 22:23

Quote:

Originally Posted by orban
Firewall?

It turned out that, as well as the placeholder in sphinxapi.php, "localhost" was also hard-coded into the supplied test.php :)
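If a remote connection fails again, one quick way to tell a firewall problem from a hard-coded host is to test a raw TCP connection to searchd from the web server itself. This is a rough sketch. `canReachSearchd()` is a hypothetical helper, and the host and port should be whatever your searchd config and `SetServer()` call use.

```php
<?php
// Hypothetical diagnostic (not part of sphinxapi.php): can this machine open
// a TCP connection to searchd at all? A false result points at the network
// or firewall; a true result points back at the client configuration.
function canReachSearchd(string $host, int $port, float $timeout = 2.0): bool
{
    $errno = 0;
    $errstr = '';
    // The @ hides the warning fsockopen() raises on failure; the return value says enough.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fclose($fp);
    return true;
}
```

If this returns true but searches still fail, look for host names hard-coded elsewhere, which is what turned out to be the case here with test.php.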

Quote:

Originally Posted by orban
That rotate seems to be bugged mm...yeah..just if you have a new config file entry just create that one alone...without --rotate, first time.

Yes it is very odd. With the new database that I have added if I don't use --rotate it overwrites the current file but if I use --rotate it creates .new files (even subsequently). The other files are still in place and rotate correctly when I install the DELTA files on the main forum databases every 5 minutes.

So, as you say, a bit confusing, but other than that pretty impressive :)

An update to this: I realised that maybe the problem is that searchd needs to be fully restarted to re-read the config file before it knows about the new files and allows them to be rotated.

Starting a new data set in Sphinx seems to require:

1) creating the index without the --rotate flag
2) stopping searchd completely (kill `cat /var/log/searchd.pid`)
3) restarting searchd with the updated config file sphinx.conf

After those changes things seem to once again work. :)

Hi another update / query

Well, having managed to get Sphinx up and running and test.php searching the data, we thought we would try the next steps.

Unfortunately we are still using 3.0.x and the search.php has changed a huge amount :(

I don't suppose anyone has tried adding sphinx search to 3.0.x?

Looking at the suggested changes, I can find c1 and c2, along with c4 and c5, though the variable names have changed.

Obviously, with the variable name changes, orban, your very useful sphinx.php will need its variable references updated.

But if anyone has tried this with 3.0.x please let me know :)

ALanJay 15 Oct 2006 22:06

Hi another update / comment / query :)

We seem to have managed to get things working with 3.0.x but when testing see:

Warning: assert() [function.assert]: Assertion failed in /includes/sphinxapi.php on line 209

This doesn't seem to be fatal in any way, and the search function works. Any ideas what this is trying to achieve? :)

Overall, thanks to orban for the code to make this all work; it seems to do an excellent job.

orban 16 Oct 2006 07:20

What's on line 209?

ALanJay 16 Oct 2006 09:17

1 Attachment(s)
Hi,

Further to the above: while doing various test searches, which all seem to produce the correct results, I have discovered a couple more of these anomalies :)

When I set various search options (i.e. user, forums or date as well as the text search), I get these errors in sphinxapi.php, which has various assertion tests, e.g.:

line 209:

assert ( is_int($limit) );

line 234

/// set groups
function SetGroups ( $groups )
{
assert ( is_array($groups) );
foreach ( $groups as $group )
assert ( is_int($group) );

$this->_groups = $groups;
}

It looks like the defaults set in sphinx.php around line 75 are involved:

$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( $sphinx_weights );
$cl->SetLimits ( 0, $vboptions['maxresults'] );
$cl->SetMatchMode ( SPH_MATCH_ALL );
$cl->SetGroups ( $sphinx_groups );
$cl->SetGroups2 ( $sphinx_groups2 );
$cl->SetGroups3 ( $sphinx_groups3 );
$cl->SetGroups4 ( $sphinx_groups4 );
$cl->SetGroups5 ( $sphinx_groups5 );
$cl->SetSortMode ( $sphinx_sort );

And before this, around line 52, they are set to strings for some searches:

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

This doesn't seem to affect the results, but the assertion fails when the elements are not integers.

In the case of line 209 (sphinxapi.php) and line 75 (sphinx.php), these can be forced to integers, as they are obviously numbers, i.e.

$cl->SetLimits ( intval(0), intval($vboptions['maxresults']) );

But I am not certain about the other elements and options, which don't work the same way because their defaults are text strings.
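The assertion fires because `is_int()` checks the PHP type, not whether a value looks like a number, and option values often arrive as numeric strings such as "250". Below is a small sketch of a hypothetical `toSphinxInt()` helper. It casts such values and refuses non-numeric ones, rather than letting intval() quietly turn them into 0.

```php
<?php
// Hypothetical helper: coerce an option value to int before handing it to
// SphinxClient methods that assert is_int(). is_int("250") is false because
// "250" is a string, which is what trips assert(is_int($limit)).
function toSphinxInt($value): int
{
    // FILTER_VALIDATE_INT returns an int for ints and integer strings,
    // and false for anything else ("abc", "1.5", arrays...).
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int === false) {
        throw new InvalidArgumentException('Not an integer: ' . var_export($value, true));
    }
    return $int;
}
```

With this helper the call would read `$cl->SetLimits(0, toSphinxInt($vboptions['maxresults']));`. The literal `0` is already an int, so it does not need intval().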

Anyway hope that helps.

By the way, if anyone wants the recipe for using Sphinx with 3.0.x, let me know and I can remove my site-specific defaults and post it here. The biggest change is recoding from OOP to the old-style referencing of variables, but there always seem to be old-style equivalents that meet the same requirements.

The only other things to spot are the changes to search.php from the vB code. These follow the examples that orban gave but are in slightly different places in search.php, i.e.

Make change c1 at around line 304
Make change c2 at around line 331
Make change c3 at around line 1210
Make change c4 at around line 1414
Make change c5 at around line 1147

For sphinx.php, see the attached diff file.

Once again, very cool work orban, and we should also thank Andrew Aksyonoff over at www.sphinxsearch.com.

ALanJay 16 Oct 2006 12:57

Hi,

Having done more research the warning errors can be switched off by adding:

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

Add it at the top of the sphinxapi.php script. It might be better to remove the causes of the warnings, but at least this gives one the option to see them or not.

Another curiosity is that on our forum the searches all seem lightning quick EXCEPT when you search exclusively for "thread started by user"; this can take over a minute to return a result.

If you add additional criteria (limiting the date / thread content / forums to search), the time it takes is reduced.

Finally, for every search you normally see the redirect page showing that it is being processed, but when you do a "thread started by user" search you don't see that page. I'm not sure if my changes to search.php have caused this. Does anyone have any ideas, or does the same happen on 3.6.x?

Regards
ALan

mute 16 Oct 2006 22:28

orban, did you ever fix the searching of post titles? I thought I was running your latest code, but it seems to be broken on my devel install.

orban 16 Oct 2006 22:32

It also searches them in my latest version (you can set relevancy in the sphinx.php) but it doesn't quite work like the default vB search (yet).

ALanJay: I would not remove the asserts, because they might create invalid requests to the searchd. Also the being processed is a vB thing.

ALanJay 17 Oct 2006 08:40

Quote:

Originally Posted by orban
ALanJay: I would not remove the asserts, because they might create invalid requests to the searchd. Also the being processed is a vB thing.

OK - does anyone else get "assert" warnings?

What I have done is set the warning messages off

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

in sphinxapi.php

As far as I can see, the assert errors are generated because the asserts all check whether things are integers, and some of the input defaults are either numeric strings or plain text strings.

These warnings don't seem to affect the output, which seems to work pretty well. But with some of the more complex searches it is possible to produce array warnings, i.e.

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))

So this warning implies that one of the items is the wrong datatype. Checking back through the code, on lines 34 and 50 these are set to:

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

Is this the issue? Should they be numeric?

For anyone interested this is now live at:

www.digitalspy.co.uk/forums/

We have 11,158,584 Posts and 464,239 Threads. And the main data file is a little over 4Gb in size.

It is still a work in progress but it does seem to produce the correct results :)

Quote:

Originally Posted by ALanJay
But with some of the more complex searches it is possible to produce array warning errors ie

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))

After much thought we realised that we don't use the $Coventry feature, and I suspect that is why it does not work. As I'm not sure what $Coventry should resolve to, I have removed the whole line from my implementation. It seems to say "if not a moderator and sent to Coventry, then don't do the search". As we have no people in the second category, removing it seems to be the best short-term solution.

I'm not sure if this is an issue between 3.0.x and 3.5/3.6 but thought I would share my thoughts on this as it kept me on my toes and I now have a much better understanding of the way the code works :)
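Another option, instead of deleting the line, is to make sure `in_array()` always receives an array. This sketch assumes that in 3.0.x `$Coventry` may be a comma-separated string or empty rather than an array; the thread does not confirm this. Both helper names are hypothetical, and `$canModerate` stands in for the `can_moderate()` result.

```php
<?php
// Hypothetical helper: normalise $Coventry to an array of int user IDs,
// whether it arrives as an array, a comma-separated string, or empty/null.
function coventryIds($coventry): array
{
    if (is_array($coventry)) {
        $ids = $coventry;
    } elseif (is_string($coventry) && trim($coventry) !== '') {
        $ids = explode(',', $coventry);
    } else {
        return [];
    }
    // Cast each ID and drop zeros (empty pieces such as "1,,2" become 0).
    return array_values(array_filter(array_map('intval', $ids)));
}

// Hypothetical version of the sphinx.php check: true when the viewer cannot
// moderate the forum AND the post's author is in Coventry.
function isCoventryHit(bool $canModerate, int $authorId, $coventry): bool
{
    return !$canModerate && in_array($authorId, coventryIds($coventry), true);
}
```

In sphinx.php, the second argument would then become `coventryIds($Coventry)` instead of `$Coventry`, so an unused or empty Coventry list simply never matches.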


PS: the $docinfo[$sphinx????] elements turn the group defaults into numerical output as required. I'm still not sure why the assert errors are being seen, though; I will delve deeper :)

PPS: Well, after more searching and playing, I am no further forward on why the assert warnings are occurring. Trying to force the elements to be integers with intval breaks the code :) so I now have a system that seems to work but generates warnings that I have switched off. I assume no one using 3.6 is having these issues with the assert warnings?

orban 17 Oct 2006 10:58

Why does intval() break any code?

And maybe the $Coventry variable is something else in vB 3.0...

I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0 :(

ALanJay 17 Oct 2006 12:07

Quote:

Originally Posted by orban
Why does intval() break any code?

That is a very good question. I suspect I am not using it 100% correctly but in the simplest example line 32

$sphinx_groups2 = $sphinx_userids;

to

$sphinx_groups2 = intval($sphinx_userids);

Seemed to cause odd behaviour.

I was also seeing if using it in:

if (!empty($userids)) $sphinx_userids = explode(',', $userids);
else $sphinx_userids = array();
if ($forumchoice != '') $sphinx_groups = explode(',', $forumchoice);

But wasn't sure I could use it in this context.
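One likely cause of the odd behaviour is that `intval()` does not convert the elements of an array. For a non-empty array it returns 1, and for an empty one it returns 0, so `intval($sphinx_userids)` replaces the whole ID list with a single number. `explode()` also always returns strings. The minimal sketch below casts each exploded element instead. `idList()` is a hypothetical name.

```php
<?php
// Hypothetical helper: turn a comma-separated ID string into an array of ints.
// intval() on the whole array would not do this: it returns 1 for any
// non-empty array and 0 for an empty one.
function idList(string $csv): array
{
    if (trim($csv) === '') {
        return []; // matches the "else $sphinx_userids = array();" branch
    }
    // explode() always yields strings ("12", "34"); cast each element.
    return array_map('intval', explode(',', $csv));
}
```

For example, `$sphinx_userids = idList($userids);` and `$sphinx_groups = idList($forumchoice);`, assuming both variables hold comma-separated strings as in the lines above.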

My problem is that I don't entirely understand the logic of what is going on here (but I'm learning as I go along). I'm not sure why I am seeing the "Warnings" when the results are perfect.

Depending on the search, each of the "SetGroups", "SetGroups2" and "SetGroups3" calls generates these warnings. Because these are arrays, I need to build each array from integers and, I assume, not from numbers stored as text(?)

Quote:

Originally Posted by orban
And maybe the $Coventry variable is something else in vB 3.0...

It is possible. From talking to my system admin, it lets you stop users from doing certain things. After thinking about this I don't think it is an issue, as we don't use it. So for me, removing it solves the problem: the second part of the if statement, which checks whether the user has been sent to Coventry, isn't necessary.

Quote:

Originally Posted by orban
I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0 :(

No problem without your code we wouldn't have been able to do this at all. So thanks so much.

I assume you don't see any of the assertion errors in vB 3.6 ?

Anyway, as you can see (if you register on our site), the Sphinx search works very smoothly and quickly, and it is a great solution for offloading the search function from the main database.

One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?

Swamper 21 Oct 2006 09:51

Quote:

Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?

Why not have that specific search just redirect to the standard vB search.php? It's fast.

----

Found my way here via the Big Boards Thread on vB.com - wow - I'm going to get on this right away! :D We're moving from a heavy modded 6.5+ million post vB2 to 3.6 in the coming weeks and for over a year now we've survived only because our search was split up into separate tables according to date range - updated nightly - and stored on another drive, but with 'Search this Thread', 'View New Posts' and 'Find all posts by User' acting on the live post table.

kmike 23 Oct 2006 06:48

Quote:

Originally Posted by Swamper
We're moving from a heavy modded 6.5+ million post vB2 to 3.6 in the coming weeks

Be warned that vB 3.6 is much more CPU demanding than vB2 (and even vB3), so you'd better beef up your web frontend(s) before the final switch.

Quote:

Originally Posted by orban
Let's assume you have

thread1 - 100 times "word"
thread2 - 50 times "word"
thread3 - 10 times "word"
thread4-50 - 5 times "word"

A search for "word" will return us 2500 posts. BUT there are only 50 different threads.

If your limit is 1000 (like mine) this will only return like 30 threads. So you're missing out 20......I'm actually seeing this on very common words (when searching post and "show as threads").

Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.
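orban's point can be checked with a quick simulation. This is a hypothetical sketch and not how vB or Sphinx actually rank results. It assumes each thread's matching posts rank together, with the most hits first, and counts how many distinct threads survive a cap on the number of posts. The per-thread counts quoted above add up to 395 posts (100 + 50 + 10 + 47 × 5) rather than 2500. With those counts, a 200-post cap keeps only 11 of the 50 threads.

```php
<?php
// Hypothetical simulation, not vB's or Sphinx's actual ranking: assume each
// thread's matching posts rank together, threads with more hits first, and
// only the first $limit posts are kept. Returns how many distinct threads
// still have at least one post inside the cap.
function threadsWithinLimit(array $postsPerThread, int $limit): int
{
    arsort($postsPerThread); // most hits first; thread IDs stay as keys
    $threads = 0;
    $taken = 0;
    foreach ($postsPerThread as $count) {
        if ($taken >= $limit) {
            break; // cap reached: remaining threads never show up
        }
        $threads++;
        $taken += $count;
    }
    return $threads;
}

// Per-thread hit counts from the quoted example: threads 1-3, then 4-50 with 5 each.
$postsPerThread = [1 => 100, 2 => 50, 3 => 10] + array_fill(4, 47, 5);
```

The exact numbers depend entirely on the assumed ordering. The point is only that a cap on posts can hide whole threads in the "show as threads" view.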

orban 23 Oct 2006 09:49

Quote:

Originally Posted by kmike
Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.

Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.

ALanJay 23 Oct 2006 15:40

Quote:

Originally Posted by orban
It's because they are both arrays, or a string of comma-separated numbers?

You'd have to use array_walk, lemme know if you need help.

I eventually worked this out but never managed to get it to work successfully. I assume some difference between the way 3.0.x and 3.6 handle these causes a problem. Because it is only a warning I have left it. Maybe next time there is an opportunity to play I will have another go with array_walk, if I can fathom the syntax to get everything switched from numerals as text to integers.

Quote:

Originally Posted by orban
With or without key words?

If it's without it's using the default search and I can't really help with that.

Without. I now understand why it is slow, and we have removed it from our choices; 1 minute to bring back the answer was a little long.


Overall, it has now been running for a week, and once we sorted a few things out it has been excellent. Using your cool current and DELTA indexes, the databases are updated every 15 minutes and the whole site is reindexed every night.

Thanks for the ideas this has been an excellent tool and remarkably easy to implement.

orban 23 Oct 2006 15:56

function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

Glad to hear it works for you!

ALanJay 23 Oct 2006 16:08

Quote:

Originally Posted by orban
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

I will have a play - thanks.

Quote:

Originally Posted by orban
Glad to hear it works for you!

Seems to :) Of the various hacks and attempts to solve the text search issue, this one seems to have delivered on its goals. There are still a few things I don't understand that would probably improve performance, but overall it works well.

Maybe you have some ideas on the issues:

morphology = none
stopwords =
min_word_len = 3
charset_type = sbcs
}

What do morphology and stopwords do / offer and how to best use them.

and

mem_limit = 256M
}

mem_limit is used when creating the index. Does anyone have views on a sensible optimum value? We are running this on a machine with 8Gb of RAM, and as it started at 32M I didn't want to make it too big, but it still complains that it could be better :)


==============

Looking in the original configuration file I think I have a handle on the morphology, word_len and char set.

Would I be right in saying that the stopwords file is a list of words NOT to index?

If so does anyone have a good list of 2 and 3 letter words that can happily be removed from an index :)

==============

Looking on the sphinxsearch forums there is discussion on creating stop words and the indexer can produce list of most used words for you to work with ie

/usr/local/bin/indexer --config sphinx.conf --rotate --buildstops sphinx-stop.txt 1000 --buildfreqs

This builds a file with the most commonly used words in the index and how frequently they appear in your index.

If I understand this correctly it should allow you to remove a few of the obvious things.
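If the frequency list is used to hunt for short stopwords, the filtering step can be scripted. Below is a sketch with a hypothetical `shortStopwordCandidates()` helper. It assumes the `--buildfreqs` output has one word followed by its count on each line. It measures length in bytes, which matches single-byte `charset_type = sbcs` text but not UTF-8. With `min_word_len = 3` as in the config above, words shorter than 3 characters are not indexed in the first place, so only the 3-letter words would actually need listing.

```php
<?php
// Hypothetical helper: pick short words out of a word-frequency list as
// stopword candidates. Assumes one "word count" pair per line; the exact
// --buildfreqs output format should be checked against your own file.
function shortStopwordCandidates(string $freqFile, int $maxLen = 3): array
{
    $lines = file($freqFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $freqFile");
    }
    $words = [];
    foreach ($lines as $line) {
        // First whitespace-separated field is the word; the count is ignored here.
        $word = preg_split('/\s+/', trim($line))[0];
        // strlen() counts bytes, which equals characters only for single-byte (sbcs) text.
        if ($word !== '' && strlen($word) <= $maxLen) {
            $words[] = $word;
        }
    }
    return $words;
}
```

The resulting list still needs a manual review before it is written to a file and referenced from the `stopwords =` setting in sphinx.conf.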

Quote:

Originally Posted by orban
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

Hi,

Looking at the code in sphinx.php:


[Code block not visible in this copy of the thread.]

Where do you put the array_walk manipulation?

As far as I can tell one needs the results of the various items above to be so processed.

or do you implement it:


[Code block not visible in this copy of the thread.]

ie

$cl->SetGroups4 ( (array_walk( $sphinx_groups4, "intvalArray") );

I assume it doesn't matter that sometimes the array will be one element long.
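On the `array_walk()` question: it changes the array in place and returns `true`, not the converted array. Passing its return value straight into `SetGroups4()` would therefore hand sphinxapi a boolean. Here is a runnable sketch using orban's callback; the sample values are made up.

```php
<?php
// orban's callback: array_walk() passes each element by reference.
function intvalArray(&$item, $key)
{
    $item = intval($item);
}

// Example input: IDs as numeric strings, as explode() would produce them.
$sphinx_groups4 = ['3', '17'];

// array_walk() converts the array IN PLACE and returns true; it does not
// return the converted array. Passing its return value to SetGroups4()
// would therefore hand sphinxapi a boolean, which fails is_array().
$walked = array_walk($sphinx_groups4, 'intvalArray');

// So walk first, then pass the array itself:
// $cl->SetGroups4($sphinx_groups4);   // $cl: the SphinxClient from sphinx.php
```

A one-element array is still an array, so the `is_array()` and per-element `is_int()` checks in `SetGroups()` pass in the same way.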

=====================================

Quote:

Originally Posted by orban

Quote:

Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?



With or without key words?

If it's without it's using the default search and I can't really help with that.


Quote:

Originally Posted by Swamper
Why not have that specific search just redirect to the standard vB search.php? It's fast.


Searches without keywords already are redirected to the default search.

Just curious, "orban": having done some more checks, doing just a user "Find Threads Started by user" search takes over a minute with the size of files we have, and from what you are saying this is the standard vB result. Once you add an additional key / search string, it all works much faster, as it is using Sphinx (is that right?).

Is there a reason you didn't code that using Sphinx?

kmike 24 Oct 2006 10:16

Quote:

Originally Posted by orban
Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.

Well, it's their own fault then ;-)

orban 24 Oct 2006 22:49

Well if it's crashing the server it's not :<

Quote:

Just curious, "orban": having done some more checks, doing just a user "Find Threads Started by user" search takes over a minute with the size of files we have, and from what you are saying this is the standard vB result. Once you add an additional key / search string, it all works much faster, as it is using Sphinx (is that right?).

Is there a reason you didn't code that using Sphinx?

Yeah but you'd have to add a fake string to all posts...mm....that's what you do right?

ALanJay 25 Oct 2006 07:29

Quote:

Originally Posted by orban
Yeah but you'd have to add a fake string to all posts...mm....that's what you do right?

If only it was that easy :)

If you enter a space, it throws it away and does the standard search. If you enter a single character, it just says there are no matches. If you use a longer string, it only finds the occurrences that match, not all of them :)

mute 25 Oct 2006 18:26

We're rolling out our sphinx search when we upgrade our site to 3.6.2 on Thursday. I'm hopeful that it will live up to my testing, but I am a tad worried that the "find posts by user" searches will be a bit pokey. I don't think it warrants a lot of concern, though, given how often that particular type of search is actually done.

Orban, have you been working on any other surprises lately? :)

orban 25 Oct 2006 20:10

You mean "find posts by user" without key words yeah?

No, haven't had a lot time lately. ;(

ALanJay 25 Oct 2006 21:22

Quote:

Originally Posted by mute
We're rolling out our sphinx search when we upgrade our site to 3.6.2 on Thursday. I'm hopeful that it will live up to my testing, but I am a tad worried that the "find posts by user" searches will be a bit pokey. I don't think it warrants a lot of concern, though, given how often that particular type of search is actually done.

Just to make clear: "find posts by user" is fine and works very fast. It is "Find THREADS STARTED by user" that is slow (from the comments made, I think it still uses the internal vB code).

Overall, we have been using sphinx search for nearly a week now and it seems to work very nicely. The DELTA file is generated every 5 minutes, and the full file is rebuilt each night in the early hours of the morning. Our vB main data file is around 4Gb in size (Threads: 467,561, Posts: 11,271,241, Members: 173,321).

We have disabled the "Find Threads STARTED by user" option in the search options for non-admin users as a temporary measure; I'm not sure anyone used it in any case.

Obviously in an ideal world it would be nice if this was searchable but it isn't a deal breaker.

kmike 26 Oct 2006 05:49

AlanJay, what's your MySQL version? I remember dealing with a bug in MySQL 4.0.x when "find threads started by user" query didn't use the proper indexes. Repairing the thread and post tables fixed that, but after some time the problem crept back in.

Update: found the same bug in the bookmarks, it appears not only 4.0.x are affected: http://www.vbulletin.com/forum/bugs....iew&bugid=4159
Looks like an intermittent index loss or corruption.

Also, FYI sphinx-0.9.7-rc1 has been released.
Another update: forgot to say that the crash bug has been fixed in RC1.

ALanJay 26 Oct 2006 07:22

Quote:

Originally Posted by kmike
AlanJay, what's your MySQL version? I remember dealing with a bug in MySQL 4.0.x when "find threads started by user" query didn't use the proper indexes. Repairing the thread and post tables fixed that, but after some time the problem crept back in.

We are using 4.1 version of mySQL (clients 4.1.18 to 4.1.21 and server 4.1.19).

Though I suppose the real goal is to remove this from the standard vB search and put it into a search done by sphinx (if possible) as with tables as big as ours I can understand why the results might take a little time to be returned.

Quote:

Originally Posted by kmike
Update: found the same bug in the bookmarks, it appears not only 4.0.x are affected: http://www.vbulletin.com/forum/bugs....iew&bugid=4159
Looks like an intermittent index loss or corruption.

Thanks will take a look.

Quote:

Originally Posted by kmike
Also, FYI sphinx-0.9.7-rc1 has been released.
Another update: forgot to say that the crash bug has been fixed in RC1.

Interesting. From the discussions it sounds like there are quite a number of changes, so I assume there might be more to upgrading than a simple rebuild once it is properly released.

orban 26 Oct 2006 10:35

Yeah, I just saw 0.9.7-RC1 got released.

I think I will wait for one more release candidate or even for the final version, because I'm sure there'll be bugs. Once it's released I'll create a new how-to. It will probably not just be a simple rebuild of indices, yeah.

ALanJay 26 Oct 2006 11:59

Quote:

Originally Posted by orban
Yeah, I just saw 0.9.7-RC1 got released.

I think I will wait for one more release candidate or even for the final version, because I'm sure there'll be bugs.

:) - as the current version works so well I'm in no hurry.

Quote:

Originally Posted by orban
Once it's released I'll create a new how-to. It will probably not just be a simple rebuild of indices, yeah.

I suspected as much. Good luck when he gets that far.

By the way, did you have a suggestion as to the best place to do the array_walk in your sphinx.php code?

Can it be implemented along these lines, i.e. something like:

$cl->SetGroups2 ( array_walk($sphinx_groups2, "intvalArray") );

Obviously with the function defined elsewhere in the code:

function intvalArray(&$item, $key)
{
    $item = intval($item);
}

When I try it I get another error:

Invalid argument supplied for foreach()

in sphinxapi.php, in the places where the integer check takes place.

kmike 26 Oct 2006 12:22

Well, as the creator of the multi group column support for 0.9.5 I can honestly say that it was a terrible copy/paste hack. 0.9.7 has them implemented properly, and the resulting index takes much less space which is always good from the I/O standpoint.

orban 26 Oct 2006 12:25

Quote:

Originally Posted by ALanJay
By the way, did you have a suggestion as to the best place to do the array_walk in your sphinx.php code?

Can it be implemented along these lines, i.e. something like:

$cl->SetGroups2 ( array_walk($sphinx_groups2, "intvalArray") );

Obviously with the function defined elsewhere in the code:

function intvalArray(&$item, $key)
{
    $item = intval($item);
}

Yeah that's alright.

Quote:

Originally Posted by kmike
Well, as the creator of the multi group column support for 0.9.5 I can honestly say that it was a terrible copy/paste hack. 0.9.7 has them implemented properly, and the resulting index takes much less space which is always good from the I/O standpoint.

Hm okay. I'll have a look then. Thanks for telling me :)

ALanJay 26 Oct 2006 13:16

Well, having tried that:

$cl->SetGroups2 ( array_walk($sphinx_groups2, "intvalArray") );

I end up with errors from sphinxapi.php:

Invalid argument supplied for foreach()

in the code that loops over the array to check that the values are integers. I'll have to do some more playing when I have some time. :(

orban 26 Oct 2006 13:26

Oh I'm sorry.

array_walk doesn't return the new array.

Do
array_walk($sphinx_groups2, "intvalArray");
$cl->SetGroups2 ( $sphinx_groups2 );
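For anyone hitting the same "Invalid argument supplied for foreach()" error, here is a minimal sketch of the difference, assuming numeric-string group ids like the ones discussed in this thread (the sample values are made up). It only uses core PHP; the SetGroups2() call from orban's modified sphinxapi.php appears in a comment and is not executed.

```php
<?php
// Callback for array_walk(): $item is received by reference, so the
// assignment below overwrites the element inside the original array.
function intvalArray(&$item, $key)
{
    $item = intval($item);
}

// Hypothetical group ids as they might arrive from the search form:
// numeric strings, which fail the is_int() checks in sphinxapi.php.
$sphinx_groups2 = array('12', '7', '42');

// array_walk() returns a boolean, not the walked array, so passing its
// return value straight into SetGroups2() hands over `true`, and the
// foreach() inside the API then complains about an invalid argument.
$result = array_walk($sphinx_groups2, 'intvalArray');
var_dump($result); // bool(true)

// The converted values live in $sphinx_groups2 itself, which is what
// should be passed on: $cl->SetGroups2($sphinx_groups2);
var_dump($sphinx_groups2 === array(12, 7, 42)); // bool(true)

// A non-mutating alternative: array_map() returns a new array of ints.
$groups_copy = array_map('intval', array('12', '7', '42'));
```

The array_map('intval', ...) form avoids the separate callback function entirely and can be passed inline, since it returns the converted array.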


---

Also, I'm rewriting sphinx.php for 0.9.7-RC1 at the moment.

sphinx.conf needed only minimal changes; sphinx.php needed quite a few. Currently recreating the indices so I can start playing :)

---

Running 0.9.7-RC1. Minimal changes to sphinx.conf, a huge change to sphinx.php and it's running :) Will upload upgrade howto and full howto later! Going to the gym now, need to get strong! "Strong Mind, Strong Body"!?

ALanJay 26 Oct 2006 13:55

Quote:

Originally Posted by orban
Oh I'm sorry.

array_walk doesn't return the new array.

Do
array_walk($sphinx_groups2, "intvalArray");
$cl->SetGroups2 ( $sphinx_groups2 );

Thanks, that seems to work. Curiously, I am now getting assertion errors further down, in the date handling:

/// set timestamps to match
function SetTimestampRange ( $min, $max )
{
    assert ( is_int($min) );
    assert ( is_int($max) );
    assert ( $min<=$max );
    $this->_min_ts = $min;
    $this->_max_ts = $max;
}

Which is most odd. It looks like in vB 3.0.x everything is held as text.

It appears that adding:

$datecut = intval($datecut);

just before $datecut is first used sorts that out.

I assume intval doesn't do anything harmful, so it can be used generically.

Thanks, orban, for the guidance.

orban 26 Oct 2006 13:57

Yeah, that's weird. Just call SetTimestampRange as

SetTimestampRange ( intval ( ... ), intval ( ... ) );
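Here is a small sketch of why the assertion fires and why the intval() cast fixes it. The timestamp strings are made-up examples, and $now is a hypothetical name for whatever upper bound sphinx.php actually passes. Only the is_int() checks mirror the SetTimestampRange() code quoted above.

```php
<?php
// Hypothetical values: in vB 3.0.x, date cut-offs can reach sphinx.php as
// numeric strings (e.g. straight from the database or request data).
$datecut = '1161820800';
$now     = '1161870000';

// SetTimestampRange() in sphinxapi.php asserts is_int() on both bounds,
// and a numeric string is not an int, so the assertion fails.
var_dump(is_int($datecut)); // bool(false)

// Casting with intval() at the call site satisfies the checks; intval()
// returns genuine integers unchanged, so it is safe to apply generically.
$min = intval($datecut);
$max = intval($now);
var_dump(is_int($min) && is_int($max) && $min <= $max); // bool(true)

// In sphinx.php this would look roughly like:
// $cl->SetTimestampRange(intval($datecut), intval($now));
```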

ALanJay 26 Oct 2006 14:23

Quote:

Originally Posted by orban
Yeah, that's weird. Just call SetTimestampRange as

SetTimestampRange ( intval ( ... ), intval ( ... ) );


Thanks. I ended up just forcing $datecut through intval (see above) and leaving it like that, as the other values passed to SetTimestampRange in the various places it is called don't seem to be causing an issue.

mute 26 Oct 2006 18:35

Orban, lookin good. I have mine up and running just in time for our upgrade tonight :)

One thing I had to change, though: I limit my search results to 500, and on line 101 of sphinxapi.php there is:

$this->_maxmatches = 1000;

This will throw an error if you request 1000 results via PHP from a searchd that is limited to fewer than 1000. I changed my sphinxapi.php to match and it's all good :)

Just an FYI!

orban 26 Oct 2006 18:40

Cool, glad to hear it works :) I honestly didn't do much testing, but the few things I tried seemed to work and it's just a few different function calls. And if something is wrong I'm sure I'll get a notice here quickish :D

mute 26 Oct 2006 19:13

Edit: Nm! I forgot that i had sphinx pointed at a vb 3.5.2 database, not a 3.6.2 :)

orban 26 Oct 2006 19:22

Well, dateline is supposed to be the sql_date_column? I thought that was the case, but I guess it's not. Looks like all groups even the date one have the sql field as their name.

Edit: OOooo! This means we can also implement sort by userid and forumid!? (And all the other sorting options, but those would lag quite a bit: if you only update every 24 hours, their values will be up to 24 hours old.)

mute 26 Oct 2006 19:25

Eh, I'm not sure it is needed. Since my main site isn't 3.6 yet (tonight is the upgrade), I had my sphinx looking at my live data, and my vb install looking at a test db (with older data).

The dateline column on thread was added in 3.6, which is why I think it was broken. I'll know more tonight when we upgrade :)

Were you running into the dateline error on thread title?

orban 26 Oct 2006 19:30

I thought "dateline" was the internal name for whatever field you have defined as the sql_date_column. But it looks like that isn't the case. I wonder why there even still is an sql_date_column. Because the new SetSortMode() can take ANY column. This confused me a bit.

mute 26 Oct 2006 19:32

Ahh, you are right. I updated my sphinx.php and it seems better now.

orban 26 Oct 2006 19:35

It's still bugged. Lemme figure this out. When I fix one the other breaks. Haha.

mute 26 Oct 2006 19:36

Hm, getting something similar targeting a specific forum with an "entire posts" search:

Warning: assert() [function.assert]: Assertion failed in /sphinxapi.php on line 284
Query failed: searchd error: index 'vbpost': sort-by attribute 'lastpost' not found.

Again, could be my currently messed up setup, it is hard to tell until I get the upgrade done tonight :)

Edit: haha didn't see that something else was broken. I'll leave the testing to you hehe

kmike 26 Oct 2006 19:39

Quote:

Originally Posted by orban
I thought "dateline" was the internal name for whatever field you have defined as the sql_date_column. But it looks like that isn't the case. I wonder why there even still is an sql_date_column. Because the new SetSortMode() can take ANY column. This confused me a bit.

Because AFAIK Sphinx sorts by dateline even when requested sort is by another column - e.g. for the relevance you'll get results basically as ORDER BY relevance DESC, dateline DESC.
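To make kmike's description concrete, this sketch reproduces "ORDER BY relevance DESC, dateline DESC" on a few made-up result rows in plain PHP. It only illustrates the ordering he describes; it is not how searchd sorts internally, and the 'weight' and 'dateline' keys are assumptions.

```php
<?php
// Hypothetical result rows: the first two share the same relevance.
$matches = array(
    array('id' => 1, 'weight' => 3, 'dateline' => 1161800000),
    array('id' => 2, 'weight' => 3, 'dateline' => 1161900000),
    array('id' => 3, 'weight' => 5, 'dateline' => 1161700000),
);

// Sort by weight descending, then by dateline descending as the
// tie-breaker, i.e. ORDER BY relevance DESC, dateline DESC.
usort($matches, function ($a, $b) {
    if ($a['weight'] != $b['weight']) {
        return $b['weight'] - $a['weight'];
    }
    return $b['dateline'] - $a['dateline'];
});

$order = array();
foreach ($matches as $m) {
    $order[] = $m['id'];
}
// $order is now array(3, 2, 1): highest weight first, then newer posts
// ahead of older ones among equally relevant matches.
```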

Also, better replace "docinfo = inline" with "docinfo = extern" if you have RAM to spare, according to sphinx.conf.dist. Check the relevant section.

orban 26 Oct 2006 19:45

Okay, I had to add a new variable (which gets set depending on whether you search threads or posts), and now there's a much nicer sorting code block.

http://dragy.de/public/sphinx/sphinx.php.txt

I tried all four combinations so it must work now!? Did I say this before? ;)
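The actual fix is in orban's linked sphinx.php, which isn't reproduced here. As a rough illustration of the idea behind the "sort-by attribute 'lastpost' not found" error, the sketch below only requests a sort attribute that the target index actually defines. 'vbpost' and 'lastpost' come from that error message; the 'vbthread' index name, the attribute lists and the pick_sort_attr() helper are hypothetical.

```php
<?php
// Hypothetical mapping from index name to the attributes it defines.
// 'vbpost' and 'lastpost' come from the error above; 'vbthread' and the
// attribute lists are assumptions for illustration.
$index_attrs = array(
    'vbthread' => array('dateline', 'lastpost', 'forumid', 'userid'),
    'vbpost'   => array('dateline', 'forumid', 'userid'),
);

// Use the requested sort attribute only if the index has it; otherwise
// fall back to dateline instead of triggering a searchd error.
function pick_sort_attr($index_attrs, $index, $requested)
{
    if (isset($index_attrs[$index])
        && in_array($requested, $index_attrs[$index], true)) {
        return $requested;
    }
    return 'dateline';
}

echo pick_sort_attr($index_attrs, 'vbthread', 'lastpost'), "\n"; // lastpost
echo pick_sort_attr($index_attrs, 'vbpost', 'lastpost'), "\n";   // dateline
```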

Hmm, about the docinfo, gonna have a look. We're only gifted with 2GB RAM for our forums and in a few months we'll have 2 million posts. Not good. I'll make a note in the installation.

mute 26 Oct 2006 19:48

The box I'm running sphinx on has 2GB of RAM (it is just a slave db server). MySQL has been using a ton of memory lately because we're slaving searches to it, but after we switch to sphinx tonight I think it'll even out a lot. I'm thinking that extern might be better (we have 25 million posts). I'm not entirely sure, but I guess I can try both ways w/o much reconfiguration :)

orban 26 Oct 2006 19:50

Quote:

Originally Posted by kmike
Also, better replace "docinfo = inline" with "docinfo = extern" if you have RAM to spare, according to sphinx.conf.dist. Check the relevant section.

( 1 + number_of_attrs )*number_of_docs*4 bytes

( 1 + 5 ) * 1.5 million posts * 4 bytes = 36,000,000 bytes, roughly 34 megabytes.

Might be worth keeping external, you're right.

I'm running everything on one box, so I'm kinda really short on ram, but 34mb seems to be doable. :D
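Here is the docinfo = extern estimate above as a small worked example, using the formula quoted from sphinx.conf.dist and orban's figure of 5 attributes. The function name is made up for illustration. With 1.5 million posts the result is 36,000,000 bytes (about 34.3 MiB). The same formula for 25 million posts gives 600,000,000 bytes, which matches the "600mb" figure mute mentions in the next post.

```php
<?php
// RAM estimate for docinfo = extern, per the formula quoted from
// sphinx.conf.dist: (1 + number_of_attrs) * number_of_docs * 4 bytes.
function docinfo_extern_bytes($number_of_attrs, $number_of_docs)
{
    return (1 + $number_of_attrs) * $number_of_docs * 4;
}

$bytes = docinfo_extern_bytes(5, 1500000);   // 36,000,000 bytes
$mib   = $bytes / (1024 * 1024);             // about 34.3 MiB

// The same formula for 25 million posts with 5 attributes.
$big_bytes = docinfo_extern_bytes(5, 25000000);  // 600,000,000 bytes
$big_mib   = $big_bytes / (1024 * 1024);         // about 572.2 MiB

printf("%.1f MiB, %.1f MiB\n", $mib, $big_mib); // 34.3 MiB, 572.2 MiB
```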

mute 26 Oct 2006 20:10

Hm, 600mb for 25 million posts, I guess I'll give it a shot :)

So, we just rolled out our sphinx search. All is well, but some users are reporting a ton of warnings after clicking submit and before they get their results.

I'm not exactly sure what is causing it, but the error is:

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 142

Looks as though $sphinx_conventry_id is not getting set on this line?

if (!can_moderate($docinfo['attrs'][$sphinx_can_moderate_forumid]) AND in_array($docinfo['attrs'][$sphinx_conventry_userid], $Coventry))

Any thoughts?

orban 27 Oct 2006 15:00

Second argument is $Coventry...? That variable is from vBulletin... :(

No idea what's causing this. Do you maybe have Coventry disabled?

And I strongly suggest turning off error display for users and logging errors to a file instead.

mute 27 Oct 2006 15:03

erm yeah, it's $Coventry. We don't use it, so that is probably why. For now I just set error_reporting to 0 and it went away :)

ALanJay 27 Oct 2006 15:04

I think it is slightly amusing that with this upgrade people are now seeing very similar errors to the ones that I have with vB 3.0.x

The assert function errors are caused by numbers being passed as text strings rather than integers. You will see a solution further up using array_walk (and intval) to make sure that does not happen.

I also had the Coventry ID one; if you don't use that feature, just remove that line from sphinx.php in the short term until the real problem/solution is discovered :)

Good luck. :)

orban 27 Oct 2006 15:05

It's weird, though: it doesn't even get set... it should at least be an empty array. And why is it even a separate global and not $vbulletin->coventry?

Just remove the line if you aren't using coventry then.
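If you would rather keep the Coventry check than delete the line, one option is to guard the in_array() call so it only runs when $Coventry is actually an array. This is a hedged sketch with a hypothetical is_coventry_user() helper; the surrounding sphinx.php condition is only referenced, not reproduced.

```php
<?php
// Hypothetical guard: only test Coventry membership when vBulletin has
// actually populated the global as a non-empty array (it may be unset or
// null when the Coventry feature is not in use).
function is_coventry_user($userid, $coventry)
{
    if (!is_array($coventry) || empty($coventry)) {
        return false;
    }
    return in_array($userid, $coventry);
}

var_dump(is_coventry_user(42, null));         // bool(false), no warning
var_dump(is_coventry_user(42, array(7, 42))); // bool(true)
```

In sphinx.php, the in_array(...) part of the condition on line 142 could then become something like is_coventry_user($docinfo['attrs'][$sphinx_conventry_userid], $Coventry).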

Such a mess ;)

mute 27 Oct 2006 15:09

Alanjay, you mean you're surprised that jelsoft code is buggy and hard to work with? I'm not :)

orban 27 Oct 2006 15:12

Haha

:D

I really wish they'd remove all the global variables. They're a pain in the ass, to be honest.

ALanJay 27 Oct 2006 16:21

as orban says :) ha ha

We should be very grateful that we can hack it to bits and get it to do stuff that makes it so expandable.

orban 27 Oct 2006 16:24

IMO they must add a hook before and after every statement. So you can completely modify any part.

weeno 30 Oct 2006 01:24

does this require a specific version of mysql? (I'm at 4.0.x)

I get an error on indexing

ERROR: sql_range_query: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT max_doc_id FROM sphinx_counter WHERE counter_id = 1 )'

thanks
arn

orban 30 Oct 2006 04:16

Have you created the sphinx_counter table?

weeno 30 Oct 2006 05:32

Quote:

Originally Posted by orban
Have you created the sphinx_counter table?

yeah... I think it has to do with 4.0.x not allowing nested SELECT queries (subqueries).

That support didn't come until 4.1.

arn

orban 30 Oct 2006 15:45

Oh. Try asking in the sphinx forums maybe there's a way to avoid the nested queries.

amcd 02 Nov 2006 11:09

followed the howto and installed flawlessly in one shot

the first look is very very encouraging

will report again after a few days

btw, my forum is Threads: 85,829, Posts: 3,686,297, Members: 175,810

orban 02 Nov 2006 16:05

Glad to hear :)

mute 02 Nov 2006 20:52

Our search is working flawlessly, and seeing ~4000 searches per day, which isn't too shabby at all.

Question for you guys, as I can't seem to find much in the way of documentation regarding the searches.

Does sphinx support "OR"? If you were to search for "test one two", it searches for all three with an implicit "AND". If you search for "blah not bleh", it will search for "blah -bleh". If you search for "test or task or mask", it will search for that literally (and likely ignore "or" as one of my stopwords)

Anyone? A couple of my more picky users are complaining, and I don't really have an answer for them, as I typically just do keyword searches.
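One possible approach, assuming sphinx.php currently uses the default all-words matching: switch the client to boolean matching with SetMatchMode(SPH_MATCH_BOOLEAN), where Sphinx treats "|" as OR and a leading "-" as NOT. Then translate the English keywords before querying. The to_sphinx_boolean() helper below is hypothetical and deliberately simple. It does not handle parentheses or a trailing "or".

```php
<?php
// Hypothetical helper: rewrite a plain-English query such as
// "test or task or mask" into Sphinx boolean syntax ("test | task | mask").
// Assumes the client has been switched to boolean matching, e.g. with
// $cl->SetMatchMode(SPH_MATCH_BOOLEAN), before calling Query().
function to_sphinx_boolean($query)
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    $out = array();
    $negate_next = false;
    foreach ($words as $word) {
        $lower = strtolower($word);
        if ($lower === 'or') {
            $out[] = '|';            // OR operator
        } elseif ($lower === 'not') {
            $negate_next = true;     // next word gets the NOT prefix
        } else {
            $out[] = ($negate_next ? '-' : '') . $word;
            $negate_next = false;
        }
    }
    // Plain juxtaposition is an implicit AND, so no operator is needed.
    return implode(' ', $out);
}

echo to_sphinx_boolean('test or task or mask'), "\n"; // test | task | mask
echo to_sphinx_boolean('blah not bleh'), "\n";        // blah -bleh
```

Because "or" is converted to "|" before the query reaches searchd, it no longer matters that "or" is in the stopword list.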

amcd 02 Nov 2006 21:34

can we go one step deeper on this delta thingy?

main index - rebuilt once every day or maybe twice a week
delta 1 - rebuilt once every hour or maybe 4 times a day
delta 2 - rebuilt every 5 minutes

is it possible? will having the index in 3 parts affect performance?

mute 02 Nov 2006 22:48

Quote:

Originally Posted by amcd
can we go one step deeper on this delta thingy?

main index - rebuilt once every day or maybe twice a week
delta 1 - rebuilt once every hour or maybe 4 times a day
delta 2 - rebuilt every 5 minutes

is it possible? will having the index in 3 parts affect performance?

There really isn't a point in doing so. I run my delta updates every 5 minutes, and it takes ~1 second, and I get about 450 new posts per minute..

amcd 03 Nov 2006 06:43

Quote:

Originally Posted by mute
There really isn't a point in doing so. I run my delta updates every 5 minutes, and it takes ~1 second, and I get about 450 new posts per minute..

oh. then i suppose the current 2 level system is fine.

how often do you rebuild the main index? i have set it up for once a day.

ALanJay 03 Nov 2006 07:12

Quote:

Originally Posted by amcd
how often do you rebuild the main index? i have set it up for once a day.

I rebuild mine once a day at the slowest part of the night, which seems to be fine. The delta is run every 5 minutes and takes a few seconds to create. We have between 10,000 and 40,000 new posts a day.

mute 03 Nov 2006 15:28

Quote:

Originally Posted by amcd
oh. then i suppose the current 2 level system is fine.

how often do you rebuild the main index? i have set it up for once a day.

I personally have no plans on rebuilding my main index on a regular basis. Given the nature of the way this delta update stuff works, there is really no penalty to letting your delta updates grow in size, so I don't plan on rebuilding my index until sometime in the future that we have a maintenance window or something like that.
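Here is a simplified model of why the delta keeps growing. It assumes the usual main + delta scheme from the Sphinx documentation, which the sphinx_counter / max_doc_id query quoted earlier in the thread points to. The main index covers post ids up to the max_doc_id recorded at the last full rebuild, and the delta covers everything after it. split_main_delta() is a hypothetical helper, not part of sphinx.conf.

```php
<?php
// Simplified model of the main + delta split: ids up to max_doc_id
// (recorded when the main index was last rebuilt) belong to the main
// index, and every newer id belongs to the delta index.
function split_main_delta(array $post_ids, $max_doc_id)
{
    $main = array();
    $delta = array();
    foreach ($post_ids as $id) {
        if ($id <= $max_doc_id) {
            $main[] = $id;
        } else {
            $delta[] = $id;
        }
    }
    return array($main, $delta);
}

// Main was rebuilt when the highest post id was 3; posts 4..6 came later.
list($main, $delta) = split_main_delta(array(1, 2, 3, 4, 5, 6), 3);
// $main is array(1, 2, 3) and $delta is array(4, 5, 6).

// Until the main index is rebuilt (moving max_doc_id forward), every new
// post lands in the delta, so each delta reindex covers a growing range.
```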


All times are GMT.

Powered by vBulletin® Version 3.8.14
Copyright © 2022, MH Sub I, LLC dba vBulletin. All Rights Reserved. vBulletin® is a registered trademark of MH Sub I, LLC
Copyright ©2001 - , vbulletin.org. All rights reserved.