vBulletin Mods

The Official vBulletin Modifications Site
https://www.vbulletin.org/forum/showthread.php?t=127868

orban 16 Oct 2006 08:20

What's on line 209?

ALanJay 16 Oct 2006 10:17

1 Attachment(s)
Hi,

Well, further to the above, doing various test searches (which all seem to produce the correct results) I have discovered a couple more of these anomalies :)

When I set various search options - ie user or forums or date as well as text search - I get these errors in sphinxapi.php, where there are various assertion tests ie

line 209:

assert ( is_int($limit) );

line 234

/// set groups
function SetGroups ( $groups )
{
assert ( is_array($groups) );
foreach ( $groups as $group )
assert ( is_int($group) );

$this->_groups = $groups;
}

It looks like the defaults set in sphinx.php line 75

$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( $sphinx_weights );
$cl->SetLimits ( 0, $vboptions['maxresults'] );
$cl->SetMatchMode ( SPH_MATCH_ALL );
$cl->SetGroups ( $sphinx_groups );
$cl->SetGroups2 ( $sphinx_groups2 );
$cl->SetGroups3 ( $sphinx_groups3 );
$cl->SetGroups4 ( $sphinx_groups4 );
$cl->SetGroups5 ( $sphinx_groups5 );
$cl->SetSortMode ( $sphinx_sort );

And before this, for some searches, they are sometimes set to strings (line 52):

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

This doesn't seem to affect the results, but the assertion fails when the elements are not integers.

In the case of line 209 (sphinxapi.php) and line 75 (sphinx.php) these can be forced to integers, as they are obviously numbers, ie

$cl->SetLimits ( intval(0), intval($vboptions['maxresults']) );

But I am not certain about the other elements and options, which don't work in the same way because their defaults are text strings.
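
The failing assertions follow from how PHP types values: is_int() checks the type of the variable, not whether it looks like a number, so a numeric string such as '500' fails it. A minimal sketch, assuming the option value arrives as a string (the literal '500' is made up for illustration):

```php
<?php
// Hypothetical stand-in for $vboptions['maxresults']; assumed to arrive as a string.
$maxresults = '500';

var_dump(is_int($maxresults));         // bool(false): a numeric string is not an integer
var_dump(is_numeric($maxresults));     // bool(true)
var_dump(is_int(intval($maxresults))); // bool(true): the cast satisfies the check

// The literal 0 is already an integer, so only the second argument needs the cast.
$limit = intval($maxresults);
```

So for SetLimits only the second argument should need intval(); the first is already an integer literal.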

Anyway hope that helps.

By the way if anyone wants the recipe for using Sphinx with 3.0.X then let me know and I can remove my specific defaults and post it here. The biggest change is the recoding from OOP to the old style referencing of variables. But there always seem to be ones that meet the same requirements.

The only other things to spot are the changes to search.php from the vB code, which follow the examples that orban gave but obviously in slightly different locations in search.php ie

Make change c1 at around line 304
Make change c2 at around line 331
Make change c3 at around line 1210
Make change c4 at around line 1414
Make change c5 at around line 1147

For sphinx.php, see the attached diff file.

Once again, very cool work orban, and we should also thank Andrew Aksyonoff over at www.sphinxsearch.com

ALanJay 16 Oct 2006 13:57

Hi,

Having done more research the warning errors can be switched off by adding:

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

at the top of the sphinxapi.php script. It might be better to fix the reasons the warnings are being created, but at least it gives one the option to see or not to see them.

Another curiosity is that on our forum the searches all seem lightning quick EXCEPT when you look exclusively for "thread started by user"; this can take over a minute to give back a result.

If you add additional criteria (limit the date / thread content / forums to search) the time it takes is reduced.

Finally, when searches are processed, every one of them shows the redirect page to say that it is being processed, but when you do a "thread started by user" search you don't see that page. I am not sure if my changes to search.php have caused this. Does anyone have any ideas, or does the same happen on 3.6.x?

Regards
ALan

mute 16 Oct 2006 23:28

orban, did you ever fix the searching of post titles? I thought I was running your latest code, but it seems to be broken on my devel install.

orban 16 Oct 2006 23:32

It also searches them in my latest version (you can set relevancy in the sphinx.php) but it doesn't quite work like the default vB search (yet).

ALanJay: I would not remove the asserts, because they might create invalid requests to the searchd. Also the being processed is a vB thing.

ALanJay 17 Oct 2006 09:40

Quote:

Originally Posted by orban
ALanJay: I would not remove the asserts, because they might create invalid requests to the searchd. Also the being processed is a vB thing.

OK - does anyone else get "assert" warnings?

What I have done is set the warning messages off

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

in sphinxapi.php

As far as I can see the assert errors are generated because the asserts all check to see if things are integers, and some of the input defaults are either numerics held as text strings or plain text strings.

These warnings don't seem to affect the output, which seems to work pretty well. But with some of the more complex searches it is possible to produce array warning errors ie

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))

So this warning implies that one of the items is the wrong datatype - checking back through the code on line 34 and 50 these are set to:

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

Is this the issue? Should they be numeric?

For anyone interested this is now live at:

www.digitalspy.co.uk/forums/

We have 11,158,584 Posts and 464,239 Threads. And the main data file is a little over 4Gb in size.

It is still a work in progress but it does seem to produce the correct results :)

Quote:

Originally Posted by ALanJay
But with some of the more complex searches it is possible to produce array warning errors ie

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))

After much thought we realised that we don't use the $Coventry feature, and I suspect that is the reason it does not work. As I'm not sure what $Coventry should resolve to, I have removed the whole line from my implementation. It seems to say: if not a moderator and sent to Coventry, then don't do the search. As we have no people in the second category, removing it seems to be the best short term solution.
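
The warning itself says that the second argument to in_array() was not an array, which fits $Coventry being unset or a string in this setup rather than a problem with the group names. If the check is wanted later, one option is to normalise the value first. A sketch only: the helper name coventry_ids() and the assumption that the list may arrive as a comma-separated string are made up here, not taken from vB:

```php
<?php
// Hypothetical helper: turn whatever $Coventry holds into an array of integer user ids.
function coventry_ids($value)
{
    if (is_array($value)) {
        return array_map('intval', $value);
    }
    if (is_string($value) && trim($value) !== '') {
        return array_map('intval', explode(',', $value));
    }
    return array(); // unset, null or empty: nobody is in Coventry
}

$Coventry = null; // e.g. the feature is not in use
var_dump(in_array(42, coventry_ids($Coventry))); // bool(false), and no warning
var_dump(in_array(42, coventry_ids('7, 42')));   // bool(true)
```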

I'm not sure if this is an issue between 3.0.x and 3.5/3.6 but thought I would share my thoughts on this as it kept me on my toes and I now have a much better understanding of the way the code works :)


PS the docinfo[$sphinx????] elements turn the group defaults into numerical output as required. I'm still not sure why the assert errors are being seen though; will delve deeper :)

PPS Well, after more searching and playing I am no further forward as to why the assert warning errors are occurring. Trying to force the elements to be integers with intval breaks the code :) so I am now left with a system that seems to work but generates warning errors that I have switched off. I assume no one using 3.6 is having these issues with these assert warnings?

orban 17 Oct 2006 11:58

Why does intval() break any code?

And maybe the $Coventry variable is something else in vB 3.0...

I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0 :(

ALanJay 17 Oct 2006 13:07

Quote:

Originally Posted by orban
Why does intval() break any code?

That is a very good question. I suspect I am not using it 100% correctly, but in the simplest example, changing line 32 from

$sphinx_groups2 = $sphinx_userids;

to

$sphinx_groups2 = intval($sphinx_userids);

seemed to cause odd behaviour.

I was also wondering about using it in:

if (!empty($userids)) $sphinx_userids = explode(',', $userids);
else $sphinx_userids = array();
if ($forumchoice != '') $sphinx_groups = explode(',', $forumchoice);

But I wasn't sure I could use it in this context.

My problem is that I don't entirely understand the logic of what is going on here (but I am learning as I go along). I'm not sure why I am seeing the "Warnings" when the searches still generate perfect results.

Depending on the search, each of "SetGroups", "SetGroups2" and "SetGroups3" generates these warning errors, but because these are arrays I need to build the array with integers and, I assume, not numerics that are text(?)
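
One likely reason the intval() change above caused odd behaviour: $sphinx_userids is an array (it comes from explode()), and intval() applied to an array does not convert its elements; it returns 1 for a non-empty array and 0 for an empty one. explode() also always returns strings. A sketch of converting each element instead with array_map(); the sample value of $userids is invented:

```php
<?php
// Invented sample input: a comma-separated list of user ids, as in the snippet above.
$userids = '12,34,56';

$as_strings = explode(',', $userids);           // array('12', '34', '56'), all strings
$collapsed  = intval($as_strings);              // 1: the whole array collapses to one integer
$as_ints    = array_map('intval', $as_strings); // array(12, 34, 56), all integers

var_dump($collapsed);
var_dump($as_ints);
```

On that basis the explode() lines could become, for example, $sphinx_userids = array_map('intval', explode(',', $userids)); this is untested against vB 3.0.x.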

Quote:

Originally Posted by orban
And maybe the $Coventry variable is something else in vB 3.0...

It is possible - from talking to my system admin, it allows you to stop users from doing certain things. After thinking about this I don't think it is an issue, as we don't use it. So for me removing it solves the problem, since the second element of the if statement, which checks whether the user has been sent to Coventry, isn't necessary.

Quote:

Originally Posted by orban
I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0 :(

No problem - without your code we wouldn't have been able to do this at all. So thanks so much.

I assume you don't see any of the assertion errors in vB 3.6 ?

Anyway, as you can see (if you register on our site) the Sphinx search does work, very smoothly and quickly, and is a great solution for offloading the search function out of the main database.

One final question. Everything runs very quickly and smoothly except one search "Find Threads Started by User" which is extremely slow. Do you have the same problem with 3.6?

Swamper 21 Oct 2006 10:51

Quote:

Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search "Find Threads Started by User" which is extremely slow. Do you have the same problem with 3.6?

Why not have that specific search just redirect to the standard vB search.php? It's fast.

----

Found my way here via the Big Boards Thread on vB.com - wow - I'm going to get on this right away! :D We're moving from a heavy modded 6.5+ million post vB2 to 3.6 in the coming weeks and for over a year now we've survived only because our search was split up into separate tables according to date range - updated nightly - and stored on another drive, but with 'Search this Thread', 'View New Posts' and 'Find all posts by User' acting on the live post table.

kmike 23 Oct 2006 07:48

Quote:

Originally Posted by Swamper
We're moving from a heavy modded 6.5+ million post vB2 to 3.6 in the coming weeks

Be warned that vB 3.6 is much more CPU demanding than vB2 (and even vB3), so you'd better beef up your web frontend(s) before the final switch.

Quote:

Originally Posted by orban
Let's assume you have

thread1 - 100 times "word"
thread2 - 50 times "word"
thread3 - 10 times "word"
thread4-50 5 times "word"

A search for "word" will return us 2500 posts. BUT there are only 50 different threads.

If your limit is 1000 (like mine) this will only return like 30 threads. So you're missing out 20......I'm actually seeing this on very common words (when searching post and "show as threads").

Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.
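
The effect orban describes can be reproduced with a toy example: cut the matching posts off at a limit first, then collapse them to threads, and the threads whose posts fall past the limit disappear. All numbers below are invented and much smaller than in the quote:

```php
<?php
// Invented data: 5 threads with 4 matching posts each, ordered by thread.
$posts = array();
for ($thread = 1; $thread <= 5; $thread++) {
    for ($n = 0; $n < 4; $n++) {
        $posts[] = $thread; // each entry is the thread id of one matching post
    }
}

$limit   = 10;                                    // post limit applied first
$limited = array_slice($posts, 0, $limit);        // 10 of the 20 matching posts
$threads = array_values(array_unique($limited));  // collapse posts to threads

echo count($threads), " of 5 threads shown\n";    // 3 of 5 threads shown
```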

orban 23 Oct 2006 10:49

Quote:

Originally Posted by kmike
Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.

Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.

ALanJay 23 Oct 2006 16:40

Quote:

Originally Posted by orban
It's because they are both arrays, or a string of comma separated numbers?

You'd have to use array_walk, lemme know if you need help.

I eventually worked this out but never managed to get it to work successfully. I assume some difference between the way 3.0.x and 3.6 handle these causes a problem. Because it is only a warning I have left it - maybe next time there is an opportunity to play I will have another go with array_walk, if I can fathom the syntax to get everything switched from numerals as text to integers.

Quote:

Originally Posted by orban
With or without key words?

If it's without it's using the default search and I can't really help with that.

Without, which explains why it is slow. We have removed it from our choices; 1 minute to bring back the answer was a little long.


Overall it has been running now for a week, and once we sorted a few things out it has been excellent. Using your cool current and DELTA index, the databases are updated every 15 minutes and the whole site is reindexed every night.

Thanks for the ideas this has been an excellent tool and remarkably easy to implement.

orban 23 Oct 2006 16:56

function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

Glad to hear it works for you!

ALanJay 23 Oct 2006 17:08

Quote:

Originally Posted by orban
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

I will have a play - thanks.

Quote:

Originally Posted by orban
Glad to hear it works for you!

Seems to :) Of the various hacks and attempts to solve the text search issue, this one seems to have delivered on its goals. There are still a few things I don't understand and which would probably improve performance, but overall it works well.

Maybe you have some ideas on the issues:

morphology = none
stopwords =
min_word_len = 3
charset_type = sbcs
}

What do morphology and stopwords do / offer, and how are they best used?

and

mem_limit = 256M
}

mem_limit for creating the index: does anyone have any views as to a sensible optimum value for this? We are running this on a machine with 8Gb of RAM, and as it started as 32M I didn't want to make it too big, but it still complains it could be better :)


==============

Looking in the original configuration file I think I have a handle on the morphology, word_len and char set.

Would I be right in saying that the stopwords file is a list of words NOT to index?

If so does anyone have a good list of 2 and 3 letter words that can happily be removed from an index :)

==============

Looking on the sphinxsearch forums there is discussion on creating stop words, and the indexer can produce a list of the most used words for you to work with ie

/usr/local/bin/indexer --config sphinx.conf --rotate --buildstops sphinx-stop.txt 1000 --buildfreqs

This builds a file with the most commonly used words in the index and the frequency with which they appear.

If I understand this correctly it should allow you to remove a few of the obvious things.
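
A small script can pick the short, very frequent words out of that file as stopword candidates. This is only a sketch: it assumes each line holds a word followed by its frequency, separated by whitespace, and the function name, the length cut-off of 3 and the sample data are all made up. The result should be checked by hand before pointing the stopwords setting at it.

```php
<?php
// Hypothetical helper: return words of at most $max_len characters from
// "word frequency" lines (line format assumed, not confirmed).
function short_stopword_candidates(array $lines, $max_len = 3)
{
    $words = array();
    foreach ($lines as $line) {
        $parts = preg_split('/\s+/', trim($line));
        if ($parts[0] !== '' && strlen($parts[0]) <= $max_len) {
            $words[] = $parts[0];
        }
    }
    return $words;
}

// Example with invented data instead of a real sphinx-stop.txt.
$sample = array('the 120345', 'and 98012', 'forum 45001', 'is 44120');
echo implode("\n", short_stopword_candidates($sample)), "\n";
```

With a real file, file('sphinx-stop.txt') could be passed in place of $sample and the result written to whichever file the stopwords setting names.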

Quote:

Originally Posted by orban
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

Hi,

Looking at the code in sphinx.php:


[Code block not shown in the archived page.]

Where do you put the array_walk manipulation?

As far as I can tell one needs the results of the various items above to be so processed.

or do you implement it:


[Code block not shown in the archived page.]

ie

$cl->SetGroups4 ( (array_walk( $sphinx_groups4, "intvalArray") );

I assume it doesn't matter that sometimes the array will be one element long.
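
On where to put it: array_walk() changes the array in place and returns true or false rather than the converted array, so passing its result straight into SetGroups4() would hand over a boolean. A sketch of calling it first, on its own line. FakeClient is a made-up stand-in for SphinxClient that only repeats the assertions quoted earlier in the thread:

```php
<?php
// orban's callback, unchanged.
function intvalArray(&$item, $key)
{
    $item = intval($item);
}

// Made-up stand-in for SphinxClient; it only repeats the asserts quoted from sphinxapi.php.
class FakeClient
{
    public $groups = array();

    public function SetGroups4($groups)
    {
        assert(is_array($groups));
        foreach ($groups as $group) {
            assert(is_int($group));
        }
        $this->groups = $groups;
    }
}

$sphinx_groups4 = array('7');               // a one-element array is fine
array_walk($sphinx_groups4, 'intvalArray'); // converts in place, returns a boolean

$cl = new FakeClient();
$cl->SetGroups4($sphinx_groups4);           // now receives array(7)
```

The same pattern would apply to the other group arrays: run array_walk() on each one before the block of $cl->SetGroups...() calls.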

=====================================

Quote:

Originally Posted by orban

Quote:

Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search "Find Threads Started by User" which is extremely slow. Do you have the same problem with 3.6?



With or without key words?

If it's without it's using the default search and I can't really help with that.


Quote:

Originally Posted by Swamper
Why not have that specific search just redirect to the standard vB search.php? It's fast.


Searches without keywords already are redirected to the default search.

Just curious, orban: having done some more checks, a plain "Find Threads Started by user" search takes over a minute with the size of files we have, and from what you are saying this is the standard vB result. Once you add an additional key (a search string) it all works much faster, as it is using Sphinx (is that right?).

Is there a reason you didn't code that using Sphinx?

kmike 24 Oct 2006 11:16

Quote:

Originally Posted by orban
Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.

Well, it's their own fault then ;-)

