vBulletin Mods

The Official vBulletin Modifications Site
https://www.vbulletin.org/forum/showthread.php?t=127868

amcd 12 Jan 2010 17:33

16.5 mil posts
488k threads
vb 3.6

no plans to move to vb 4 until everyone else does it, too.

boolean and phrase search are needed. been missing them.

spending for a search solution - no problem. spending 2k - no way.

Quote:

I would really love an updated version that runs on 3.6 that takes advantage of all the Sphinx goodies, since I don't see us moving to 4.0 for close to a year. All of the custom code we've written has to be ported and tested, and being the lone admin on a site this big has my hands full a lot of the time.
echo

eoc_Jason 13 Jan 2010 21:14

Okay, I've been working slowly but surely... Here are the constraints so far:

1. New threads/posts added when you run your delta cron job (most run every 2-5 min)...
2. Changes in # views, last poster, deleted threads / posts, etc should be real time updates.
3. Edits to the title or post text will not be updated until next full re-index (usually nightly) unless it is within the delta file.
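
The three constraints above can be read as a small model. This is only a sketch, assuming the usual Sphinx main + delta scheme where the two indexes are split by a post ID threshold; `IndexState` and both function names are hypothetical and not part of the mod.

```python
from dataclasses import dataclass


@dataclass
class IndexState:
    main_max_postid: int   # highest post ID in the last full (main) index build
    delta_max_postid: int  # highest post ID picked up by the last delta run


def where_indexed(postid: int, state: IndexState) -> str:
    """Return which index currently holds the text of a post."""
    if postid <= state.main_max_postid:
        return "main"
    if postid <= state.delta_max_postid:
        return "delta"
    return "pending"  # not searchable until the next delta cron run


def edit_needs_full_reindex(postid: int, state: IndexState) -> bool:
    """Text edits to posts already in the main index wait for the full rebuild."""
    return postid <= state.main_max_postid
```

With `IndexState(main_max_postid=1000, delta_max_postid=1050)`, post 1020 sits in the delta, so an edit to it is picked up by the next delta run, while an edit to post 900 waits for the nightly re-index.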

Will have boolean searching, phrase, etc...
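
For reference, boolean and phrase searches map onto Sphinx's extended query syntax: double quotes for a phrase, `|` for OR, a leading `-` for NOT, and plain words ANDed together. A minimal sketch of a query builder follows; `build_extended_query` and its parameters are hypothetical, and it does no escaping beyond dropping quotes inside the phrase.

```python
def build_extended_query(required=(), optional=(), excluded=(), phrase=None):
    """Assemble a query string in Sphinx's extended query syntax."""
    parts = []
    if phrase:
        # Double quotes request an exact phrase match.
        parts.append('"' + phrase.replace('"', " ").strip() + '"')
    # Bare words are implicitly ANDed together.
    parts.extend(required)
    if optional:
        # The pipe is the OR operator; parentheses group the alternatives.
        parts.append("(" + " | ".join(optional) + ")")
    # A leading minus is the NOT operator.
    parts.extend("-" + word for word in excluded)
    return " ".join(parts)
```

For example, `build_extended_query(required=["sphinx"], optional=["delta", "main"], excluded=["mysql"], phrase="full reindex")` yields `"full reindex" sphinx (delta | main) -mysql`.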

mute 13 Jan 2010 22:26

Quote:

Originally Posted by eoc_Jason (Post 1954823)
Okay, I've been working slowly but surely... Here are the constraints so far:

1. New threads/posts added when you run your delta cron job (most run every 2-5 min)...
2. Changes in # views, last poster, deleted threads / posts, etc should be real time updates.
3. Edits to the title or post text will not be updated until next full re-index (usually nightly) unless it is within the delta file.

Will have boolean searching, phrase, etc...

One thing I'd like to have that we don't currently have is properly ordered search results. If you don't do a full reindex on a regular basis, they tend to get really out of order.

kmike 15 Jan 2010 11:47

Apart from using Sphinx to search for similar threads, you can also use it to generate post excerpts with the search keywords highlighted when in the "Show search results as posts" mode.
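
Sphinx itself builds such excerpts through the BuildExcerpts call in its client API. The pure-Python stand-in below only illustrates the idea (cut a window around the first hit and wrap the hits in markers); it is not how Sphinx does it, and `highlight_excerpt` with its parameters is hypothetical.

```python
import re


def highlight_excerpt(text, keywords, around=30, before="<b>", after="</b>"):
    """Cut a window around the first keyword hit and wrap every hit in markers."""
    if not keywords:
        return text[: 2 * around]
    # Case-insensitive substring match on any of the keywords.
    pattern = re.compile(
        "|".join(re.escape(word) for word in keywords), re.IGNORECASE
    )
    match = pattern.search(text)
    if match is None:
        return text[: 2 * around]  # no hit: fall back to the start of the post
    start = max(0, match.start() - around)
    end = min(len(text), match.end() + around)
    snippet = pattern.sub(lambda m: before + m.group(0) + after, text[start:end])
    # Ellipses mark where the post text was cut.
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(text) else ""
    return prefix + snippet + suffix
```

Note that this matches substrings rather than whole words, and `around` counts characters, not words.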

Our stats: almost 14 mln posts, 1.1 mln threads, 300k users, vB 3.8.
We're using our own Sphinx implementation since it predates the hack in this thread.

We got rid of the obscure search and sort modes though (such as sorting by the number of views or replies), and there was not a single complaint from our members. I don't think you should focus too much on 100% compliance with the default search. Having too many document attributes will inflate the index size, resulting in more I/O and more sluggish performance.
If you are worried about the need to edit the default search form template, you could always clone it, make the necessary changes and ship it with the product.

eoc_Jason 15 Jan 2010 19:26

Thanks for the feedback guys. Another thing I'm pondering: instead of trying to work off just a main + delta index, break the total post count up and constantly rotate smaller indexes...

E.g. if a site has 10,000,000 posts, have 10 indexes each with 1,000,000 posts. Then have each of the indexes rotate, say, hourly. This would be a shift from the typical one massive re-index nightly (or however often you do it). In theory, too, the last index would contain the most recent posts and could be re-indexed more often.

I dunno, that's just a thought... My concern right now is the core code for searching; the indexes themselves can be manipulated differently at a later time, as that is transparent to everything else.
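
As a rough sketch of the split described above, the post ID ranges for each smaller index could be computed like this. It assumes post IDs run from 1 to the current maximum with no large gaps (a real forum has holes from deleted posts); `partition_ranges` is a hypothetical helper.

```python
def partition_ranges(max_postid, shards):
    """Split post IDs 1..max_postid into contiguous, near-equal ranges."""
    if shards < 1 or max_postid < shards:
        raise ValueError("need at least one post ID per shard")
    size, extra = divmod(max_postid, shards)
    ranges = []
    start = 1
    for i in range(shards):
        # The first `extra` shards take one additional ID each.
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges
```

`partition_ranges(10_000_000, 10)` gives ten ranges of 1,000,000 IDs; the last one, `(9_000_001, 10_000_000)`, holds the newest posts and is the one that would be rebuilt most often.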

kmike 16 Jan 2010 06:20

Quote:

Originally Posted by eoc_Jason (Post 1956757)
Thanks for the feedback guys. Another thing I'm pondering: instead of trying to work off just a main + delta index, break the total post count up and constantly rotate smaller indexes...

That's what we're doing, too, though the delta is still there. The bonus is that you can set up a distributed index with the number of agents equal to the number of CPUs, as described here, to take advantage of all CPUs in the server. However, it's more of a manual operation; it would be hard to generate a partitioned sphinx.conf automatically.
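
A minimal sketch of what generating the distributed part of sphinx.conf might look like, assuming the chunk indexes are already defined elsewhere in the config. It follows the layout from the Sphinx documentation's multi-CPU example (one `local` chunk, the rest as `agent` entries pointing back at the same searchd). The index names are hypothetical, and port 9312 is the 0.9.9 default; older installs typically listen on 3312.

```python
def distributed_index_conf(name, chunks, host="localhost", port=9312):
    """Render a `type = distributed` index block over existing chunk indexes."""
    if not chunks:
        raise ValueError("need at least one chunk index")
    lines = ["index " + name, "{", "    type = distributed"]
    # The first chunk is searched directly by the searchd handling the query.
    lines.append("    local = " + chunks[0])
    # Every other chunk is reached through an agent that points back at the
    # same searchd, so the chunks are searched in parallel.
    for chunk in chunks[1:]:
        lines.append("    agent = %s:%d:%s" % (host, port, chunk))
    lines.append("}")
    return "\n".join(lines)
```

Calling `distributed_index_conf("posts_dist", ["posts_0", "posts_1", "posts_2", "posts_3"])` produces one `local` line and three `agent` lines, which would suit a four-core box.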

eoc_Jason 17 Jan 2010 22:17

kmike - thanks for that info, I must have overlooked that in the docs...

Just curious, how much of a performance difference did you see using the distributed process?


I kind of got sidetracked today... The wife of one of my good friends just got out of the hospital, so I was there for a while today. Then I was coding some anti-spammer measures for my forum registration process...

mute 17 Jan 2010 22:49

We have 2 post indexes, one for our live post table and one for our archived post table. They have 30 million posts each. I don't see a point in sharding the post indexes aside from being able to take advantage of multiple CPUs when indexing.

The way I see it, if I can keep the old indexes online while I do a full reindex, I don't really care how long the full reindex takes, since (at least in our case) the search server is just a slave database server and not our primary.

Kevlar 18 Jan 2010 12:20

The only thing I am waiting on before converting to vB4 is sphinx (or a working search alternative). The rest of the little stuff I modded I can do with or without until those developers get upgrades.

1.3 million threads
18 million posts

kris 18 Jan 2010 13:13

mute, can you share how you archived the post table? What changes did you make in the code and MySQL? I want to move my old posts to another post_archive table, but I am not sure how to join those tables from the vBulletin code.

eoc_Jason,
my forum has 200k threads and 10 million posts, on vB 3.8.4. I have only one database (no slave), an nginx web server, and a Core i7 with 12 GB RAM.

I installed Sphinx on the server, and from SSH it works great, but from the modded search.php it behaves very strangely. Sometimes when I search for some keywords with the option "show results as posts", it returns a "no results" message, but if I change the search option to "show results as threads" with the same keywords, I get a good number of results shown as threads.

Searching a user's posts does not work at all: search.php?do=finduser&u=xxx always gives a blank screen, with no PHP errors in the log or anywhere else.

amcd 18 Jan 2010 19:21

Quote:

my forum has 200k threads and 10 million posts, on vB 3.8.4. I have only one database (no slave), an nginx web server, and a Core i7 with 12 GB RAM.
It is time for you to move to dual servers - one for webserver/PHP and another for MySQL.

kris 18 Jan 2010 22:06

Quote:

It is time for you to move to dual servers - one for webserver/PHP and another for MySQL.
no money making here :) just spend

I think splitting the big post table into smaller read-only archive tables will be a cheaper and even better solution, and of course Sphinx for search.

mute 19 Jan 2010 02:01

Quote:

Originally Posted by kris (Post 1959119)
mute, can you share how you archived the post table? What changes did you make in the code and MySQL? I want to move my old posts to another post_archive table, but I am not sure how to join those tables from the vBulletin code.

It's REALLY nasty. I really don't think you want to do it. In fact, I'm thinking about abandoning it on our site.

Back when we wrote it, we were probably at around 25 million posts, on (if I remember right) a dual Xeon with HT. Now we're on a quad quad-core Xeon box with 16 GB of RAM. We have 30 million posts in our archived tables (10 tables of 3 million posts each) plus 30 million in our post table. I'm not seeing any slowdowns against the post table, which has me wondering whether we'd see any slowdowns if I were pulling against all 60 million in one table, given how much faster our CPUs have gotten and how much RAM we have sitting around.

masons 19 Jan 2010 03:41

Hi,

I've had Sphinx installed for my wiki for a few days, and am now looking to get it working with my vBulletin setup.

But my server load went a bit overboard this morning (120+) and I have no idea how to deal with that. Any tips on taking some pressure off the load before I add this to vBulletin?

Some server stats (dedicated)
Processor #1 Vendor: GenuineIntel
Processor #1 Name: Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
Processor #1 speed: 1998.000 MHz
Processor #1 cache size: 6144 KB

Processor #2 Vendor: GenuineIntel
Processor #2 Name: Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
Processor #2 speed: 1998.000 MHz
Processor #2 cache size: 6144 KB

amcd 19 Jan 2010 11:25

go through this thread over at vb.com


All times are GMT.