
  #1  
Old 20 Aug 2015, 16:44
spamgirl spamgirl is offline
 
Join Date: Jan 2007
Is there a way to restrict how often guests can refresh?

I was wondering if there is any add-on that can limit how frequently guests are allowed to refresh. I'd like members to be able to refresh as much as they want, but guests to be limited, regardless of the server load. Thanks for your help!
  #2  
Old 20 Aug 2015, 17:29
Elite_360_'s Avatar
Elite_360_ Elite_360_ is offline
 
Join Date: Nov 2012
There is no way to stop someone from refreshing their browser.
  #3  
Old 20 Aug 2015, 17:34
Max Taxable's Avatar
Max Taxable Max Taxable is offline
 
Join Date: Feb 2011
Leverage browser caching of static content; that way the browser doesn't re-download the whole page on refresh. It only fetches elements it didn't already encounter on the first load.

For example, if your site loads 400 KB, a refresh should only transfer 1 or 2 percent of that, because the rest is cached.
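For what it's worth, that kind of client-side caching can be turned on in .htaccess with mod_expires. A minimal sketch, assuming mod_expires is enabled on the server (the types and lifetimes here are only illustrative):

```apache
# Cache static assets client-side so refreshes re-download little more
# than the HTML. Requires mod_expires; the IfModule wrapper makes it
# degrade safely if the module is absent.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType application/javascript "access plus 1 month"
</IfModule>
```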
  #4  
Old 20 Aug 2015, 17:40
spamgirl spamgirl is offline
 
Join Date: Jan 2007
It's actually to ensure people aren't using scripts to scrape our site. We don't want to turn off public access, but we do want people to stop taking content from our site and reposting it elsewhere. Having to track down their host information and file a copyright complaint is getting to be a real time suck.

We'd just like an error to be shown if they refresh more than once a minute, which I know is possible when server load is high (if it's above a threshold, certain usergroups see an error message while others do not). I'd even be happy if the page content only updated once a minute.
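On the server side, one hedged option for this kind of per-visitor throttling (assuming the third-party mod_evasive module is installed, which it is not by default) looks like the sketch below. Note that it throttles by IP rather than by usergroup, so a logged-in member refreshing very fast would hit the limit too, which is looser than what's described above:

```apache
# Per-IP request throttling via mod_evasive (third-party module).
# The numbers are illustrative; tune them against real traffic.
<IfModule mod_evasive20.c>
    # Block an IP that requests the same page more than twice
    # within a 60-second interval.
    DOSPageCount 2
    DOSPageInterval 60
    # Block an IP that makes more than 50 requests site-wide
    # within a 60-second interval.
    DOSSiteCount 50
    DOSSiteInterval 60
    # Blocked IPs receive 403 responses for this many seconds.
    DOSBlockingPeriod 60
</IfModule>
```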
  #5  
Old 21 Aug 2015, 02:31
Max Taxable's Avatar
Max Taxable Max Taxable is offline
 
Join Date: Feb 2011
You need "Ban Spiders by User Agent" then. A good, comprehensive list of bad bots is available that contains most of the known content scrapers, and you can add any you spot to the list as well.
  #6  
Old 21 Aug 2015, 02:48
spamgirl spamgirl is offline
 
Join Date: Jan 2007
Originally Posted by Max Taxable View Post
You need "Ban Spiders by User Agent" then. A good, comprehensive list of bad bots is available that contains most of the known content scrapers, and you can add any you spot to the list as well.
The problem is that it's a single person scraping our site for their own, and I don't know their IP; otherwise I'd just ban them.
  #7  
Old 21 Aug 2015, 17:48
Max Taxable's Avatar
Max Taxable Max Taxable is offline
 
Join Date: Feb 2011
You get the IP and their user agent string while they are on your site, from the WoL (Who's Online) or even the server logs.

But let me get this straight - you want to restrict the reload of all visitors, because you have one person manually scraping content?
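The "server logs" route can be as simple as counting requests per IP in the Apache access log. A sketch with made-up sample data (the log path and the IPs are placeholders; in practice you'd point the pipeline at your real access log):

```shell
# Build a tiny sample access log so the example is self-contained;
# in practice, skip this step and use your real log file.
cat > access.log <<'EOF'
203.0.113.7 - - [21/Aug/2015:17:48:01 +0000] "GET /forum.php HTTP/1.1" 200 4521
203.0.113.7 - - [21/Aug/2015:17:48:02 +0000] "GET /showthread.php?t=1 HTTP/1.1" 200 9120
203.0.113.7 - - [21/Aug/2015:17:48:03 +0000] "GET /showthread.php?t=2 HTTP/1.1" 200 8773
198.51.100.9 - - [21/Aug/2015:17:49:10 +0000] "GET /forum.php HTTP/1.1" 200 4521
EOF

# The first field of the combined log format is the client IP:
# count requests per IP and list the busiest addresses first.
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head
```

An IP with thousands of hits in a short window is the likely scraper, and the same log lines carry its user agent string.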
  #8  
Old 21 Aug 2015, 18:01
spamgirl spamgirl is offline
 
Join Date: Jan 2007
Originally Posted by Max Taxable View Post
You get the IP and their user agent string while they are on your site, from the WoL (Who's Online) or even the server logs.

But let me get this straight - you want to restrict the reload of all visitors, because you have one person manually scraping content?
We have hundreds of guests on the site; I have no way to determine which one is the scraper.

I just want to temporarily slow them until we can figure out what's going on. If you have a better idea, I'd be happy to take your advice.
  #9  
Old 22 Aug 2015, 03:19
Max Taxable's Avatar
Max Taxable Max Taxable is offline
 
Join Date: Feb 2011
Originally Posted by spamgirl View Post
We have hundreds of guests on the site; I have no way to determine which one is the scraper.

I just want to temporarily slow them until we can figure out what's going on. If you have a better idea, I'd be happy to take your advice.
I would solve this problem by installing Paul M's "Track Guest Visits" and studying the log it provides daily, looking for IP addresses that load a lot of pages. That mod tracks visitors that way; it also gives you their user agent, tells you exactly which pages they visited, and timestamps everything.

You must identify the bad actor and stop IT, not penalize all visitors. If you slow down your page loading or otherwise restrict visitors, get ready for the hit from Google in your search results and PageRank.
  #10  
Old 22 Aug 2015, 13:50
spamgirl spamgirl is offline
 
Join Date: Jan 2007
Originally Posted by Max Taxable View Post
I would solve this problem by installing Paul M's "Track Guest Visits" and studying the log it provides daily, looking for IP addresses that load a lot of pages. That mod tracks visitors that way; it also gives you their user agent, tells you exactly which pages they visited, and timestamps everything.

You must identify the bad actor and stop IT, not penalize all visitors. If you slow down your page loading or otherwise restrict visitors, get ready for the hit from Google in your search results and PageRank.
Yeah, you're right. I'll give that a try, thank you!
  #11  
Old 22 Aug 2015, 18:58
Zachery's Avatar
Zachery Zachery is offline
 
Join Date: Jul 2002
Real name: Zachery Woods
If you want to stop people from scraping your site, don't put it on the internet.
__________________
Looking for ImpEx?
  #12  
Old 22 Aug 2015, 22:14
TheLastSuperman's Avatar
TheLastSuperman TheLastSuperman is offline
 
Join Date: Sep 2008
Real name: Michael Miller Jr

Originally Posted by Zachery View Post
If you want to stop people from scraping your site, don't put it on the internet.
I know you weren't sitting there all riled up, intentionally posting something to sound mean or rude, but it reminded me of an old saying most of us were taught as kids: "If you don't have anything nice to say, don't say anything at all." That's not you, in my opinion. Since tone is always missing online I can't assume, but do you ever re-read what you type and realize it's not offering one bit of help sometimes? I think the OP has a valid concern and wants helpful suggestions, not a reply that can't be taken any other way but as being a smarty-pants.

Spamgirl,

I think Max had an excellent idea... it may take more time to review the logs for certain guests with Paul's mod, but if you do it now and find who you think the culprit is, it might help! Remember, though, that overseas a person can unplug their modem/router and BAM, instant new IP address, so if they happen to be where that can happen, let's hope they only scrape content and aren't too web savvy.
__________________
Daddy Does Dios and Figs!
https://www.linkedin.com/in/thelastsuperman

Search - Use the search feature to find similar issues/answers.
Information - Include screenshots, copy/pasted error codes, url etc.
Fixed - Please return to your thread/post and let us know how it was fixed!
Thanks - For participating! Click the "Like" on a post if someone helped you!
  #13  
Old 22 Aug 2015, 22:28
spamgirl spamgirl is offline
 
Join Date: Jan 2007
Originally Posted by TheLastSuperman View Post
Spamgirl,

I think Max had an excellent idea... it may take more time to review the logs for certain guests with Paul's mod, but if you do it now and find who you think the culprit is, it might help! Remember, though, that overseas a person can unplug their modem/router and BAM, instant new IP address, so if they happen to be where that can happen, let's hope they only scrape content and aren't too web savvy.
FWIW, I get what Zachery is saying, but that doesn't mean I won't try to at least stem the flow. If we sit back and don't fight, we let the monsters win, and I refuse to do that in any situation. Nothing is hopeless.

Anyhoo, I agree that Max had an excellent idea! Already three IPs are sticking out like a sore thumb, and one of them seems to be the culprit (with a scraper I didn't even know about potentially being a second problem user). Based on their shitty web design skills, I'm hopeful that means they aren't tech savvy at all. Thank you all so much for your advice!

--------------- Added 23 Aug 2015 at 15:31 ---------------

I've found the IPs and tried to block them with .htaccess. I included my own IP in order to test it, but I am still able to access the forum, I just can't see the CSS or images. Here is what I did:

Order allow,deny
Deny from ###.#.#.
Deny from ###.#.#.
Deny from ###.#.#.
Allow from all

Does anyone know why it would be so wonky? I put it in the main folder of my forum (html1). My site is hosted on EC2, if that matters. I tried it last week and it worked, so I don't know why it wouldn't now...
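Two guesses worth checking, sketched with a placeholder address from the documentation range (203.0.113.0/24 stands in for the real IPs). First, on Apache 2.4 the old Order/Deny/Allow directives are deprecated in favor of Require, and mixing the two styles can behave oddly:

```apache
# Apache 2.4 equivalent of "Order allow,deny" + "Deny from ..." rules.
# 203.0.113.0/24 is a placeholder; substitute the real addresses.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.0/24
</RequireAll>
```

Second, if the forum sits behind a load balancer or proxy (common on EC2), Apache may see the proxy's address instead of the visitor's, so IP-based rules silently never match the client; mod_remoteip can restore the real client IP in that setup.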
  #14  
Old 25 Aug 2015, 01:03
Zachery's Avatar
Zachery Zachery is offline
 
Join Date: Jul 2002
Real name: Zachery Woods
Sometimes the truth hurts, but it's important to understand the limitations of what you can do. You can ban an IP, but it will probably change, and they'll come back.

You can make it so only registered users can view content, but then your search rankings go down.

You can make some content pay-only, but chances are that if it's stuff people want, someone will steal it, and hopefully they don't do it with a stolen credit card.

I do think you should fight, just be ready for the long haul.

If they're actually stealing and rehosting your content on their site, you could try a DMCA, but it may or may not work.
__________________
Looking for ImpEx?
  #15  
Old 25 Aug 2015, 01:22
spamgirl spamgirl is offline
 
Join Date: Jan 2007
Originally Posted by Zachery View Post
Sometimes the truth hurts, but it's important to understand the limitations of what you can do. You can ban an IP, but it will probably change, and they'll come back.

You can make it so only registered users can view content, but then your search rankings go down.

You can make some content pay-only, but chances are that if it's stuff people want, someone will steal it, and hopefully they don't do it with a stolen credit card.

I do think you should fight, just be ready for the long haul.

If they're actually stealing and rehosting your content on their site, you could try a DMCA, but it may or may not work.
I've been doing the DMCA, but they just change hosts every day. Now I'm blocking by IP, and just redoing it constantly. I've actually found *multiple* scrapers since installing the Track Guests extension, go figure. :/ I'll just keep up the good fight and hope I annoy them into scraping someone else lol