You may remember my previous blog post, which covered customizing your website based on the location of the person visiting it. Itching for some more tech info for your website? Don’t worry; I’ve got you covered.
Let’s talk Spambots. The internet is full of these automated scripts that crawl around the web for any number of reasons. Some, like the crawlers from Google and Bing, are there to read content, store relevant data and analyze it for search results. Those are the friendly bots that you want to come visit.
However, there are a lot of other crawlers that you really don’t want on your site. These evil bots may come to find email addresses to put on their spam lists, steal content or look for weaknesses in your code that allow them to cause damage.
The “good” crawlers can be taught. You can instruct these bots how to correctly crawl a site, telling them where they can go and where they can’t, and they actually listen! These instructions can be given in a file named “robots.txt”, or via meta tags at page level. Need details about instructing bots? Check out http://www.robotstxt.org/.
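To make the robots.txt idea concrete, here is a minimal sketch in Python of how a well-behaved crawler might read those instructions, using the standard library's `urllib.robotparser`. The rules, the paths (`/admin/`, `/drafts/`) and the `example.com` URLs are made up for illustration; your own file will look different.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(can_crawl("Googlebot", "https://example.com/blog/post"))
    print(can_crawl("Googlebot", "https://example.com/admin/login"))
```

Keep in mind that this only describes what a polite bot does. Nothing in robots.txt forces a crawler to run a check like this.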
The bad crawlers, however, do not listen to instructions, and controlling them requires a much more heavy-handed approach.
Think that you may be bogged down by some evil bots? The first thing you need to do is identify these troublesome crawlers. This can be done by finding the name of the crawler, as all things crawling around out there have something called a “User Agent”. This is the name or identity of the crawler; for example, Google’s is called “Googlebot”. Most web analytics programs have a way to aggregate a list of the user agents that are visiting your website. When you acquire this list, you can search the web for info on each user agent or visit a resource such as http://www.user-agents.org/ to learn about them.
Once you know which crawlers you want to keep out, you need a way to block them. Here are some resources I have used, with varying levels of effectiveness:
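If your analytics program doesn't give you that list, you can often build it yourself from the server's access log. The sketch below is in Python and assumes the log uses the common "combined" format, where the user agent is the last quoted field on each line. The sample lines, IP addresses and the `ExampleScraper/1.0` agent are invented for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the user agent is the
# last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines; the IP addresses come from documentation-only ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2017:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2017:13:56:01 +0000] "GET /contact HTTP/1.1" 200 512 '
    '"-" "ExampleScraper/1.0"',
    '192.0.2.10 - - [10/Oct/2017:13:57:12 +0000] "GET /blog HTTP/1.1" 200 4096 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]


def count_user_agents(log_lines):
    """Count how many requests each user agent string made."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Print the busiest user agents first.
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

On a real site you would pass in the lines of your own log file instead of `SAMPLE_LOG`, then look up any unfamiliar names near the top of the list.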
Perishable Press Blacklist:
A great blacklisting script that helps filter out not only known user agents, but also IP addresses and known malicious requests intended to find weaknesses in your server.
Stop Forum Spam:
A community driven project that holds a database of known spammers on web sites that allow user generated content.
A system that helps you control user-generated content and contact forms; it analyzes each submission and accepts or rejects it based on signals that indicate spam.
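To give a feel for what blacklist-style filtering does, here is a rough Python sketch that checks a request against three kinds of rules: user agent, IP address and suspicious request paths. It is not taken from any of the tools listed here. Every entry in the lists is a made-up placeholder, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical blocklist entries, invented for illustration only.
BLOCKED_AGENT_SUBSTRINGS = ("examplescraper", "emailharvester")
BLOCKED_NETWORKS = (ipaddress.ip_network("203.0.113.0/24"),)
SUSPICIOUS_PATH_FRAGMENTS = ("../", "/etc/passwd")


def is_blocked(user_agent: str, ip: str, path: str) -> bool:
    """Return True if the request matches any of the blocklist rules."""
    # Rule 1: the user agent contains a known-bad name (case-insensitive).
    agent = user_agent.lower()
    if any(name in agent for name in BLOCKED_AGENT_SUBSTRINGS):
        return True

    # Rule 2: the visitor's IP address falls inside a blocked network.
    address = ipaddress.ip_address(ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return True

    # Rule 3: the requested path looks like a probe for weaknesses.
    lowered_path = path.lower()
    return any(fragment in lowered_path for fragment in SUSPICIOUS_PATH_FRAGMENTS)


if __name__ == "__main__":
    print(is_blocked("ExampleScraper/1.0", "198.51.100.7", "/"))
    print(is_blocked("Mozilla/5.0", "198.51.100.7", "/blog/"))
```

A bad bot can lie about its user agent, which is one reason a blacklist typically checks IP addresses and request patterns as well as names.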
I hope my experience with these bots, both good and evil, will help you determine how to best handle your own crawlers!
What other sites have helped you manage spambots?