
Bluewater Blog

01/15/2013:  Protecting Your Web Site from Evil Spambots


You may remember my previous blog post, which covered customizing your web site based on the location of the person visiting it. Itching for some more tech info for your web site? Don’t worry; I’ve got you covered.

Let’s talk spambots. The internet is full of these automated scripts, which crawl the web for any number of reasons. Some, like the crawlers from Google and Bing, are there to read content, store relevant data and analyze it for search results. Those are the friendly bots that you want to come visit.

However, there are a lot of other crawlers that you really don’t want on your site. These evil bots may come to find email addresses to put on their spam lists, steal content or look for weaknesses in your code that allow them to cause damage.

The “good” crawlers can be taught. You can instruct these bots how to crawl your site correctly, telling them where they can and can’t go, and they actually listen! These instructions can be given in a file named “robots.txt” or via meta tags at the page level. Need details about instructing bots? Check out
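Curious how a well-behaved bot actually reads those rules? Here’s a minimal sketch using Python’s built-in robotparser module. The rules and URLs are made up for illustration: they block every bot from a /private/ section while letting GoogleBot crawl everything.

    from urllib import robotparser

    # Hypothetical robots.txt rules: keep all bots out of /private/,
    # but let GoogleBot crawl anything it likes.
    rules = [
        "User-agent: *",
        "Disallow: /private/",
        "",
        "User-agent: GoogleBot",
        "Disallow:",
    ]

    parser = robotparser.RobotFileParser()
    parser.parse(rules)

    # A compliant crawler asks before it fetches a page.
    print(parser.can_fetch("GoogleBot", "http://example.com/private/report.html"))     # True
    print(parser.can_fetch("SomeOtherBot", "http://example.com/private/report.html"))  # False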

The bad crawlers, however, do not listen to instructions, and controlling them requires a much more heavy-handed approach.

Think you may be bogged down by some evil bots? The first thing you need to do is identify the troublesome crawlers. This can be done by finding the name of the crawler, as everything crawling around out there has something called a “User Agent”. This is the name or identity of the crawler; Google’s, for example, is called “GoogleBot”. Most web analytics programs have a way to aggregate a list of the user agents that are visiting your web site. Once you have this list, you can search the web for info on each user agent or visit a resource such as to learn about them.
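If your analytics package doesn’t surface user agents, you can also pull them straight out of your web server’s access log. Here’s a rough sketch in Python that tallies the user agents in a log file; it assumes the common Apache/Nginx “combined” log format, where the user agent is the last quoted field on each line, and the file name access.log is just a placeholder for wherever your server writes its logs.

    from collections import Counter

    # Count how often each User-Agent string shows up in an access log.
    agent_counts = Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            # In the "combined" format a line ends with: ... "referrer" "user agent"
            parts = line.rsplit('"', 2)
            if len(parts) == 3:
                agent_counts[parts[1]] += 1

    # Show the 20 most frequent user agents so you can research each one.
    for agent, hits in agent_counts.most_common(20):
        print("%6d  %s" % (hits, agent))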

Here are some tools I have used, with varying levels of effectiveness:

Perishable Press Blacklist:
A great blacklisting script that helps filter out not only known bad user agents, but also IP addresses and known malicious requests intended to find weaknesses in your server. (There’s a simplified sketch of this kind of filtering after these descriptions.)

Stop Forum Spam:
A community-driven project that maintains a database of known spammers for web sites that allow user-generated content.

A system that helps you manage user-generated content and contact forms; it analyzes submitted content and accepts or rejects it based on signals that indicate spam.
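To make the blocking idea concrete, here’s a toy sketch of user-agent and IP denylisting using Python’s built-in http.server module. This is not how any of the tools above work internally, and the denylist entries are invented; in practice you would usually do this filtering in your web server configuration (which is the approach the Perishable Press blacklist takes) or hand it off to one of the services listed here.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Toy denylist. Real blacklists run to thousands of entries and are
    # updated constantly; these values are made up for illustration.
    BAD_AGENT_FRAGMENTS = ("EvilBot", "EmailHarvester")
    BAD_IPS = {"203.0.113.7"}

    class FilteringHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            agent = self.headers.get("User-Agent", "")
            ip = self.client_address[0]
            # Turn away requests from denylisted user agents or IP addresses.
            if ip in BAD_IPS or any(frag in agent for frag in BAD_AGENT_FRAGMENTS):
                self.send_error(403, "Forbidden")
                return
            # Everyone else gets a normal response.
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Welcome, friendly visitor!")

    if __name__ == "__main__":
        HTTPServer(("", 8000), FilteringHandler).serve_forever()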

I hope my experience with these bots, both good and evil, will help you determine how to best handle your own crawlers! 

What other sites have helped you manage spambots? 


About MaKenzie Wangsness

MaKenzie is Bluewater's Digital Marketing Manager. Having spent several years in sales and marketing roles at the local franchise level, MaKenzie has a unique appreciation for bringing big-brand power to the local level. In her spare time, MaKenzie can be found near good food (in Minneapolis restaurants or her own kitchen!) or spending time with her awesome husband and baby daughter Jovi.

