.htaccess
Well, whilst I’m on a mission here, I might as well divulge a few tricks I’ve learned for keeping spam and email harvesting down a bit on your site. They’ll probably save you some bandwidth too. In a moment you are going to see some code which blocks a variety of “bots” from gaining access to your site. The bots in this list do a number of different things which The Bomb Site wants absolutely nothing to do with, and neither do you. This code goes in your htaccess file. “What’s that?” I hear some of you mumble. Please stop mumbling. To be perfectly frank, if you don’t know what it is then maybe you should not be reading this, as it is a touchy subject at the best of times. Before I get to the code I’d better explain what htaccess is.
In simplest terms it is a text file containing instructions that tell your server how you want it to handle your website. It is a very powerful tool and can wreak havoc in the wrong hands, so be careful. Something else to think about is that htaccess has a cascading effect, like CSS. An htaccess file in your root directory will affect your whole site, including any sub-directories. If it is in a sub-directory, it will only affect that sub-directory plus any sub-sub-directories within it, and so on down the line. I have an htaccess file in my root directory which contains some specific rewriting for TXP to give me those nice clean URLs you see in the address bar, but it also contains other stuff which I want applied site-wide. I have other applications in sub-directories with their own htaccess files to do the specific things they need. As those files don’t countermand the instructions in the root htaccess, the root instructions still apply and the instructions in the other files get added to them. The one big exception is rewriting. If a sub-directory’s htaccess contains any rewrite instructions of its own, the root file’s rewrite rules stop applying in that sub-directory unless you add “RewriteOptions Inherit” to it.
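To make the cascade concrete, here is a sketch of two htaccess files. It assumes a hypothetical /forum/ sub-directory. The missing.html page and both RewriteRules are made up purely for illustration:

```apache
# ---- File 1: /.htaccess in the site root (applies site-wide) ----
# Ordinary directive: applies in every sub-directory, /forum/ included,
# unless a lower htaccess file sets its own ErrorDocument 404.
ErrorDocument 404 /missing.html

RewriteEngine On
# Hypothetical rule: refuse requests for backup files anywhere.
RewriteRule \.bak$ - [F]

# ---- File 2: /forum/.htaccess (hypothetical sub-directory) ----
# Because this file has rewrite directives of its own, the root file's
# rewrite rules would be ignored under /forum/ without the Inherit line.
# With it, the inherited root rules run after the rules in this file.
RewriteEngine On
RewriteOptions Inherit
# Hypothetical forum-only rule: the pattern is relative to /forum/,
# so this refuses anything under /forum/private/.
RewriteRule ^private/ - [F]
```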
As I said, strictly speaking it is a text file, but it doesn’t have a “.txt” extension: the whole name is just “.htaccess”. You may also have noticed that it starts with a “.” full stop. On a Unix-type server, which is what most Apache hosts run, that makes it a hidden file, so it won’t show up in an ordinary directory listing, which is cool. Don’t remove it: without the dot, the server won’t recognise the file. If you want to mess about with the file, you’ll have to go into your FTP client’s options and turn on the one to show hidden files, or something similar. If your client doesn’t have this option, remove it completely from your PC/Mac and get something else. I recommend FileZilla, as does Podz. It’s free from SourceForge. ’Nuf said. I won’t put a link here, but there’s one, with a write-up, in my website software pages.
Once you’ve done that, log in to your site with your FTP client and see whether there is already an htaccess file there. If there is, great. Download it to your PC/Mac and open it in your favourite code editor. Do not use Notepad for this. When you save with Notepad, it stands a good chance of knackering the file up; for one thing, it likes to tack “.txt” on the end of the name. Use a proper code editor, please.
Another pointer for Windows users. Windows doesn’t like a file name that is nothing but a dot and an extension, which is how it sees “.htaccess”. If the file already exists, there is no problem: you can modify and save it without hassle. However, if you don’t already have an htaccess file, Windows will not let you create one. Bit of a bummer, that, but it’s easy enough to get round. Simply save it as a text file, i.e. “htaccess.txt”, and upload it to your site. Then right-click on it in your FTP client and rename it to “.htaccess”, with the dot at the front and no .txt on the end. Once you’ve done that, you shouldn’t need to do it again; you can just edit and save as normal.
So what can you use it for? Well, how long is a piece of string? It can do a lot of stuff I haven’t even bothered looking at yet and probably never will. Some of the more common uses, to mention but a few, are:
password-protecting your whole site, a sub-directory or a single page;
giving you those nice clean URLs I mentioned;
blocking bots, spammers or anyone else you don’t want there;
redirecting visitors from pages you have moved;
blocking hot-linking.
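Here is a rough sketch of three of those uses, assuming an Apache server. The file path, page names and example.com domain are placeholders. Each snippet works on its own, so use only the bits you actually need:

```apache
# 1. Password-protect the directory this htaccess file sits in (and
#    everything below it). The password file is made with Apache's
#    htpasswd tool. The path is a placeholder: keep the real one
#    outside your public web folders.
AuthType Basic
AuthName "Members only"
AuthUserFile /home/example/.htpasswd
Require valid-user

# 2. Send visitors from a page you've moved to its new address.
#    301 tells browsers and search engines the move is permanent.
Redirect 301 /old-page.html http://www.example.com/new-page.html

# 3. Stop other sites hot-linking your images. Requests with no referrer
#    are let through, because some browsers and proxies strip it, and so
#    are requests from your own pages. Leave out RewriteEngine On if
#    your file already has it.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```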
Can everyone use it? No, is the short answer. It will depend on your host, what type of server you are on and how it is set up. htaccess is a feature of the Apache web server, so it won’t work on servers such as Microsoft IIS or nginx. Even on Apache, the host has to allow it, and for the code below they also need mod_rewrite switched on. You will have to find out for yourself whether it is available to you. I’ve always been able to use it, even when I was using Freeserve (Wanadoo) and Blueyonder (Telewest) free hosting, the type you normally get from your ISP. So there is no excuse for your host saying you can’t use it, unless it is because of the server type you are on. If they say no for no apparent reason, move host and tell them why you’re moving. Bugger ’em. If hosting providers and ISPs got off their arses and took a bit of responsibility for the security of the services they provide, we wouldn’t be in as much of a mess as we are now.
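For the curious, this is roughly what “the host allowing it” means on an Apache server. It is a sketch of the host’s side of things, not something you can put in your own htaccess. The directory path and module path are assumptions that vary from host to host:

```apache
# Main server configuration (httpd.conf or a virtual host file), which
# only your host can edit. It does NOT work inside an htaccess file.

# mod_rewrite must be loaded for any Rewrite* line to work. The module
# path varies between installations, and many hosts load it already.
LoadModule rewrite_module modules/mod_rewrite.so

# "/var/www/example" is a hypothetical document root.
<Directory "/var/www/example">
    # "AllowOverride None" would make Apache ignore htaccess files here
    # altogether. These categories cover the uses mentioned above:
    #   FileInfo   - Rewrite* directives, Redirect, ErrorDocument
    #   AuthConfig - password protection (AuthType, AuthUserFile, Require)
    #   Limit      - Order, Allow and Deny (blocking by address)
    AllowOverride FileInfo AuthConfig Limit
</Directory>
```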
So here’s the code:-
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule .* - [F]
That top line turns the rewrite engine on. Was that obvious? If you already have an htaccess file and it already has that line in it, remove this one. You do not want it in there twice. It doesn’t matter whether it sits at the very top of your file, but make sure the list of rewrite instructions comes beneath it, or they won’t work. Got that? Good. The last two lines turn away any request that arrives with http://www.iaea.org as its referrer. There is nothing in them you need to change.
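If you are adding the bot list to an existing file, the layout would look something like this. It is a sketch: the bot list is shortened, and the closing comment stands for whatever rules your file already has:

```apache
# One RewriteEngine On for the whole file, above every rewrite line.
RewriteEngine On

# Bot blocking first, so it is checked before any of your other rules
# gets the chance to stop processing (an [L] flag does that).
# Shortened list: the full set of RewriteCond lines goes here.
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule .* - [F]

# Your existing rules (your CMS's clean-URL rewriting, for example)
# follow here, minus their own copy of "RewriteEngine On".
```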
You can add to this if you know the names of any other bots but remember that it is only for bots. Nothing else. Just bots. Not spammers or referrers. They need different instructions which I shall get to in another post. So that’s just bots then. OK?
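Here are two ways of adding a bot. “SomeNewBot” is a made-up name standing in for whichever bot you want to add. The one-line form with [NC] is an alternative layout, not how the list above is written:

```apache
RewriteEngine On

# List form: every condition except the LAST one before the RewriteRule
# needs [OR]. Without it, Apache ANDs the conditions together and a bot
# would have to match more than one line to be blocked. If you add your
# bot at the very end, put [OR] on the old last line and leave it off yours.
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^SomeNewBot
RewriteRule .* - [F]

# One-condition form: names separated by | inside ( ), and [NC] so that
# "emailsiphon" is caught as well as "EmailSiphon".
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|ExtractorPro|SomeNewBot) [NC]
RewriteRule .* - [F]
```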
Now here are the safety regulations. If you are modifying an existing htaccess file, keep a copy of the original somewhere. I SAID KEEP A COPY OF THE ORIGINAL SOMEWHERE! You might just find out why in a minute. Upload your modified or new htaccess to your site, rename the file if you need to, then go and visit your site. Can you see it? Is everything still working? Great. Has your site just disappeared into the ether, usually replaced by a “500 Internal Server Error” page? Well, this is why you should have kept a copy of the original. Just upload it so it overwrites the one causing the problem, and you should be back where you were. If you didn’t have an htaccess file in the first place, just delete the new one. First check for a typo, because any mistake in the file will do this. Otherwise, it would appear your host has blocked certain htaccess rules or doesn’t have mod_rewrite switched on. If nothing broke but nothing is being blocked either, you may be on a server type that ignores htaccess, or your host may have switched htaccess off altogether. I’m afraid I can’t help you there; you need to talk to your host about it. If you are on the wrong server type, there’s little you can do other than move host. If the host is not allowing htaccess, ask them why and give them hell. Then move.
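One common reason for the vanishing act is a server without mod_rewrite loaded, where every Rewrite line counts as an error. Wrapping the rules like this makes Apache skip them instead. In this sketch the list is shortened. The trade-off is that on such a server the bots are then silently not blocked. It also won’t help if the host forbids rewrite rules in htaccess files; that still gives a 500:

```apache
# Apache only reads what is inside this block if mod_rewrite is loaded.
# Without the wrapper, a server lacking mod_rewrite answers every
# request with "500 Internal Server Error".
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Shortened list: the full set of RewriteCond lines goes here.
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
    RewriteRule .* - [F]
</IfModule>
```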