The Basics
You do not need an entire regex (regular expression) course to edit an .htaccess file. All you need is a grasp of the basics and the ability to search the web for the rule you need and swap in your own values.
Do You Actually Need To Edit htaccess?
Most WordPress sites do not need .htaccess edits. After version 3.0, the performance issues using plugin-based redirects were fixed in WordPress core. It’s best to use a plugin like Redirection. Adding or removing www is simply done in the settings of WordPress. No hard edits required.
The web is also evolving with servers like NGINX. NGINX does not read .htaccess files, so if you’re hosting a WordPress site on NGINX, the file doesn’t exist. Honestly, you probably don’t need it.
But not everyone is on WordPress or NGINX. For these webmasters, this is exactly what you need. (Unless you host with GoDaddy, in that case move to a new host. It’s an uphill battle to get anything fixed on GoDaddy servers.)
Rules
There are 3 rules to editing .htaccess:
1. Always make a backup (duplicate the file before you even open it).
2. Use a plain-text editor that doesn’t insert curly quotes or other invisible characters, or better yet, one that protects you from them. We like Transmit. Be sure to view hidden files, since a file name that starts with a dot is hidden by default.
3. Never copy-paste. Even the output of LinkResearchTools’ Juice Tool, which clearly says “copy paste this to your .htaccess file,” has bad characters every time that can corrupt your file.
Common Code
With .htaccess, rules are applied in order: the server reads the file from top to bottom, line by line.
In rare cases it becomes more complex with layers. A .htaccess file applies to its own folder and to every subfolder beneath it, so rules in the root / folder also reach a subfolder like /public_html. If /public_html has its own .htaccess file too, the two sets of rules can conflict, and you are much more likely to create redirect loops or 500 errors.
Commenting
# hashtags
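Any line that starts with # is a comment. Use comments so you can remember, months down the road, why you added the next piece of code to the file. It’s a best practice. Keep each comment on its own line; Apache does not allow a comment at the end of a directive line.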
This is also useful to disable existing code instead of destructively deleting it.
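As a small illustration of both uses, here is a note above a live rule, followed by a rule that has been switched off with #. The date, the reasons, and the URLs are made up for the example.

```apache
# 2015-03-01: old pricing page was merged into /plans/ (hypothetical note)
Redirect 301 /pricing-old.html http://yoursite.com/plans/

# Disabled rather than deleted, in case it is needed again:
# Redirect 301 /spring-sale.html http://yoursite.com/sale/
```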
Disable Directory Browsing
Options -Indexes
This should be standard on just about any website you work on. Unfortunately it is rarely included or even checked for, even in WordPress installations, which is puzzling considering they use a blank index file to protect certain file paths. Remember, open directory listings can be indexed by Google and create security issues on top of SEO issues.
Plain Redirect
Redirect 301 /something-old.html http://yoursite.com/somethingnew/
This is the most common, basic redirect and probably the only one you need if your server already forces www or non-www.
Redirect is just a simple server command, or “directive” as Apache calls it. The 301 is the HTTP status code. You can read the full list of status codes here. As SEOs the only status codes we will likely ever use are 301, 302 in rare cases, and 307 for some odd cloaking. The only other ones we should ever use as a tool in .htaccess are 401 or 403 to deny access (401 asks for a login, 403 refuses outright), and 404 for some odd noindex solutions.
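A few variations on the same directive, as a sketch. The paths and yoursite.com are placeholders. RedirectMatch is a sibling directive from the same Apache module (mod_alias) that accepts a regex, which helps when a whole folder has moved.

```apache
# Temporary move: same syntax, different status code
Redirect 302 /sale.html http://yoursite.com/spring-sale/

# Whole folder moved: (.*) captures the rest of the path, $1 puts it back
RedirectMatch 301 ^/old-blog/(.*)$ http://yoursite.com/blog/$1

# Page removed for good: sends 410 Gone, so no target URL is given
Redirect gone /discontinued.html
```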
Plain Allow
Allow from 1.2.3.4
If you have an outsourcer who keeps getting denied to your site because they live in a country that is known for bot traffic and hacking, add this line with their IP address to let them in.
Plain Deny
Order Allow,Deny
Allow from All
Deny from 1.2.3.4
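Notice strange activity from an odd IP address? Use this command. Order Allow,Deny is a “directive” that looks a bit strange: it tells Apache to check the Allow lines first and the Deny lines second, so a Deny wins when both match. Anyone who matches no Allow line is blocked, so remember to allow everyone or you will lock everyone out by accident.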
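Order, Allow and Deny are the Apache 2.2 way of writing this. On Apache 2.4 they only work through a compatibility module, and the documented replacement is Require. A rough 2.4 equivalent of the block above, assuming your host lets .htaccess files set access rules, might look like this:

```apache
# Apache 2.4 syntax (mod_authz_core / mod_authz_host)
<RequireAll>
    # Let everyone in...
    Require all granted
    # ...except this address
    Require not ip 1.2.3.4
</RequireAll>
```

Check which Apache version your host runs before choosing one style, and avoid mixing the old and new styles in the same file.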
Custom Error Pages
ErrorDocument 404 /your404page.html
The nasty “404 not found” can be customized easily. Simply add this line. It can be used for any error page like 401, 403 or 500. WordPress users should use a plugin instead of touching htaccess.
Symbolic Links
Options +FollowSymLinks
Normally, this does nothing and you’ll question why it exists. It is required for mod_rewrite rules in .htaccess unless your server configuration file already allows it. It usually doesn’t hurt anything to have both, so you can add it without issue.
Forcing WWW
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
There are a few things happening here, but overall the code is basic. Here is the translation:
Long form:
RewriteEngine On = Turn on the rewrite engine so the rules below are processed.
RewriteCond = A condition. The RewriteRule that follows only runs if the condition is met. Here it tests the hostname that was requested.
! = not
^ = start
\. = a literal dot. The backslash is needed because a plain ‘.’ in RegEx means “any single character.”
RewriteRule = What the server does when the condition is met: match the requested path against a pattern, then send it to the new URL. The server follows it exactly, even if you screw it up and break it.
(.*) = wildcard. It matches anything, and the parentheses capture what was matched so it can be reused.
$ = end
%{HTTP_HOST} = the hostname the visitor requested, such as domain.com. You don’t necessarily have to use this code if you have a strange setup with sites in subfolders that for some odd reason don’t have their own htaccess files. You can use the actual domain.com here, if you want.
$1 = Whatever the first set of parentheses in the RewriteRule captured. In other words, the ^(.*)$ or “anything” gets passed on to the end of the URL again. Without this, everything non-www redirects to the homepage, and that’s not what you want.
[R=301,L] = R=301 makes the redirect a 301 status. L is short for “last”: once this rule matches, the server stops processing the rules below it for this request. What saves you from redirect loops is the RewriteCond, because after the redirect the hostname starts with www and the rule no longer applies.
Short form:
Allow this to work.
If the requested hostname does not start with www, do the rule.
The rule is to keep the stuff after .tld/, make it www, and stop the rule from applying elsewhere.
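One caveat: the rule above always redirects to http://, even for visitors who arrived over https://. A possible variation, assuming the server sets mod_rewrite’s %{HTTPS} variable correctly (it may not behind some proxies or CDNs), keeps whichever scheme was requested:

```apache
RewriteEngine On

# Secure request without www: stay on https
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTPS} =on
RewriteRule ^(.*)$ https://www.%{HTTP_HOST}/$1 [R=301,L]

# Plain request without www: stay on http
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

Two RewriteCond lines in a row must both be met, which is why each rule repeats the hostname check.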
Forcing non-WWW
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.domain\.com [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [L,R=301]
[NC] – this “flag” means no case. In other words cApItAliZaTiOn doesn’t matter.
Short form:
Allow this to work.
If the requested hostname starts with www.domain.com, ignoring capitalization, do the rule.
The rule is to keep the stuff after .tld/, make the URL look as described, and stop the rule from applying elsewhere.
Redirect /index.php Access
We as SEOs know /index.html or /index.php should redirect to root. Otherwise it’s duplicate content. After Options +FollowSymLinks, place the following code:
RewriteEngine on
RewriteCond %{THE_REQUEST} ^.*/index\.php
RewriteRule ^(.*)index\.php$ http://%{HTTP_HOST}/$1 [R=301,L]
Short form:
Allow this to work.
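If the visitor’s original request was for /index.php (not the server quietly loading it behind the scenes), then follow the rule.
The rule is to strip index.php and redirect to the folder it was in, which for the root is the domain.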
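The intro to this section mentions /index.html too, but the code only handles index.php. A sketch that covers both, assuming neither file name is needed as a visible URL anywhere on the site (forms that POST to index.php would be affected):

```apache
RewriteEngine On
# Only act when the visitor literally asked for index.php or index.html.
# THE_REQUEST is the raw request line, e.g. "GET /blog/index.php HTTP/1.1".
RewriteCond %{THE_REQUEST} \s/+([^?\s]*/)?index\.(php|html)[\s?]
# $1 is the folder part ("blog/" in the example, empty for the root)
RewriteRule ^(.*)index\.(php|html)$ http://%{HTTP_HOST}/$1 [R=301,L]
```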
Force Trailing Slash
SEOs don’t want 200 status codes available at both .tld/here and .tld/here/. As a general rule, we like to use trailing slash. Use this code:
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
Explaining these elements gets a bit tricky. If you really want to understand what is happening here, look for regex tutorials and interactive examples.
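For the curious, here is a rough line-by-line reading of the same two lines, written as comments. The only addition is RewriteEngine On, in case it is not already earlier in the file.

```apache
RewriteEngine On

# Condition: after some slash in the URL there is no dot all the way to
# the end. In practice: the last part of the URL has no file extension,
# so /style.css and /page.html are left alone.
RewriteCond %{REQUEST_URI} /+[^\.]+$

# Rule: the path is at least two characters long and its last character
# is not a slash, so redirect to the same URL with a slash added.
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]
```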
Advanced
We don’t find ourselves using these too often, but they have their place.
Blocking Googlebot
If you have a CDN for images, and Googlebot is still crawling your original uploads folder when it shouldn’t be, then follow these steps:
1. In Transmit, navigate to /uploads/ or whatever folder should not be crawled.
2. Create a new .htaccess file.
3. Place the code below.
Warning: do not put this in the root, or Googlebot will never be able to crawl your site.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot-Image
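RewriteRule ^ - [F]
Elements:
%{HTTP_USER_AGENT} – The user agent of the incoming request. If it contains the text that follows, the condition is met.
[OR] – Or. By default every condition has to be met at once; [OR] lets the rule run when either one is met.
^ – Start, again. You should know this one by now. On its own it matches every request.
- – A dash means “don’t rewrite the URL,” just apply the flags.
[F] – Forbidden. It returns a 403 status code; the flag takes no value.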
Changing File Extensions
If you have .html extensions and want to make them .php, just use the following code:
RewriteEngine on
RewriteRule ^(.*)\.html$ /$1.php [R=301,L]
Long form:
RewriteEngine on – Turn on the ability for rewriting
RewriteRule – Here’s the rule in two parts: the first part is the pattern to match against what the user requested, the second is where the server sends the request.
^(.*)\.html$ – Starts with anything and ends with .html. Let’s break this one down to be clear:
^ – start
(.*) – wildcard
\. – a literal “.”, because an unescaped . in a pattern matches any character.
html – the file extension we want to have the server change. If you have .htm in your links, use that instead.
$ – end, so only URLs that finish with .html match.
Then:
/$1.php – Take the first part of what was just defined, and put it in front of .php. Again, to be clear:
/ – start from the site root. This assumes the .htaccess file is in the root folder; without it, a redirect from .htaccess can point at the server’s file path instead of a URL.
$1 – first element, meaning the wildcard.
. – “.” again. No backslash is needed here because this side is plain text, not a pattern.
php – the file extension that actually exists on your server.
A variation could also be used to remove file extensions completely:
RewriteRule ^(.*)\.html$ /$1 [R=301,L]
Remember, without the [R] flag a RewriteRule loads the new file behind the scenes but does not redirect the user, so the address bar stays the same. We want the redirection to occur for SEO purposes.
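Redirecting /page.html to /page only helps if the server can still find page.html when /page is requested. One common way to pair the two, sketched here on the assumption that the .htaccess file sits in the site root and the .html files really exist on disk:

```apache
RewriteEngine On

# 1. Visitor asked for /page.html: 301 them to /page.
#    THE_REQUEST is the original request line, so the internal
#    rewrite in step 2 cannot trigger this again (no loop).
RewriteCond %{THE_REQUEST} \s/+([^?\s]+)\.html[\s?]
RewriteRule ^ /%1 [R=301,L]

# 2. Visitor asked for /page and page.html exists: serve it quietly.
#    No R flag, so the address bar keeps showing /page.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ /$1.html [L]
```

%1 works like $1, except it refers to the parentheses in the RewriteCond instead of the ones in the RewriteRule.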
Further Geekery
Perishable Press has a great guide for extra tricks you might need, defining just about everything that can go on with htaccess. It is a great resource.