This is the long-awaited CDN (Content Delivery Network) tutorial for WordPress. For many, just hearing the term CDN creates a level of stress because of all the unknowns. This tutorial will fix that. As you know, we have been closely monitoring duplicate images over many months in relation to “duplicate content penalties” in Google. One of the biggest factors in this penalty was represented by “the green line” in Google Webmaster Tools. Google, for whatever reason, removed this metric from their tools, but not before we were able to determine exactly what went into this metric, which was truly a site killer in the SERPs.
Through our exhaustive tests, we were successful in creating massive drops in this green line before Google removed it. The technique that decreased this line more than any other change was introducing a CDN and ‘removing’ or hiding the images on the ranking domain from Googlebot.
Summary of Issue: Google now has the ability to determine whether the images you are using on your site are also used on other sites, and whether one of those sites is the “owner” of the image according to Google. The penalty in the SERPs is the same kind you would get for scraped content. It is not as severe, but it can be enough to keep a site out of the Top 10.
Summary of Solution: By moving your images to a CDN, either on a separate domain or a sub-domain, you are creating a new “spot” for your images to be located, a place Google has never indexed before and knows nothing about. This is important because if you just add the CDN to where the images are now, Google already knows about them and the penalty won’t be removed. However, if they are moved to a new location and you lock down the old one through .htaccess so Google cannot index your images there whatsoever, you have solved the issue. No more “duplicate” or “substantially similar” images.
Okay, let’s get into the details on how this works. There are many ways in which you can configure your CDN. This tutorial will try to cover everything you need to get started. Advanced tweaks and customization will not be covered.
Preface:
A. Add this line to your .htaccess file:
Options -Indexes
We have repeatedly told our members to do this, and we still run across sites without it in our site reviews. Do it right now and don’t forget, as this is a key component to the entire process working correctly. Alternatively, you could use the Better WP Security plugin and disable directory indexing that way.
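If you manage several sites, a quick script can confirm that the line is actually present. The following is a minimal sketch, not a full Apache config parser: the function name is our own invention, and it only looks for an Options directive containing -Indexes. It does not catch a later directive that turns indexing back on.

```python
def disables_directory_indexes(htaccess_text: str) -> bool:
    """Return True if a non-comment Options line contains -Indexes.

    Assumption: a simple line-by-line scan is good enough for a sanity
    check. Directive names are compared case-insensitively.
    """
    for raw_line in htaccess_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0].lower() == "options":
            arguments = [p.lower() for p in parts[1:]]
            if "-indexes" in arguments:
                return True
    return False


# Example usage (the path is a placeholder for your own file):
# with open("/path/to/your/site/.htaccess") as handle:
#     print(disables_directory_indexes(handle.read()))
```

If it reports False, add the line by hand and run the check again.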
B. CloudFlare vs. CloudFront
People (even us) often get these confused and interchange the words by accident. CloudFlare is a reverse proxy that serves your entire domain from its cloud. From a code and crawling perspective, it looks about the same as a normal domain. Having duplicate images on CloudFlare is no different than having them on your normal domain.
CloudFront is powered by Amazon Web Services. You can use the ugly domain CloudFront provides to your account, or you can set it up to be pretty as we will highlight below. Or it can be masked through a subdomain on your domain like cdn.yourdomain.com. This only hosts your static content. While it is possible to host an entire website with Amazon Web Services, we do not recommend it.
1. Use W3TC
We have been long-time fans of W3TC. It is the most advanced caching plugin, and also the most difficult to configure.
If you are happy with your W3TC speed and stability, then move on to the next step.
Do not attempt doing both (installing W3TC and the CDN) in one day unless you have a lot of experience diagnosing errors. If you run into an error tomorrow from settings you changed today, you will have a hard time figuring out whether you have a problem with caching, with the CDN, or with both. Sometimes errors take a while to pop up because of how the caching works. Be smart about this and reduce your stress level. An extra day or two in setup is not going to kill you.
What could go wrong?
For example, if you have an automated backup plugin that saves all changes of your server on the same server (a bad security practice anyway), and W3TC is caching those changes, then you create an infinite loop that will get your hosting account shut down.
2. Sign up for Amazon Web Services here: http://aws.amazon.com
3. Once you are signed in, there is a piece of data you will need to access on this page: https://portal.aws.amazon.com/gp/aws/securityCredentials
If you have an old account, predating 2005, you will probably see two access keys here. Pick the one with the newer date. You will also need the hidden secret key that goes with it (AWS calls it the Secret Access Key).
4. Enable CloudFront in W3TC on the General Settings Page
Example: yourdomain.com/wp-admin/admin.php?page=w3tc_general
Enable yours and select Amazon CloudFront for easy setup.
5. Decide whether to use a subdomain or an external domain.
We tend to like putting these files on an external domain, as many of the ‘big boys’ (aka Google Venture companies) do. The files will still be on a subdomain of the external domain. So something like cdn.yourcdndomain.com works. Go ahead and buy that domain now if you wish to use external. If you just want to use cdn.yourdomain.com – that’s fine too for now. It’s certainly less expensive if you plan to do this for feeders. We like to have all duplicate and potentially thin content somewhere totally offsite as a Panda insurance policy.
6. Create Bucket and Distribution
This is on yourdomain.com/wp-admin/admin.php?page=w3tc_cdn
A Simple Method, just follow the picture:
7. Configuring the CNAME (Step 7 in picture above)
The CNAME is where you integrate your external domain or subdomain. To get the ‘pretty CDN domain’ to work, you need to add a CNAME record at your registrar (or wherever your DNS is hosted) that points to leavethisalone.cloudfront.net (obviously replace it with what is listed in your settings based on the above picture).
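Once DNS has propagated, you can check that the pretty domain really resolves to CloudFront. This is a rough sketch using only the Python standard library. The function names are ours, and it assumes your resolver reports the CNAME target as the canonical name or as an alias, which is typical but not guaranteed. A dedicated DNS tool will give a more definitive answer.

```python
import socket


def is_cloudfront_name(hostname: str) -> bool:
    """Return True if the hostname sits under cloudfront.net."""
    return hostname.rstrip(".").lower().endswith(".cloudfront.net")


def resolves_to_cloudfront(cdn_host: str) -> bool:
    """Look up cdn_host and check whether any reported name is on CloudFront.

    Requires network access. Raises socket.gaierror if the host does not
    resolve at all (for example, before the record has propagated).
    """
    canonical, aliases, _addresses = socket.gethostbyname_ex(cdn_host)
    return any(is_cloudfront_name(name) for name in [canonical, *aliases])


# Example usage (cdn.yourcdndomain.com is the placeholder from step 5):
# print(resolves_to_cloudfront("cdn.yourcdndomain.com"))
```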
8. Navigate to https://console.aws.amazon.com/ – Click on CloudFront
A. Click on the “i” box for the domain you are working on. Click “edit” under the general tab. Add the CNAME record you created here. In our example, it’s cdn.yourchoice.com.
B. Default Root Object – we just set it to “/” without quotes. This is so people can’t browse your entire directory of files on the CDN. It will create an error page instead saying it can’t find the key. If you want to upload an index.html file, go for it. We haven’t seen the need to fill this out entirely.
C. The other settings are your choice. We leave them at their defaults.
(8.5 – add this info to the “Add CNAME” field on yourdomain.com/wp-admin/admin.php?page=w3tc_cdn )
9. Host Your Files
10. Be sure to actually click “Upload custom files” on the ones you select.
You can only do one upload section at a time. This is normally a one-time process until you need to update something like the theme or custom files. Make sure your designer knows about this. If you have a designer working on a theme a lot, you might want to go to the Advanced section and select “Don’t replace URLs for the following user roles” and choose “Administrator” so the designer can see what he’s doing before uploading changes to the CDN and waiting on that long propagation process.
11. Careful Selection
In the next steps we are going to block Googlebot from accessing the old files, so we want to make sure the right files are uploaded to the CDN. More importantly, we want to make sure the wrong files aren’t uploaded. This is important for security reasons too. If you accidentally upload your custom plugins and leave your CDN open to browsing, someone can steal all of your stuff because of your negligence.
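One way to double-check a selection before uploading is to sort the file list by extension and review anything that is not obviously static. The sketch below is only an illustration: the extension allowlist and the function name are our assumptions, not something W3TC provides, so adjust the list to your own theme.

```python
from pathlib import PurePosixPath

# Assumed allowlist of static file types. Anything else gets a manual look.
STATIC_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico", ".svg", ".woff"}


def split_by_extension(paths):
    """Split paths into (safe_to_upload, needs_review) by file extension."""
    safe, review = [], []
    for path in paths:
        extension = PurePosixPath(path).suffix.lower()
        if extension in STATIC_EXTENSIONS:
            safe.append(path)
        else:
            review.append(path)
    return safe, review


# Example usage with made-up paths:
# safe, review = split_by_extension([
#     "wp-content/uploads/2013/10/photo.JPG",
#     "wp-content/themes/mytheme/functions.php",
# ])
```

Anything that lands in the review list, PHP files in particular, deserves a second look before it goes anywhere near the CDN.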
12. Block Googlebot
Navigate to yourdomain.com/wp-content/uploads/ (where your images are most likely hosted) and add a new .htaccess file with the following code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot-Image
RewriteRule ^ - [F]
Do the same for your themes folder. We do not recommend doing this for plugins yet as so many have bugs and terrible rendering from Googlebot’s perspective.
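To see which visitors those two conditions catch, here is a small Python model of the matching logic. It is a sketch of how we understand mod_rewrite to evaluate the patterns (unanchored, case-sensitive regular expressions joined by OR), not a substitute for testing against your real server. The function name is ours.

```python
import re

# The two RewriteCond patterns, combined with [OR].
# Without the [NC] flag, mod_rewrite matches case-sensitively.
BLOCK_PATTERNS = [re.compile(r"Googlebot"), re.compile(r"Googlebot-Image")]


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent matches either condition."""
    return any(pattern.search(user_agent) for pattern in BLOCK_PATTERNS)


# Example usage:
# is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
# is_blocked("Googlebot-Image/1.0")
```

Note that the first pattern already matches any user agent containing Googlebot-Image, so the second condition is a belt-and-suspenders extra. Regular browsers and other crawlers are not affected.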
13. Test as Googlebot
Log out and use a user agent spoofer in your web browser. For Chrome, we use this one: https://chrome.google.com/webstore/detail/user-agent-switcher/lkmofgnohbedopheiphabfhfjgkhfcgf?hl=en
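If you prefer a script to a browser extension, the same check can be sketched with the Python standard library. This only spoofs the User-Agent header. It does not come from Google's IP addresses, which is fine here because the .htaccess rule only looks at the user agent. The function names and the example URL are placeholders of ours.

```python
import urllib.error
import urllib.request

# Googlebot's classic desktop user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str = GOOGLEBOT_UA) -> urllib.request.Request:
    """Build a request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url: str, user_agent: str = GOOGLEBOT_UA, timeout: int = 10) -> int:
    """Return the HTTP status code the server sends for this user agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 403 is what we hope to see for blocked folders


# Example usage (replace with a real image URL on your ranking domain):
# print(fetch_status("http://yourdomain.com/wp-content/uploads/example.jpg"))
```

A blocked file should come back as 403 with the Googlebot user agent and 200 with a normal browser user agent.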
14. Note any broken images or CSS for Googlebot and tweak those in the boxes highlighted in #11.
15. Test adding a new post with an image. Without any extra work, the image source should automatically use your pretty CDN domain. If you used the Advanced section to disable URL replacements for logged-in users, log out first to see the change.
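Checking image sources by hand gets tedious on a long post. The sketch below pulls every img src out of a page's HTML and lists the ones that are not served from the CDN host. It uses only the standard library, and the class and function names are our own. Feed it the HTML of the post as seen by a logged-out visitor.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def images_off_cdn(html: str, cdn_host: str) -> list:
    """Return image URLs whose host is not cdn_host (relative URLs included)."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [src for src in collector.sources if urlparse(src).hostname != cdn_host]


# Example usage (cdn.yourcdndomain.com is the placeholder from step 5):
# leftovers = images_off_cdn(page_html, "cdn.yourcdndomain.com")
```

An empty list means every image on the page is coming from the CDN.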
Congratulations, you are now using a CDN. This is also a nice upsell to web design clients as they like to hear phrases like “cloud hosting static files for faster worldwide distribution.”
If you monitor your log files, you will likely notice Googlebot slow down and eventually stop visiting your images and theme files. We also noticed it started crawling our actual content much more. Make no mistake, Googlebot is still crawling everything, but now those files are external to your domain.
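If you want numbers rather than an impression, a short script can tally Googlebot requests per folder and status code. This is a minimal sketch that assumes your server writes the common Apache Combined Log Format. The regular expression, the folder prefixes, and the function name are our assumptions, so adjust them if your logs look different.

```python
import re
from collections import Counter

# Combined Log Format: ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits(lines, prefixes=("/wp-content/uploads/", "/wp-content/themes/")):
    """Count Googlebot requests, keyed by (folder prefix or 'other', status)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue  # not a parsable line, or not Googlebot
        path = match.group("path")
        bucket = next((p for p in prefixes if path.startswith(p)), "other")
        counts[(bucket, match.group("status"))] += 1
    return counts


# Example usage (the log path is a placeholder):
# with open("/var/log/apache2/access.log") as handle:
#     for key, total in googlebot_hits(handle).items():
#         print(key, total)
```

Over time you would expect the 403 counts for the uploads and themes folders to taper off while the "other" bucket, your actual content, stays busy.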