Panda is one of the most misunderstood algorithms Google has ever created. It slaughtered tons of innocent sites. It destroyed content farms. It killed the 2008-2010 style feeders that worked so well.
But Panda is simple to find, and easy to fix.
Symptoms
- Sitewide ranking drop around a Panda refresh (Panda is now a rolling update, so new Panda issues are harder to pinpoint)
- Traffic doesn’t just slow down, it completely falls off a cliff
- PR 0 pages become PR NA (use a bulk PR checker)
- The site is large, old, and lacks canonical links
- Indexed navigational pages (categories, tags, paginations, etc.)
- Site is driven by product or other data feeds that others use
- Heavy use of parameter URLs (those using ?variables)
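One rough way to check the parameter URL symptom is to measure what share of a site's crawled URLs carry a query string. This is a minimal sketch only: the function name `parameter_url_share` and the example.com URLs are made up for illustration, and no particular share is known to trigger Panda.

```python
from urllib.parse import urlsplit


def parameter_url_share(urls):
    """Return the fraction of URLs that carry a query string (?variables)."""
    urls = list(urls)
    if not urls:
        return 0.0
    with_params = [u for u in urls if urlsplit(u).query]
    return len(with_params) / len(urls)


# Hypothetical sample; in practice this would come from a crawl or a sitemap.
sample = [
    "https://example.com/widgets",
    "https://example.com/widgets?sort=price",
    "https://example.com/widgets?sort=price&page=2",
    "https://example.com/about",
]

share = parameter_url_share(sample)
print(f"{share:.0%} of sampled URLs use parameters")
```

A high share is only a hint to look closer at whether those parameter pages duplicate each other.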
Tests & Diagnosis
- Panda CP Test (Copy-Paste)
- Copyscape for large article sites
Common Misdiagnosis
- Penguin
- Clean Copyscape means it is a link issue.
Complications
- Copyscape: Just because there isn’t a direct copy of an article doesn’t mean there is not a Panda penalty. Substantially similar is the key to remember. More on that below.
- Sometimes a site is actually clean and takes a hit in rankings at a Panda update because it relied on a few backlinks for juice. If any of those backlinks received a Panda penalty, no juice will come from them.
Solutions
Kill all failing content. This is a top priority. For those who need rapid fixes, switch the subdomain from www to non-www after all offending content is removed.
If the site has a forum or other user-generated content section, it needs to be on a subdomain to minimize risk of content penalties to the ranking domain.
If the failing content has great links, rewrite it. This is high priority. Rewrite it from a new perspective. More on that below.
Kill most categories, all tags, archives, etc. Internal duplicate content is almost worse than external in some cases. Use Yoast’s WP SEO plugin to help clean up the mess.
Update: by “kill” we mean delete if possible. If you must have the duplicate content for legal or organizational reasons, you should NOINDEX it. You will find the NOINDEX option on Yoast’s advanced tab on individual posts/pages, and bulk NOINDEX for categories, tags, archives, media, etc. within the “Texts/Title” option. Learn about robots.txt vs NOINDEX tags here.
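After setting NOINDEX, it is worth confirming the tag actually made it into the page source. The sketch below assumes the plugin emits the standard robots meta tag (for example `<meta name="robots" content="noindex, follow">`) and checks a page's HTML for it. The class and function names are illustrative, and fetching the page is left out to keep the example self-contained.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindexed(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source for a tag archive that should stay out of the index.
tag_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(tag_page))
```

This only covers the meta tag. A page can also be noindexed through an HTTP header, which this sketch does not check.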
For large e-commerce sites relying on data feeds, you must adapt or die. You can noindex the products you don’t care about ranking to avoid penalty while preserving internal sales. You can also write new perspectives and reviews on the products that drive revenue and attempt to rank for those. Trying to compete with the massive retailers on a generic product is nearly impossible. You must have a new angle on the market. All things being equal, lower price alone will not get you better rankings than Amazon.
Deeper Understanding
To completely understand how link juice is redistributed in Panda, here are two well-written articles:
http://dejanseo.com.au/hijacked/
http://dejanseo.com.au/mind-blowing-hack/
How Substantially Similar Content Works
While most gurus preach about duplicate content, they completely ignore Google’s reiteration of substantially similar content in all of their guidelines. When we say to make something original, we mean you have to make that text say something new. Algorithmically it is very easy to figure out how to do this. While Copyscape has been a great tool in the past, it is failing miserably now as it doesn’t factor in “substantially similar.” Take a look at these examples. If you watched the old TV sitcom Friends, you probably remember Phoebe’s stupid songs, and you’ll pick up the reference:
Original
Smelly cat, smelly cat, what are they feeding you?
Let’s determine a hypothetical content quality score. Remove stop words like “what, are, you” and don’t count repeated words. There are 3 main words here: smelly, cat, feeding. For simplicity’s sake, let’s just assume this receives a score of 3.
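The hypothetical score can be sketched in a few lines of Python. This is purely illustrative and not how Google scores anything. The stop word list is an assumption, and it includes “they” so the count comes out to the 3 main words named above.

```python
import re

# Assumed stop word list for this example only.
STOP_WORDS = {"what", "are", "they", "you"}


def main_words(text):
    """Lowercase the text, drop stop words, and ignore repeated words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {word for word in tokens if word not in STOP_WORDS}


def content_score(text):
    """Hypothetical score: one point per distinct main word."""
    return len(main_words(text))


original = "Smelly cat, smelly cat, what are they feeding you?"
print(sorted(main_words(original)))
print(content_score(original))
```

Because repeated words are collapsed into a set, reordering the sentence cannot change the result.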
Spinner
What are they feeding you, smelly cat?
This is unique by Copyscape standards. It’s not a direct copy, but it doesn’t say anything new. There are only 3 main words here: feeding, smelly, cat. But the original gets the credit for this text, so the canonical 3 points of juice go to the original, and this copy gets 0. Any links from this page also get 0 juice because this is how Google set up the system.
Spinner with Synonyms
Stinky cat, stinky cat, what have you eaten?
Again, this will pass Copyscape. It almost says something new, but Google has a great idea of what synonyms are and what they actually say. Looking at the hypothetical point system, we have stinky, cat, and eaten. Let’s assume .5 for stinky because it’s a synonym, 0 for cat because it repeats the original, and .5 for eaten. This piece of content would get 1 point.
If you want to study a bit more about how Google handles synonyms, look at their related searches. Or sometimes if you search for “promo code”, “coupon code” will also highlight – meaning Google knows that coupon and promo are the same thing.
Mashing and True Rewriting
The fragile feline had a pungent odor so I asked it, “what are they feeding you?”
This is the magic of mashing or truly rewriting a piece of content. Getting an outsourcer to understand this concept can be difficult. It requires more vocabulary, which rules out the cheap labor you can buy in some countries. But content mashing (scraping from multiple sources on a sentence level) with this level of spinning makes it much more difficult for Panda to determine an original source to give credit. Take a look at the hypothetical points again: fragile (1), feline (.5), pungent (1), odor (1), asked (1), feeding (0), for a total of 4.5. This example will pass a duplicate content test, and actually be considered original.
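To make the point system concrete, here is a minimal sketch that scores each rewrite against the original: 0 for a main word the original already used, .5 for a synonym of one, and 1 for a new word. Everything here is an assumption for illustration. The stop word list and the tiny synonym table were picked by hand to match the examples above, and nothing suggests Google computes it this way.

```python
import re

# Assumed stop words and synonym table, chosen to match the article's examples.
STOP_WORDS = {"what", "are", "they", "you", "have", "the", "had", "a", "so", "i", "it"}
SYNONYMS = {"stinky": "smelly", "eaten": "feeding", "feline": "cat"}


def main_words(text):
    """Lowercase the text, drop stop words, and ignore repeated words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {word for word in tokens if word not in STOP_WORDS}


def novelty_score(text, original):
    """Hypothetical points for what the text adds beyond the original."""
    original_words = main_words(original)
    score = 0.0
    for word in main_words(text):
        if word in original_words:
            continue  # already said by the original: 0 points
        if SYNONYMS.get(word) in original_words:
            score += 0.5  # synonym of an original word: half credit
        else:
            score += 1.0  # a new word: full credit
    return score


original = "Smelly cat, smelly cat, what are they feeding you?"
spinner = "What are they feeding you, smelly cat?"
synonym_spin = "Stinky cat, stinky cat, what have you eaten?"
mashed = 'The fragile feline had a pungent odor so I asked it, "what are they feeding you?"'

for label, text in [("spinner", spinner), ("synonyms", synonym_spin), ("mashed", mashed)]:
    print(label, novelty_score(text, original))
```

Under these assumptions the plain spinner scores 0, the synonym version scores 1, and the mashed sentence scores 4.5, matching the hand-counted points in the examples.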
Mashing in itself combines so many different angles of subjects into a single article that Google has a difficult time figuring out which one should get the canonical juice. It can determine this by paragraph, but sentence-level mashing is algorithmically very difficult to beat considering we only have about 2,500 words that we consistently use in our language. At some point, everything would be considered ‘duplicate’ if they turn the sensitivity level up too much.
Another way Google works around the mashing issue is QDD: query deserves diversity. Some searches like ‘Jaguar’ pull entirely different types of content to ‘give the user a better experience’. Some will talk about an animal, some will talk about a car. Some are facts, some are places to buy or visit. This goes back to the Google EWOQ team determining what the user’s intent for a query could be: do, know, or go. It could be that you’re not competing for 10th, but competing for 4th just based on your type of content. If other spammers in the space are doing the same thing, do something different. Give your page a new angle. Focus on a different audience. Do what you have to do. You can’t just one-up the top spot and win each time. The angle is key.
So how do you apply this knowledge?
If you’re wondering why some comment spam blasts aren’t working the way they used to, consider it might be your repetitive comments themselves. If you’re wondering why web 2.0 blasts aren’t working, consider the quality of the spin. All of this traditional spam still works, especially article blasts if everything is substantially different.
Automation Spamming & Panda
You will have to spend time figuring out how to make your articles and comments mash and spin to be original. We are testing a few new tools that automatically handle this aspect, and we will let you know if they work over time. There is speculation that if too many people use the same tool and scrape the same sources, all of the articles will essentially mash the same way and create problems.
In the meantime, do work by hand and learn some patterns. If you haven’t updated your comment spintax in Scrapebox in years, now might be a good time to do so with this knowledge.
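To see how repetitive a comment template really is, it helps to count how many distinct comments it can produce. The sketch below assumes the common {option|option} spintax syntax, with nesting allowed. The template text and function names are made up for illustration, and your tool's exact spintax rules may differ.

```python
import random
import re

# Matches an innermost {a|b|c} group (one with no braces inside it).
INNERMOST = re.compile(r"\{([^{}]*)\}")


def spin(template, rng=None):
    """Return one random variant of a spintax template."""
    rng = rng or random.Random()
    while True:
        match = INNERMOST.search(template)
        if match is None:
            return template
        choice = rng.choice(match.group(1).split("|"))
        template = template[:match.start()] + choice + template[match.end():]


def all_variants(template):
    """Return every distinct string the template can produce."""
    match = INNERMOST.search(template)
    if match is None:
        return {template}
    variants = set()
    for option in match.group(1).split("|"):
        expanded = template[:match.start()] + option + template[match.end():]
        variants |= all_variants(expanded)
    return variants


# Hypothetical comment template.
template = "{Great|Nice} post, {thanks|thank you}!"
print(spin(template))
print(len(all_variants(template)), "distinct comments")
```

A template that yields only a handful of variants will repeat itself quickly across a large blast, and swapping single words still leaves the comments substantially similar.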
Real Human Writing & Rewriting
Don’t tell your writer to “write about heart disease in men” or some other generic keyword without any extra instruction. More than likely they will write in the third person perspective, and mash what Wikipedia and all other major health sources say to be factually accurate. To writers, being factually accurate and grammatically correct is the important part. For SEO, that means nothing. Even if you follow our linking practices, if the content is junk, it won’t rank and never will. The content is just that bad.
What you can do is tell your writer, “Write about heart disease in men from a wife’s perspective. Maybe include things she changed in her cooking, and how she started exercising to get him off the couch.” That’s the difference. It would say something new. It makes sense to exist. You can use this to combine keywords, too. Even though the keyword “heart disease in men” would be difficult to rank, it may serve some other purpose like picking up juice from a quality link on an expired heart disease domain you bought. Things should be clicking for you right now. If not, go back and reread this section.