The Zoo
Google has made more changes in the last three years than in their first decade of existence. They have essentially declared war on SEO, and they are doing a great job of messing with our heads. When Hummingbird was first announced, the only thing we noticed was the "knowledge graph" data, scraped from Wikipedia and other big sources, being added into the organic search results for queries like "diabetes symptoms." No real ranking changes. Most SEOs laughed it off with memes.
Interestingly enough, the original "knowledge graph" still appears in the top right corner for that query. So why are there two pieces of information? Because two separate machines are running. It all goes back to Kevin Bacon. Seriously.
Some smart-ass Google engineers wanted to see how many connections away from Kevin Bacon another star was. So if you Google "Justin Bieber Bacon Number" you will see he has a score of 2. Google launched this Bacon number last year while experimenting with how well they could determine the relevance between two topics. Much like Wikipediarace.com, Google tries to figure out how closely related (relevant) two things are to each other.
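Conceptually, a Bacon number is just the shortest path between two nodes in a graph, and relevance can be treated the same way: fewer hops, more related. Here is a minimal sketch using a made-up toy graph (the names and connections are purely illustrative):

```python
from collections import deque

def bacon_number(graph, start, target):
    """Breadth-first search: the fewest hops between two nodes."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no connection at all

# Hypothetical toy graph of "appeared together" connections.
graph = {
    "Kevin Bacon": ["Actor A"],
    "Actor A": ["Kevin Bacon", "Justin Bieber"],
    "Justin Bieber": ["Actor A"],
}
print(bacon_number(graph, "Justin Bieber", "Kevin Bacon"))  # 2
```

Swap actors for topics and co-appearances for links, and the same walk gives you a crude relevance distance.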
So why did it take so long for Penguin 2.1 to come out when the technological ability was in place a year ago? It's a great question. They have been using TrustRank for a while (which seems to ignore NoFollow), along with heavy co-citation relevance, to help them better utilize their link graph. But they completely sucked at relevance. Links from sites about country bands could push diet coupons as long as the sites didn't have a penalty and the anchor text was rotated (our bucket theory). Honestly, we speculated their old Caffeine engine couldn't handle relevance on a mass scale.
Think about Caffeine's abilities in 2010, including:
- Rapid indexing
- Processing power for duplicate-content filtering, which was later built into the algorithm itself
- Ability to limit anchor text power
- Autocomplete
- Etc.
Caffeine’s Limitations
But Google pushed the envelope so far that the engine couldn't handle any shiny new things. They could only turn dials up or down on what percentages were allowed. In fact, they pushed Panda too far and had to back it off: there are only about 2,500 commonly used words in a language, so natural word overlap was being flagged as duplication, and once they realized that was the problem a lot of innocent sites recovered without doing anything different. Penguin devalued overuse of the same anchor texts, so all SEOs just changed their anchors.
Think about it. What more could Google do to keep the results clean?
- Google Plus was a complete failure for the social graph. Bing won that race with Facebook's Open Graph data.
- Google Chrome's datastream is decent, and great for finding cloakers, but few people actually seem to bookmark with it. The data it provides isn't really useful for determining power, relevance, etc.
- The PostRank acquisition lost all of its hype and relied on public tweets (easily manipulated). It also lost access to Facebook data. Further, it gave way too much credit to Diigo bookmarks and to the total number of comments left on the site itself. Google still uses it as a social indicator but scaled it back when they realized how little accurate data they were actually obtaining.
- QDF (Query Deserves Freshness) was used and abused (hence our Fresh PubDate plugin).
Google needed relevance; it was the last piece they were missing. They had no way to do it with Caffeine, and everything they did to bring "better results" was quickly bypassed. Hummingbird gave them the capacity to handle such a large task, and Penguin 2.1 (likely coded a long time ago) then attacked relevance.
New Theory
Previously with Penguin we focused on the “bucket theory” of anchors to catch juice. Today, we are introducing BaconRank Theory: a relevance score. The closer the relevance, the more PageRank/TrustRank/BaconRank is passed. This will be added to our Penguin section soon.
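To make the theory concrete, here is a minimal sketch of what a relevance-weighted PageRank could look like. To be clear, this is our speculation, not a published Google formula; the sites and relevance numbers are made up for illustration:

```python
# A minimal sketch of the BaconRank idea: ordinary PageRank, except each
# link's share of the juice is scaled by a topical relevance score.

def bacon_rank(links, relevance, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}
    relevance: {(source, target): topical similarity from 0.0 to 1.0}"""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for source, targets in links.items():
            # Off-topic links still pass something (floor of 0.1 here),
            # just a much smaller share than on-topic ones.
            weights = {t: relevance.get((source, t), 0.1) for t in targets}
            total = sum(weights.values()) or 1.0
            for target, w in weights.items():
                new_rank[target] += damping * rank[source] * (w / total)
        rank = new_rank
    return rank

# Made-up example: a country-band site linking out to two targets.
links = {"countryband.com": ["dietcoupons.com", "banjos.com"]}
relevance = {("countryband.com", "dietcoupons.com"): 0.05,
             ("countryband.com", "banjos.com"): 0.90}
for page, score in sorted(bacon_rank(links, relevance).items()):
    print(page, round(score, 3))
```

The effect matches what we saw post-update: the off-topic diet-coupon link still passes something, just a much smaller share of the juice.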
Spam On
So how do we manipulate this one? If you followed our grey hat outreach guide and kept things moderately related to your niche, you likely weren't hit; continue buying links as usual. If you were buying links from anywhere that would sell them to you, now is the time to stop. Links have to be relevant to your niche, otherwise they won't help you.
A hot trend right now is to buy expired domains using tools like RegisterCompass, repurpose them entirely to be about your niche with unique content, and throw contextual links in them to the main site. Previously we were buying domains, making them look better than the original sites, keeping them on the topic they originally covered, and adding in our links at random. This no longer works, and we are working to change our strategy. We haven't been advising members to go buy domains because of issues like this, and regular outreach works better. Now that the cards have been dealt, it's okay to start buying.
It is too soon to determine the specific factors that make something count as relevant. We can speculate about titles, site themes, topic-based TrustRank, etc. But the reality is we don't know, and we may never have enough data to make a decision or take action on it. Just try to get on page-relevant articles for now. From a logical perspective, it doesn't make sense for Google to limit this to site-wide themes only: large news sites and personal blogs would not fit that description, as they cover so many random topics.
Commercial Intent
A few engineers have been using the phrase "commercial intent" when speaking of Penguin. We are reaching out to Christoph at Link Research Tools to find a way to determine commercial intent using their toolset for better analysis of this. So far, we haven't seen evidence of it in play. But we know it's coming, and Google has the data to do it. A new best practice going forward, which we will be adding to our Penguin section, is a new rotation of anchor text (a rough rotation sketch follows the list):
- Brand (and variations, including Inc., INC, etc.)
- URL (http://, www, trailing slash)
- Phrase-based keywords (almost a sentence)
- Actual keywords (still rotated as before)
- Generic (we already have a massive list of generic keywords posted)
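Here is a minimal sketch of what rotating across those categories could look like in practice. The pools, the weights, and the "Acme" brand are hypothetical placeholders; tune the ratios to your own link profile:

```python
import random

# Hypothetical anchor pools per category; fill these with your own.
ANCHOR_POOLS = {
    "brand":   ["Acme", "Acme Inc.", "ACME"],
    "url":     ["http://acme.com", "www.acme.com", "acme.com/"],
    "phrase":  ["check out this guide to widget repair",
                "a good read on fixing widgets at home"],
    "keyword": ["widget repair", "fix widgets"],
    "generic": ["click here", "this site", "read more"],
}

# Rough rotation weights; these are our guesses, not published numbers.
WEIGHTS = {"brand": 30, "url": 25, "phrase": 20, "keyword": 10, "generic": 15}

def next_anchor():
    """Pick an anchor category by weight, then a random anchor from it."""
    category = random.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
    return random.choice(ANCHOR_POOLS[category])

print([next_anchor() for _ in range(5)])
```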
CTR Manipulation & Data Center Variations: Rank Tracking Nightmares
Faking clicks in Google search was something we tested in 2012, and it ultimately failed. We heard reports of it working again, and tried it. It seems to have worked for our data center: jumping from #9 to #1 in a weekend post-Penguin, with no other modifications, is unreal. Google knows it needs social data to determine "what is best," but CTR should not be the method. We encourage you to test this on your own by searching your main keyword and clicking on your result from various (local) IPs and devices. We have 40% confidence in this being a strong factor moving forward.
The real problem this presents is that results are vastly different across the entire US. You can be #1 in North Carolina and #6 in California for a normal, non-location-based keyword. It makes no sense. For local marketers, this is great news, as you can easily fake your CTR. It seems that Google currently has no way of sharing CTR statistics between data centers.
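If you want to spot-check how much your rankings vary by location, a rough sketch follows. The proxy addresses are placeholders for proxies you control in different regions, and the parsing targets the older /url?q= result markup, which Google changes often (and scraping may run afoul of their terms), so treat it strictly as an illustration:

```python
import urllib.parse
import requests
from bs4 import BeautifulSoup

# Placeholder proxies in different regions; substitute your own.
PROXIES = {
    "north-carolina": "http://nc-proxy.example.com:8080",
    "california": "http://ca-proxy.example.com:8080",
}

def rank_of(domain, keyword, proxy=None):
    """Best-effort 1-based SERP position of `domain`, or None.
    Google's markup changes often; adjust the parsing as needed."""
    url = "https://www.google.com/search?" + urllib.parse.urlencode(
        {"q": keyword, "num": 100})
    proxies = {"http": proxy, "https": proxy} if proxy else None
    resp = requests.get(url, proxies=proxies,
                        headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
    soup = BeautifulSoup(resp.text, "html.parser")
    position = 0
    for a in soup.find_all("a", href=True):
        # Older SERP markup wraps organic results as /url?q=<target>.
        if a["href"].startswith("/url?q=http"):
            position += 1
            if domain in a["href"]:
                return position
    return None

for region, proxy in PROXIES.items():
    print(region, rank_of("yoursite.com", "your main keyword", proxy))
```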
NoCache, NoPanda?
This is on our radar. Since Panda was rolled into the algorithm so it doesn't have to be manually updated, it always runs against the cache date of the URL. In other words, you can't do the Panda CP test on an article until it is cached (being indexed alone isn't enough).
We had proof, and later received confirmation from an engineer, that applying noindex to a duplicate article does relieve the Panda penalties. But now the question is: what about nocache? If Panda makes its duplicate-content decision based on the cached copy, and you aren't in the cache, how can you be penalized?
We will be testing this. Feel free to test it yourselves on non-money sites.
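For anyone running the test: the directive that keeps a page out of Google's cache is robots noarchive, and you can spot-check whether a URL currently has a cached copy by hitting Google's public cache endpoint. A minimal sketch (the example URL is hypothetical, and Google may rate-limit or block automated requests):

```python
import requests

def is_cached(page_url):
    """Rough spot check of whether Google holds a cached copy of a URL.
    A 200 suggests a cached copy exists; a 404 suggests none. Keep this
    at manual scale, since automated hits get blocked quickly."""
    cache_url = ("https://webcache.googleusercontent.com/search?q=cache:"
                 + page_url)
    resp = requests.get(cache_url,
                        headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
    return resp.status_code == 200

# To keep a page out of the cache in the first place, serve this tag:
#   <meta name="robots" content="noarchive">
print(is_cached("example.com/duplicate-article"))  # hypothetical URL
```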