Google Search, Google Panda, and the Sandbox Effect
By shibashake
The Google search algorithm is what most of us are familiar with.
In the past, Google's search algorithm weighted webpages mostly based on PageRank. Over time, people figured out how to generate links in bulk and artificially raise their site's PageRank. This has sometimes resulted in warped search results that may not reflect quality from a user's point of view.
Since then, Google's search algorithm has continuously evolved to combat newer and more sophisticated black hat optimization practices. According to Google,
Today we use more than 200 signals, including PageRank, to order websites, and we update these algorithms on a weekly basis.
Google Panda is a new algorithm introduced on February 24th 2011.
- Unlike the regular search algorithm, Panda uses machine learning techniques that are based on human ratings and input.
- In addition, Panda evaluates entire websites, rather than considering content on a page by page basis.
- Sites that Panda considers low quality get a penalty in the regular search algorithm. This results in a lower position in the SERPs (Search Engine Results Pages).
The effect of Panda is usually very dramatic because it targets the entire site (or sub-domain), which causes all pages within the site to lose ranking. In contrast, changes to the regular Google search algorithm may lower the ordering of some pages and raise others. Therefore, drops in traffic from regular algorithm changes are usually smaller and more limited in scope.
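To make the difference concrete, here is a minimal sketch of how a site-wide penalty could drag down every page of a site at once. The `Page` class, the `page_score` values, and the idea of a single multiplier per site are all my own assumptions for illustration; Google has not published how the penalty is actually applied.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    site: str
    page_score: float  # hypothetical page-level score, not a real Google value


def rank_pages(pages, site_penalty):
    """Order pages by page score scaled by an assumed site-wide multiplier.

    site_penalty maps a site name to a factor in (0, 1]; sites that are
    not listed keep their full score (factor 1.0).
    """
    def final_score(page):
        return page.page_score * site_penalty.get(page.site, 1.0)

    return sorted(pages, key=final_score, reverse=True)


pages = [
    Page("a.example/1", "a.example", 0.90),
    Page("a.example/2", "a.example", 0.80),
    Page("b.example/1", "b.example", 0.70),
]

# With no penalty, both pages from a.example outrank b.example.
before = [p.url for p in rank_pages(pages, {})]
# A site-wide factor of 0.5 pushes every a.example page down at once.
after = [p.url for p in rank_pages(pages, {"a.example": 0.5})]
```

In this toy example, no individual page changed, yet every page on the penalized site lost its position. That is roughly what a Panda hit feels like from the inside.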
Google Panda vs. Google Sandbox
Recently, there has been a revival in the discussion of the Sandbox Effect. However, it is interesting to note that many of the top contributors on the Google Webmaster forums think that the Google Sandbox is just a myth.
What is the Google Sandbox?
As I understand it, the Google Sandbox refers to a penalty, primarily applied to new sites, for engaging in questionable link building practices. When a site, especially a new site, gets many links very quickly, there is a high likelihood that many of those links were artificially derived, rather than naturally acquired through referrals by visitors.
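A check of this kind could be sketched as follows. This is only my guess at the general shape of such a filter. The function names and both thresholds (`new_site_days`, `max_links_per_day`) are invented for illustration and are not known Google settings.

```python
from datetime import date


def links_per_day(first_seen, link_dates, today):
    """Average number of new inbound links per day since the site appeared."""
    age_days = max((today - first_seen).days, 1)
    return len(link_dates) / age_days


def looks_suspicious(first_seen, link_dates, today,
                     new_site_days=180, max_links_per_day=5.0):
    """Flag a young site whose link growth exceeds an assumed threshold.

    Both thresholds are made-up illustration values.
    """
    is_new = (today - first_seen).days <= new_site_days
    rate = links_per_day(first_seen, link_dates, today)
    return is_new and rate > max_links_per_day


today = date(2011, 6, 1)
links = [date(2011, 5, 15)] * 600  # 600 inbound links, dates simplified

# A 30-day-old site with 600 links averages 20 links per day.
new_site_flagged = looks_suspicious(date(2011, 5, 2), links, today)
# The same 600 links on a six-year-old site are not unusual.
old_site_flagged = looks_suspicious(date(2005, 1, 1), links, today)
```

Notice that the same number of links is treated differently depending on the age of the site, which matches the reports that the Sandbox mostly affects new sites.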
When we consider both the Sandbox Effect and the recent Panda Effect, there seem to be many similarities:
- Both exist outside of the normal search algorithm. However, they may place an external penalty on certain sites that lowers their search rankings.
- Both make site-wide decisions, and place penalties on the entire site or sub-domain.
- Both look at the gestalt view of a site, which is different from the Google search algorithm, which focuses more on single page statistics.
Does the Sandbox Effect exist?
My guess is that something did exist before Panda that tried to see the forest for the trees, i.e. discern overall site patterns from the details of individual pages. However, whatever that something was, it has most likely been absorbed into what is now Google Panda.
Miraculous Recovery
Some people report site recoveries from the Google Sandbox after a certain period of time. This has created some speculation about a Sandbox time limit.
However, there are many reasons why a site may recover.
- Google may have made changes to its Search, Panda, or Sandbox algorithms.
- The site administrator may have made changes to the content and structure of the site.
- Users may be interacting with the site differently, or there may be shifts in the popularity of site topics.
Sites are never static. Users post new comments and recommend pages to their friends; contributors write more articles; administrators update their sites. In addition, Google is constantly changing which factors it attends to while ordering sites and webpages.
When a site miraculously recovers, it is likely due to changes from Google, changes from site users, or changes made to the site itself.
What Is Google Panda?
It seems that Google's traditional search algorithm is mostly based on a fixed set of rules and signals. For example, a webpage with a higher PageRank and a longer view duration may get a higher score. New rules and signals may be added by human administrators, but the system does not grow on its own.
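Here is a minimal sketch of what I mean by a fixed set of rules and signals. The two signal names come from the example above, but the weights and the weighted-sum formula are purely my own assumptions; the real algorithm reportedly uses more than 200 signals and its formula is not public.

```python
# Hypothetical, hand-picked weights: a human changes these, the code never does.
SIGNAL_WEIGHTS = {
    "pagerank": 0.6,
    "view_duration": 0.4,
}


def rule_based_score(signals):
    """Weighted sum of normalized signals (each expected in the range 0..1)."""
    return sum(weight * signals.get(name, 0.0)
               for name, weight in SIGNAL_WEIGHTS.items())


page_a = {"pagerank": 0.8, "view_duration": 0.5}
page_b = {"pagerank": 0.4, "view_duration": 0.5}

score_a = rule_based_score(page_a)
score_b = rule_based_score(page_b)
```

The important property is that the same inputs always give the same score until a person edits `SIGNAL_WEIGHTS`. Nothing in this kind of system adjusts itself.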
In contrast, Panda is based on machine learning techniques, which are adaptive in nature. A simple way to think of machine learning systems is to consider how the human brain works. Each part of the brain is responsible for its own specialized function, and may operate on various sensory inputs. These parts can also interact with each other and work together to form a common solution or answer. For example, we may accurately recognize faces, solve the Rubik's Cube, or quickly differentiate a spammy website from a high quality website.
To create a machine learning system, we must also define similar key components:
- What are the different parts, what are their individual functions, and what inputs do they operate on?
- How do the different parts interact with each other?
- How does the system produce an integrated answer or end result?
- How does the system change/learn based on experience?
Properties of Machine Learning Systems
- These systems try to generalize behavior based on a much smaller initial training set. In Panda, the initial training set could be a set of websites, together with user-assigned ratings of those sites.
- Behaviors are evolved based on empirical data. In particular, the system automatically adjusts itself, as it analyzes more sites, and receives feedback on its results. For example, new pathways of interaction are formed, or different parts of the system gain and lose importance in framing the end result. These changes allow the machine to make better predictions in the future, based on its collected body of experience.
- Because a machine learning system evolves over time, its structure and the importance of its various characteristics are always in flux. For complex systems such as Panda, these changes may not be fully predictable, even by their creators. In this way, machine learning systems are much more difficult to subvert through traditional search engine optimization (SEO) techniques.
In essence, there are no rules to figure out, because the rules are always changing, and not necessarily in ways that make sense to a human observer.
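The properties above can be seen even in a very small learning system. The sketch below uses a perceptron, one of the simplest machine learning techniques, which I picked only because it fits in a few lines. Nothing here is Panda's actual method. The two features and the four rated sites are invented for illustration.

```python
def predict(weights, bias, features):
    """Return 1 (high quality) or 0 (low quality) for a feature vector."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


def update(weights, bias, features, label, rate=0.1):
    """One perceptron step: the weights move only when the prediction is wrong."""
    error = label - predict(weights, bias, features)
    new_weights = [w + rate * error * x for w, x in zip(weights, features)]
    return new_weights, bias + rate * error


def train(examples, epochs=20):
    """Fit weights to (features, label) pairs, starting from zero."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in examples:
            weights, bias = update(weights, bias, features, label)
    return weights, bias


# Features: (original-content share, ad density), both invented, scaled 0..1.
# Labels stand in for human ratings: 1 = high quality, 0 = low quality.
rated_sites = [
    ((0.9, 0.1), 1),
    ((0.8, 0.2), 1),
    ((0.2, 0.8), 0),
    ((0.1, 0.9), 0),
]
weights, bias = train(rated_sites)

# Generalization: a site the system has never seen still gets a verdict.
unseen_verdict = predict(weights, bias, (0.7, 0.3))

# Feedback: raters mark a site as low quality that the system liked,
# so the weights shift and future verdicts change with them.
new_weights, new_bias = update(weights, bias, (0.7, 0.5), 0)
```

Nobody typed in a rule such as "too many ads is bad". The weights came out of the rated examples, and a single piece of new feedback was enough to move them. Scale this up to a huge number of sites and features, and it becomes easier to see why the rules are hard to pin down.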
How Many Amazon Ads Are Too Many?
A common Panda topic is how advertisements affect the score of a site.
- Will 5 or more Amazon ads get a particular page classified as a sales page?
- Will a 30% ad-to-text density cause a page to be classified as low quality?
- Will having 20% or more sales pages on a site cause it to get a Panda penalty?
Not surprisingly, Google is not providing any clear answers to these questions. However, given the changing nature of machine learning systems, it is probable that Google could not provide clear answers even if it wanted to.
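Even without knowing the thresholds, we can still measure these quantities on our own sites and watch how they change over time. The sketch below is one possible way to do that. The definition of ad density (ad characters divided by all characters), the `min_ads` cutoff for a "sales page", and the sample numbers are all my own assumptions, not anything Google has confirmed.

```python
def ad_density(ad_chars, text_chars):
    """Share of a page's characters that belong to ads (one possible definition)."""
    total = ad_chars + text_chars
    return ad_chars / total if total else 0.0


def sales_page_share(pages, min_ads=5):
    """Fraction of pages with at least min_ads ads; min_ads is an arbitrary cutoff."""
    if not pages:
        return 0.0
    sales = sum(1 for page in pages if page["ads"] >= min_ads)
    return sales / len(pages)


# Invented numbers for a four-page site.
site = [
    {"ads": 6, "ad_chars": 300, "text_chars": 700},
    {"ads": 1, "ad_chars": 100, "text_chars": 1900},
    {"ads": 0, "ad_chars": 0, "text_chars": 1500},
    {"ads": 2, "ad_chars": 200, "text_chars": 1800},
]

densities = [ad_density(p["ad_chars"], p["text_chars"]) for p in site]
share = sales_page_share(site)
```

These numbers only describe a site. Whether any of them trips a Panda penalty is exactly the part that nobody outside Google can answer.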
I got the sense from the Wired interview and other writings that even Amit and Matt were a little nervous about how this works. I think they recognized that they hit some sites unintentionally. The most frustrating part for them is that they don’t know why the algorithm hit sites they didn’t want to.
What to Do After a Panda Hit?
HubPages was able to offset some of the effects of Panda by switching each author to his/her own sub-domain. However, this is a very specific fix for a large site with many content authors. In addition, sub-domains do not address the larger issue of what to do when our own site or our own HubPages sub-domain has been hit by Panda.
So what can we do after getting hit by a Google Panda penalty?
- Learn from our competitors.
To get out of the Panda box... Look at the long tail content in your industry that is performing well purely on the basis of its good content.
- Optimize based on users. Since the Panda machine learning algorithm is targeted at finding good user content, we will not stray far from the path of success, if we also optimize our site structure and content for our target audience. Here is Google's list of what it considers to be important in building a high-quality site.
- Get a better idea of Google's algorithms by following their Google Panda related patents.
Interesting article. It's notable that even Google does not know why certain sites lose traffic and why others don't. You are confirming that it really is not personal. Thanks for this.
Also, I loved the layout and the design of your hub. Those images are amazing and really add to the content of the hub. It gives it a sci-fi feel. Nice work!
Yeah machine learning systems are very interesting and also a bit scary at the same time. Today Google Panda, tomorrow the Terminator! :D
Another great article! I found it very interesting that "even Amit and Matt recognized that they hit some sites unintentionally" and that they didn’t know why the algorithm hit sites they didn’t want to LOL
Hello Mala, Thanks for dropping by and glad you found the information to be helpful.
Thank you very much snakeslane. I am so glad that you enjoy the artwork. I have a lot of fun with it, and it really helps me to de-stress. Writing is usually a lot more difficult for me than art. :D
very informative and useful hub.
Hello Shibashake, I appreciate the insight into inner workings of mysterious entity Panda and the connection you make with the real or imagined sandbox. Thanks for unravelling some of that. Once again, I am absolutely blown away by your artwork. Regards, snakeslane
shibashake 2 months ago
"You are confirming that it really is not personal."
Yeah that is a very good point. It is difficult not to take things like this personally, since writing is very personal, but I now try to be a lot more Zen about it.
From a sci-fi standpoint, Panda is actually a very interesting innovation. :D It is the first large scale machine learning system that I know of, that also has access to so much information as well as people feedback. It could really evolve into something really interesting ...