INTERNATIONAL: SEO Cloaking Techniques to Avoid in 2011
by Richard Bedford, Bruce Clay Australia, January 18, 2011
Head of Google Web Spam, Matt Cutts, took time away from Ozzie and Emmy (the Matt Cutts "Catts") at the end of 2010 to post a little titbit for webmasters and SEOs via Twitter, which I'm sure added to the hangovers for a few black hats over the holiday season:

"Google will [look] more at cloaking in Q1 2011. Not just page content matters; avoid different headers/redirects to Googlebot instead of users."

Cloaking is the technique of presenting different content, layout, functionality or headers (a completely different page, or partial components of the page, known as mosaic cloaking) to a search engine spider than to a user's web browser. Ethical cloaking is not "black hat"; however, in the past spammers have manipulated cloaking techniques (for clarity, let's refer to this as cloaking-spam) to game the (Google) algorithm. This is not a new phenomenon. In the beginning, the meta keywords tag was abused by spammers and as a result is no longer a ranking factor, and the <noscript> tag may also be treated with some suspicion because it too has been abused in the past (perhaps we should open a refuge for abused HTML elements...).

First off, let me say that, if at all possible, AVOID CLOAKING. Cloaking is a high-risk exercise that, if it must be implemented, should be done in an appropriate, ethical manner, adhering to Google's Webmaster Guidelines, to ensure that your website is not penalised or dropped from the index. Unfortunately, some webmasters may not understand the repercussions and inadvertently cloak content, links or entire websites without even realising it. This article outlines some of the common on-site functionality that may be (mis)interpreted as cloaking-spam.

Keep in mind that Google is actively investigating instances of cloaking-spam and banning websites from its index. It is also following up detection of cloaking and unnatural links with notifications to webmasters via Webmaster Tools. Google is getting better and better at detecting cloaking-spam algorithmically, even IP-delivery is not infallible, and of course Google always encourages your competition to use the spam report if they detect something fishy about your page.

How could Google detect cloaking-spam?

Identifying cloaking-spam algorithmically requires a search engine to compare a single web page obtained via two or more mechanisms (for example, two or more IP ranges, User-agent identifiers, or differing levels of HTML/JavaScript functionality). Microsoft filed a patent in late 2006 claiming a system that facilitates the detection of cloaked web pages. Naturally, this leads to the question: how could a search engine gather and analyse the two copies of a web page for comparison? Some methods may include:
Of course, the data gathering could be outsourced to a separate company to avoid the issue of IP-delivery.

Ethical uses of cloaking - Ensure you implement in an appropriate fashion for SEO

There are instances where a company may wish to provide different or additional information to its users. For example:
Ensure that you consider the SEO implications when using any of the methods or functionality mentioned above, as misconfiguration may result in cloaking-spam or may not be optimal for SEO.

Cloaking - Don't try this at home

Okay, so this is not a tutorial on how to cloak; it is a "2011 cloaking-spam no-no list" or, at the very least, a heads-up on techniques to avoid or issues to fix early in 2011. Some forms of cloaking are deliberate (such as IP delivery or user-agent cloaking); however, many forms of cloaking-spam may be accidental. The accidental types of cloaking-spam that inadvertently get you banned from Google are of utmost concern, as the webmaster may not even be aware of the issue. Even large companies get it wrong sometimes. We investigate some of the most common cloaking-spam techniques below so that webmasters and SEOs can make sure they do not have them on their websites. There are typically three ways that webmasters cloak content from either users or search engines:
IP-delivery/cloaking

Delivering different content based on the IP address of the requesting web browser or search engine spider. [IP delivery is covered in more detail here.]

Reverse DNS & forward DNS

Reverse DNS and forward DNS lookups are not a form of cloaking, but they may be used to query the DNS records of a requesting IP address. Google provides details on how to verify that Googlebot is who it claims to be.

User-agent

Delivering different content based on the User-agent of the requesting web browser or search engine spider. For example, Googlebot/2.1 (+http://www.google.com/bot.html) or Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US).

JavaScript redirects

Google may index a page containing JavaScript but may not follow the JavaScript redirect, although we are seeing significant improvements in Google's interpretation of JavaScript code (for example, the Google preview generator renders JavaScript, AJAX, CSS3, frames and iframes). Webmasters sometimes use JavaScript redirects when they cannot implement a server-side redirect, inadvertently leaving Googlebot on the first page while sending the web browser (which follows the JavaScript redirect) to a second page containing different content, which is then flagged as cloaking-spam. Look out for the following code:

  <script type="text/javascript">
  window.location = "http://www.yoursite.com/second-page.html";
  </script>

Meta refresh

A tag added to the head section of an HTML page to redirect users to another page after a set period. The meta refresh tag is not considered cloaking when used on its own; however, it may be combined with JavaScript, frames or other techniques to send a user to a different page than the one served to the search engine spiders. Look out for the following code:

  <meta http-equiv="refresh" content="0;url=http://www.yoursite.com/second-page.html">

Double/multiple meta refreshes or referrer cloaking

Multiple meta refreshes may be used to hide the referrer from affiliate websites. Avoid chaining multiple redirects of any kind, as it may have negative impacts on SEO and may even be against the terms of service (TOS) of your affiliate partners.

Meta refresh in JavaScript or the <noscript> tag

OK, now we are getting into the realms of "black hat". It is unlikely that a webmaster would combine a meta refresh with JavaScript unless they were up to no good. This is easy for a search engine to detect. Don't do it.

Back-to-back multiple redirects

Search engines may not follow multiple chained redirects (per the guidelines in the HTTP specification, the recommended limit was set at 5 redirections). Google may follow around 5 chained redirects. Web browsers may follow more. Multiple back-to-back redirects (especially chains combining different types of redirect: 301, 302, meta refresh, JavaScript, etc.) impact page load times, may impact the flow of PageRank (even 301 redirects may see some PageRank decay) and could be considered cloaking-spam. I could not find any data about how many redirects a web browser will follow, so I created a quick chained-redirect script to test some of the browsers installed on my machine and provide some stats on the approximate number of redirects followed (by redirect type). I limited the script to a maximum of 5000 chained redirects.
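The original test script is not reproduced here, but the idea can be sketched in a few lines. The following is an illustration only, assuming a Node.js environment with made-up paths and a made-up port: it serves a chain of 301 redirects, and the last hop a client requests approximates that client's redirect limit.

  // Minimal sketch of a chained-redirect test (illustrative only, not the original script).
  // /hop/1 issues a 301 to /hop/2, and so on, up to MAX_HOPS; the final page reports the hop reached.
  var http = require('http');
  var MAX_HOPS = 5000; // the test described above was capped at 5000 chained redirects

  http.createServer(function (req, res) {
    var match = req.url.match(/^\/hop\/(\d+)$/);
    var hop = match ? parseInt(match[1], 10) : 1;
    if (hop < MAX_HOPS) {
      // a 301 chain; the same idea applies to 302, meta refresh or JavaScript redirects
      res.writeHead(301, { 'Location': '/hop/' + (hop + 1) });
      res.end();
    } else {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('Reached hop ' + hop); // the last hop a browser or bot requests approximates its limit
    }
  }).listen(8080);

Pointing each browser (or submitting the starting URL to a crawler) at /hop/1 and logging the last hop requested gives an approximate redirect limit per client.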
Since the script had already been written, we thought we would run an additional test and submit the redirect URL to Google. We also linked to the script from Twitter. The results are in the table below.
Although Googlebot only crawled 5 of the permanent redirects in this instance, it may be fair to assume that Google may implement crawl-based verification to test redirects beyond the 5-redirection bot limit, in a similar vein to Microsoft above, which follows approximately 25 chained redirects. Note: we assumed that this is a Microsoft-owned IP based on the IP Whois information from Domain Tools.

Frames

Frames allow a webmaster to embed another document within an HTML page. Search engines have not traditionally been good at attributing framed content to the parent page, enabling a webmaster to prevent search engines from seeing some or all of the content on a page. Frames and iframes are legitimate HTML elements (even though they are often not best practice from an SEO point of view); however, they can also be combined with other techniques to deceive users.

Frames with a JavaScript Redirect

Embedding a frame with a JavaScript redirect may leave search engine spiders at the first page and sneakily redirect users with JavaScript enabled to the second, "hidden" page. I can't think of a legitimate "white hat" reason why you would choose to use this. It may result in a penalty or a ban. Check the source code of your framed documents, remove this code or implement an appropriate SEO-friendly redirect.

<noscript> tag

The <noscript> tag was designed to provide a non-JavaScript equivalent for JavaScript content so that text-only browsers and search engines could interpret more advanced forms of content. The <noscript> tag may be treated with some suspicion as it has been abused by spammers in the past. Build JavaScript/AJAX functionality with progressive enhancement in mind so that the content is suitable for all users and doesn't require the use of the <noscript> tag. If your website uses the <noscript> tag and you cannot update the code, check that any text, links and images within the <noscript> tag describe the JavaScript, AJAX or Flash content they represent in an accurate, clear and concise manner. If the offending page or website has indexation issues, consider revising the <noscript> code as part of a thorough website SEO audit.

Content Delivery Networks - CDN

Content Delivery Networks (CDNs) allow companies to distribute their static content across multiple geographic locations to improve performance for end users. Depending upon the CDN configuration, there are multiple ways to route the client request to the best available source to serve the content. CDNs are a complex area, usually implemented by global companies who need to serve content to users in the quickest possible time. If you are using a CDN, ensure that it allows a search engine to access the same content and information that users see, and ensure that there is nothing a search engine could misinterpret as deceptive.

Hacked websites

Hackers have used exploits in common CMSs to drive traffic to less-than-ethical third-party websites. One example is the WordPress pharma hack, which used cloaking to present pharmaceutical-related content to the search engines but hide that content from the webmaster. Ensure that your CMS, web server and operating system software are running the latest versions and have been secured. Some of the most common exploits are poor passwords, insecure software or scripts, disgruntled employees and social engineering tricks.

Cloaking HTTP headers

HTTP headers send additional information about the requested page to the search engine spider or web browser; for example, the status of the page, cached/expiry information, redirect information and so on. Sending different headers to a search engine in order to deceive may result in a penalty. For example, replacing good content on a high-ranking page with a sign-up form and altering the expires and/or cache-control headers in an attempt to fool search engines into retaining the high-ranking version with the good content will not work. Googlebot may periodically download the content regardless of the expires and cache-control headers to verify that the content has indeed not changed. You can check the status of your server response headers using one of our free SEO tools.
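A simple way to self-check for accidental user-agent or header cloaking is to request the same URL with a normal browser User-agent and with Googlebot's User-agent and compare what comes back. The sketch below is illustrative only (it is not one of our tools; it assumes a Node.js environment and a placeholder host/path), and note that it can only surface user-agent-based differences, not IP-based delivery.

  // Fetch the same page as a "browser" and as Googlebot, then compare the responses.
  var http = require('http');

  function fetch(userAgent, callback) {
    var options = {
      host: 'www.yoursite.com',            // placeholder host and path
      path: '/second-page.html',
      headers: { 'User-Agent': userAgent }
    };
    http.get(options, function (res) {
      var body = '';
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        callback({ status: res.statusCode, headers: res.headers, length: body.length });
      });
    });
  }

  var browserUA = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)';
  var botUA = 'Googlebot/2.1 (+http://www.google.com/bot.html)';

  fetch(browserUA, function (asBrowser) {
    fetch(botUA, function (asBot) {
      // Differences in status code, Location/Expires/Cache-Control headers or body size
      // between the two responses are worth investigating as possible accidental cloaking.
      console.log('Browser:', asBrowser.status, asBrowser.length, asBrowser.headers['cache-control']);
      console.log('Bot:    ', asBot.status, asBot.length, asBot.headers['cache-control']);
    });
  });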
Doorway pages

To quote Google: "Doorway pages are typically large sets of poor-quality pages where each page is optimized for a specific keyword or phrase. In many cases, doorway pages are written to rank for a particular phrase and then funnel users to a single destination." Matt Cutts has a rant about doorway pages here.

Multi-variate testing and Google Website Optimizer

Multi-variate testing tools such as Google Website Optimizer allow you to improve the effectiveness of your website by testing changes to your content and design to improve conversion rates (or whichever important metrics are being measured). Multi-variate testing is an ethical use of cloaking; however, Google states: "if we find a site running a single non-original combination at 100% for a number of months, or if a site's original page is loaded with keywords that don't relate to the combinations being shown to visitors, we may remove that site from our index".

301 redirecting old domains to unrelated websites

Not necessarily cloaking-spam per se, but a bait-and-switch technique, which 301 redirects unrelated domains (usually domains that are for sale or have expired but still have PageRank or significant external links) to a malicious or unrelated domain about a completely different topic. This is misleading to users, as they may be expecting a different website, and it may pass unrelated anchor text to your domain. Also, don't expect credit for registering expired domains with external links in the hope of a PageRank or link boost.

Flash

Historically, search engines have struggled to interpret and index Flash content effectively, although they are getting better all the time. Webmasters had to consider users and search engines that did not have Flash-enabled browsers, and either built a standard HTML website "behind the scenes" for search engines or used a <noscript> tag, JavaScript or a similar method to get their textual content indexed. Unfortunately, this may be inadvertently identified as cloaking by search engines if the content indexed from the Flash does not match the textual content. Building an entire website in Flash is still not a good idea from an SEO perspective; however, if you do have some Flash content, consider implementing SWFObject or a similar technique to ensure that Flash degrades gracefully for both users and search engines.
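As a rough illustration of the SWFObject approach (the file names, element id and dimensions below are placeholders): the div holds the equivalent HTML content, and the script replaces it with the Flash movie only when a suitable Flash player is detected, so users without Flash and search engine spiders see the same text.

  <div id="flash-content">
    <p>Equivalent HTML text, links and images describing what the Flash movie contains.</p>
  </div>
  <script type="text/javascript" src="swfobject.js"></script>
  <script type="text/javascript">
  // SWFObject 2.x dynamic publishing: replace the div only if Flash 9 or newer is available
  swfobject.embedSWF("movie.swf", "flash-content", "550", "400", "9.0.0");
  </script>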
Interstitial adverts and popover divs

Popover divs and adverts alone are not cloaking. But when the interstitial ads or popover divs cannot be closed (for example, unless the user registers), you may be presenting content to the search engines and a sign-up form to your users. Ensure that users can close or skip interstitial adverts, pop-ups, popovers, overlaid divs, lightboxes and the like, and can view the content available.

AJAX

AJAX (Asynchronous JavaScript And XML) is a form of JavaScript that enables a web page to retrieve dynamic content from a server without reloading the page. It has become very popular over the last couple of years and is often (over)used in many Web 2.0 applications. AJAX can be used in a deceptive way to present different content to a user and a search engine; don't do it. There is also the other side of the coin: in a "negative cloaking" scenario, the user may see the content but a search engine will not, because it cannot execute the JavaScript calls that retrieve the dynamic content from the server. Something to check.

JavaScript + Cookies to cloak

Many of the techniques outlined in this article may be combined, chopped about or manipulated in a futile attempt to cheat the search engines. One such example is combining JavaScript and cookies to cloak content: if the client cannot write or read a cookie (as is the case for a search engine spider), it is shown different content than a standard user with cookies enabled. There are also a few jQuery script examples that will allow an unscrupulous person to do this.

Link cloaking (vanity URLs combined with redirects)

Link cloaking refers to sending a user to a different URL than the one clicked on, using a redirect of some form. Redirects can be used for good and bad, as we have seen above. Link cloaking is often used for analytical or maintenance purposes. There are a number of practical reasons to do this, for example:
Of course, this may be used to mislead and deceive, such as disguising an affiliate link (e.g. replacing the link with http://mysite.com/vanity-url and redirecting that to http://affiliate.com/offer.html?=my-affiliate-code).

Link hijacking

Modifying the anchor text or link attributes with JavaScript or a similar mechanism to trick or deceive users. This is a form of cloaking that modifies only a small component of the page to deceive the user.
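As a hypothetical illustration of the kind of code to look out for (the URLs reuse the affiliate example above; the element id is made up): the visible link points at a clean vanity URL, but a script quietly swaps in the affiliate URL at click time.

  <a id="review-link" href="http://mysite.com/vanity-url">Read the review</a>
  <script type="text/javascript">
  // the user sees and expects the vanity URL but is sent to the affiliate offer instead
  document.getElementById("review-link").onclick = function () {
    this.href = "http://affiliate.com/offer.html?=my-affiliate-code";
  };
  </script>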
Avoid link hijacking to deceive users, as it may result in search engine penalties or get your website banned. There are ethical forms of this technique, such as HiJAX (recommended on the Google blog), which ensure that both users and search engines can see your AJAX content.

Hidden Text

Hiding text is against Google's TOS and Webmaster Guidelines. It is a form of cloaking, as a search engine can see the textual content but a user cannot. Avoid the following types of hidden text:
Cloak this!

If search engine traffic is important to you, make sure you consider the following with respect to cloaking:
For permission to reprint or reuse any materials, please contact us. To learn more about our authors, please visit the Bruce Clay Authors page. Copyright © 2011 Bruce Clay, Inc.