Scraper Sites

What is a scraper site? It is a Web site that copies content, images, blog posts and news from other sites and republishes them as its own for monetary benefit, with no regard for copyright. Since the inception of Google AdSense and pay-per-click advertising, scraper sites have been significantly on the rise. Their main purpose is to attract traffic and have users click on the scraper site’s paid advertisements, which in turn generates money for the owners/creators of the scraper sites. Most of the snippets of content that appear on these Web sites have no value or use to the user. They are created strictly for revenue and are a form of spam. Besides taking content without giving credit where credit is due, scraper sites also skew the Search Engine Results Pages (SERPs). Search engines want to return the most relevant results for each query, but scraper sites pollute those results.

Generally, scraper sites are created specifically to carry paid ads from programs such as Google AdSense. Often the ads are the only relevant items on the page for the user, because the links to all of the other sites are not what the user was searching for. Thus, the user clicks on the Google AdSense text link or affiliate link, and the scraper earns money. Scrapers build these sites from listings and snippets that contain high-activity keywords. These listings bring traffic to the scraper site, which in turn redirects the user to the site that the scraper has put together. The stolen content usually has a high activity count and is not necessarily relevant to the ads, but because the activity is high, the scraper site brings in the traffic.

So how do scraper sites go about scraping a site? Automated programs send out bots that extract content, reorganize it and create a new Web page. Some of these bots can scour thousands of sites in as little as an hour. Some of the software applications are programmed to run a “find and replace” on keywords so that the content appears different. Most scraper sites give the appearance of a search engine results page or a directory. Scrapers usually take sections of a Web site, such as the title and description of a page, or might even take a whole page, and build a Web site or Web page of their own from it. As mentioned previously, the information provided on a scraper site is not original and has little to no value to the user.

The issues that scraper sites create are varied. The relevancy of all other sites for the query is diluted. When scraper sites show up in the rankings, they can push down the positions of the relevant sites. There are even instances when a scraper site ranks higher than the original site the content was taken from. Another issue is that the scraper site can take traffic away from the original Web site. Scraper sites also create instances of duplicate content. And one of the biggest issues with scraper sites is copyright violation. Some refer to it as “digital plagiarism”. It is against the law to use someone else’s work, whether it be content, images, blogs, etc., without the owner’s permission and without citing who really created the work. Most owners don’t mind the use of their work for personal reasons, but copying content for monetary gain violates copyright laws.

If your site is prone to being scraped, the suggestions below might help, so that at the very least you benefit from or receive credit for your own work.

Use absolute links in your site.
On blog entries, make sure you link to other entries within your site.
Try to have a link to another page within your site in the first two lines of your text.
Use your blog name in your entry.
Put copyright notices on your pages. Most scraper sites do not consider copyrights when they copy content.
Report anyone stealing your work to the search engines.
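
The first suggestion, using absolute links, matters because a relative link such as /about.html stops pointing at your site once the page is copied elsewhere, while an absolute link keeps pointing back to you. As a rough illustration only, the Python sketch below rewrites the href attributes in a piece of HTML into absolute URLs. The function name absolutize_links and the blog.example.com address are assumptions made up for this example, and the regular expression is a simplification; a real HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' attributes.
# Simplified on purpose: it does not handle unquoted attribute values.
HREF_PATTERN = re.compile(r'''(href\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize_links(html: str, base_url: str) -> str:
    """Return html with relative href values resolved against base_url.

    Hypothetical helper for illustration; base_url is the address of the
    page the HTML belongs to.
    """
    def replace(match: re.Match) -> str:
        prefix, quote, target = match.groups()
        # Leave in-page anchors and non-HTTP schemes untouched.
        if target.startswith(("#", "mailto:", "javascript:")):
            return match.group(0)
        # urljoin leaves already-absolute URLs unchanged.
        return f"{prefix}{quote}{urljoin(base_url, target)}{quote}"

    return HREF_PATTERN.sub(replace, html)


if __name__ == "__main__":
    page = '<a href="/about.html">About</a> <a href="#top">Top</a>'
    # The relative link becomes https://blog.example.com/about.html;
    # the in-page anchor is left as it is.
    print(absolutize_links(page, "https://blog.example.com/posts/first.html"))
```

Running something like this over your entries before publishing means that if a scraper copies a page verbatim, the links inside it still lead to your own site.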