It’s happened to anyone who’s ever used a search engine:
- First, you type a keyword into the search box and hit Enter.
- Next, you pick a promising link from the list and hit Enter again.
- Then, you find yourself looking at a page filled with gibberish – words are spliced together on the page in a way that has rendered them nonsensical.
What IS that about, anyway?
What Is Content Scraping?
In the simplest sense, scraping is any technique by which website content is captured and repurposed. Sometimes reuse is useful: websites that allow you to compare the prices of identical online products or the weather forecasts in two prospective vacation locales employ a form of content scraping in which the captured data is analyzed in a central database before it’s republished. Much scraping, however, is unlicensed: a 2010 report from the Fair Syndication Consortium found that, on average, over 75,000 unlicensed Internet sites scrape US newspaper content over a typical 30-day period.
Is Content Scraping Legal?
The main question when it comes to content scraping is undoubtedly this:
Is content scraping legal? Can I sue if someone steals my website content without my permission?
Unfortunately, the answer isn’t simple.
Copyrighted Content v. Duplicated Facts
While copyright protects original expression, courts have found that protection does not extend to duplicated facts. Nonetheless, in 2004, Southwest Airlines sued travel reservation websites FareChase and Outtask to prevent what Southwest described as unauthorized access to its website when they published Southwest fares and flight information as part of their service offerings. The case – destined for the Supreme Court – has yet to be decided. In the meantime, Southwest Airlines travel information continues to be notably absent from discount Internet travel sites.
Autoblogs & Content Scraping
Often, content scraping is part of an elaborate ruse, designed to earn commissions for “legally ambiguous” affiliate marketers. Autoblogs are website templates fitted with plug-ins that automatically compile and update content from RSS feeds. This strategy is perfectly legal, so long as content is scraped from sites like Ezine Articles, Associated Content, and other article directories that permit – even encourage – syndication.
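To make the mechanics concrete, here is a minimal Python sketch of the step an autoblog plug-in automates: reading an RSS feed and pulling out each item's title, link, and summary for republication. The feed contents, the example.com URLs, and the `collect_items` function name are all made up for illustration; a real plug-in would download the feed from a directory that permits syndication and would likely do much more.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents (an assumption for this sketch); a real
# autoblog plug-in would download this XML from a feed URL instead.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Article Directory</title>
    <item>
      <title>First article</title>
      <link>https://example.com/first</link>
      <description>Summary of the first article.</description>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.com/second</link>
      <description>Summary of the second article.</description>
    </item>
  </channel>
</rss>"""


def collect_items(feed_xml):
    """Return one dict (title, link, description) per RSS <item> element."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


posts = collect_items(SAMPLE_FEED)
for post in posts:
    # An autoblog would publish each entry; here we only list what was collected.
    print(post["title"], "->", post["link"])
```

The same few lines work on any feed, which is why the legal question turns on where the feed comes from and not on the technique itself.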
Frequently, though, content may be scraped from copyrighted sources – like your blog!
Are there steps you can take to prevent your copyrighted content from being scraped? Sure.
- Photographs and other graphics can be watermarked;
- Written content can be posted in non-copyable PDF format (which is a horrible idea, because the search engine spiders that decide where your site ranks do not “read” PDFs).
Grin or Sue?
Practically speaking, the easiest thing to do when it comes to content scraping is to grin and bear it. If you’re writing on the Internet in the 21st century, sooner or later you can expect to run across your own misappropriated words on a site to which you didn’t grant permission.
That said, you can always formally copyright your blog or website — and sue for online copyright infringement if anybody steals your $#!+!
The Legalities of Scraping Confidential Information
Scraping is also the term used when user-generated content is lifted and analyzed as part of a data mining operation.
In May 2010, a password-protected website called PatientsLikeMe.com – a safe online space for over 70,000 users who are bipolar or dealing with other mental health challenges – caught BuzzMetrics, a subsidiary of prominent media research firm Nielsen Co., in the act of downloading every message that had ever been posted to PatientsLikeMe’s online forum. Why? Nielsen intended to aggregate that information and sell it to drug companies anxious for patient feedback about their products.
Is scraping information from password-protected sites illegal? It more than likely violates the Terms of Service, but courts have been inconsistent as to whether or not Terms of Service agreements constitute binding contracts. The company that operates PatientsLikeMe also sells the information it harvests to drug companies. As such, user information might be considered legally protected intellectual property. Additionally, US courts have upheld claims brought by Internet companies arguing that unauthorized use of their servers by robo-scrapers constitutes a specialized violation of personal property rights known as “trespass to chattels.”
Consult an Attorney With Content Scraping Experience
Is your content scraping legal? For more information about content scraping, contact a qualified copyright attorney.