SMX West 2012: Duplication, Aggregation, Syndication, Affiliates, Scraping & Info Architecture

Here we are at #smx #25C.

Moderator: Vanessa Fox, Contributing Editor, Search Engine Land (@vanessafox)


Rudy De La Garza, Manager, Corporate Search Engine Optimization,
Gabe Gayhart, SEO Manager, PriceGrabber / Experian Interactive (@datguygabe)
Nick Roshon, SEO Strategist, iCrossing (@nickroshon)
Kent Yunk, VP and SEO Strategist, Roaring Pajamas (@kymktg)

Nick starts the session. Your website is probably the most important content your business has. You want it to be unique, and the biggest reason for this is probably search engines. If you have products, don’t just put up the boilerplate manufacturer descriptions. Look at Amazon. They have boilerplate product descriptions, but then many pages’ worth of unique content.

Augment boilerplate with:

  • user reviews
  • related products
  • trust builders (return policies, why buy from us, about this brand, coupons, discounts and bundles, pricing policy, nearest stores, email notifications)
  • photos, videos and other assets

Avoiding Duplicate URLs

1. Don’t do it.

  • One product=one URL
  • Dynamically refresh content to display different pricing info and images

2. But if you have to do it…

  • Use consistent parameters (or better yet, subfolders) and block them via robots.txt and Webmaster Tools.
  • Use the canonical tag
  • Add an HTML link to the canonical version of the page from the duplicate page.

3. Watch for pagination
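
To make the "consistent parameters" and canonical tag advice above concrete, here is a minimal sketch, not something shown in the session. It assumes a hypothetical set of parameter names (`NON_CANONICAL_PARAMS`) that only carry tracking or presentation state; a real site would need its own list. It strips those parameters, sorts whatever remains so the same product always yields the same URL, and builds the matching canonical tag.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the product shown.
NON_CANONICAL_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                        "sessionid", "sort", "ref"}


def canonical_url(url: str) -> str:
    """Drop non-canonical parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in NON_CANONICAL_PARAMS]
    kept.sort()
    # Lowercase scheme and host, keep the path, drop any #fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the duplicate page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag(
    "http://Example.com/widget?utm_source=feed&color=red&sessionid=9"))
# <link rel="canonical" href="http://example.com/widget?color=red">
```

Sorting the surviving parameters is a design choice of this sketch: it keeps `?a=1&b=2` and `?b=2&a=1` from being treated as two different pages.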

Global best practices for search engines:

  • Use a subfolder or subdomain
  • Set geotargeting in Bing and Google Webmaster Tools (WMT)
  • Do localized link building
  • Use the hreflang tag to specify alternate languages
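
As an illustration of the hreflang point above, the following sketch generates the alternate-language link tags for one page. The `ALTERNATES` mapping and its example.com URLs are hypothetical placeholders, and the sketch assumes the site uses country subfolders as suggested in the first bullet.

```python
from html import escape
from typing import Dict, List

# Hypothetical localized versions of a single product page.
ALTERNATES = {
    "en-us": "http://www.example.com/us/widget",
    "en-gb": "http://www.example.com/uk/widget",
    "en-ca": "http://www.example.com/ca/widget",
}


def hreflang_tags(alternates: Dict[str, str]) -> List[str]:
    """Return one <link rel="alternate" hreflang=...> tag per localized URL."""
    return [
        '<link rel="alternate" hreflang="{}" href="{}">'.format(
            escape(code, quote=True), escape(url, quote=True))
        for code, url in sorted(alternates.items())
    ]


for tag in hreflang_tags(ALTERNATES):
    print(tag)
```

The same full set of tags would typically go in the head of every localized version, so each page points to all of its alternates, itself included.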

Global SEO and UX:

  • Put country name in page title/header/footer/template
  • Spell things correctly for the Brits and Canucks
Gabe Gayhart
Photo courtesy Luis Angel, @luisangelec

Next is Gabe. He’s the rock paper scissors champion and has a belt to prove it! He doesn’t have many opportunities to show it off, so he brought it along.

He’s going to establish a plan of attack: process, implementing and executing. His company, PriceGrabber, aggregates merchant feeds. The site is broken out into 26 channel-focused subdomains, with massive amounts of pagination.

When Panda hit, their traffic dropped. The elephant in the conference room is always Panda. Some top-performing keywords went down and they had to adapt. How do you Panda plan? They split it into 4 categories.


  • structure
  • kw usage
  • tracking parameters
  • folder depth
  • redirects

KW optimization:

  • target based on search volume
  • map to targets and meta data
  • competitiveness

design optimization:

  • optimizing templates
  • on-site link opportunities
  • readable text
  • layout

Existing content optimization:

  • targeting kws and linking in silos
  • leveraging feeds and back end data
  • title tags
  • kw usage in text

Plan as if Google is a dog wanting a biscuit. Google wants to serve the right piece of content. Make a clear distinction about what the priority content is. Plan to kill duplication when possible. Have a redirect strategy (or rel=canonical). If syndication is creating duplication, then support duplicated priority pages with a sound internal linking structure. Otherwise, plan for your content to be de-emphasized.
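
One way to picture the redirect strategy mentioned above is a simple map from each duplicate URL to its priority page. The sketch below is an illustration under assumptions, not PriceGrabber's method: the `REDIRECTS` paths are made up, and it adds one step the session did not cover, collapsing redirect chains so every duplicate reaches the priority URL in a single hop.

```python
from typing import Dict

# Hypothetical map of duplicate URL -> URL it should redirect to.
REDIRECTS = {
    "/camera-x100?color=black": "/camera-x100",
    "/cameras/old/camera-x100": "/camera-x100?color=black",
}


def resolve(url: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow the map until reaching a URL that does not redirect further."""
    seen = {url}
    while url in redirects:
        url = redirects[url]
        if url in seen or len(seen) > max_hops:
            raise ValueError(f"redirect loop or overly long chain at {url}")
        seen.add(url)
    return url


def flatten(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every duplicate straight at its final destination."""
    return {source: resolve(source, redirects) for source in redirects}


print(flatten(REDIRECTS))
```

Raising on loops is deliberate here: a map where two duplicates redirect to each other has no priority page, which is exactly the ambiguity the plan is meant to remove.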

Kent worked at Ask for a while. Q&A is something that can drive traffic. It produces an enormous quantity of long-tail phrases covering a vast array of subjects, and a group of Q&As targeting popular subjects attracts more users who need answers.

Aggregating Content

Data acquisition options:

  • build a large list of questions
  • ask an existing community for questions
  • buy a list
  • pay for content (answers)

Free form vs. moderated questions:

  • free form is a challenge because users love to tell their story
  • moderated questions improve kw but take significant resources


  • wide range of med topics
  • interact with experts
  • provides value and gets content
  • trackers and tools
  • partners with institutions to publish anonymous data

There’s a layer of community-based content. Add to that an expert content layer. This is great, robust content. Then they created trackers and tools (track blood pressure on a daily basis, for instance). Rather than capturing a new user’s personal data (name, address), they capture health-type data (age, race), anonymize it and feed it to research centers. It’s an interesting way to build on what was basically a forum.

Example: Ask Answers

  • Taken from a large repository of queries.
  • Normalized for category, popularity, and likely to monetize
  • Invites answers from online community

Version tracking:

  • How many similar versions of the same question before you get a penalty?
  • Always tied to a unique answer
  • Multiple domains
  • Using avatars and personas
  • Unique titles, h1s and descriptions

Text analysis tools: text comparisons to find and eliminate dupe content
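
No specific text analysis tool was named in the session. As a rough sketch of what such a comparison could look like, the code below uses a common technique, word shingles compared by Jaccard similarity, to flag pairs of pages whose text largely overlaps. The shingle size, the 0.5 threshold and the sample pages are arbitrary assumptions.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Split text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(pages: Dict[str, str],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (url, url, similarity) for page pairs at or above the threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    flagged = []
    for first, second in combinations(sorted(sets), 2):
        score = jaccard(sets[first], sets[second])
        if score >= threshold:
            flagged.append((first, second, round(score, 2)))
    return flagged


# Hypothetical question pages.
PAGES = {
    "/q/1": "How do I lower my blood pressure naturally",
    "/q/2": "How do I lower my blood pressure naturally",
    "/q/3": "What is the best tax software for freelancers",
}
print(near_duplicates(PAGES))
```

Comparing every pair of pages is fine for a small sample, but it grows quadratically, so a large Q&A archive would need a more scalable approach.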

Rudy takes it home with his presentation on content strategy and info architecture. He is in-house SEO for

IA is a science of deciding what you want your site to do, then making a plan before marketing, editorial, product and design efforts. Content strategy identifies audiences, products and delivery mechanisms and creates intended targets. Content strategy and IA efforts offer influence in the organization. Why do we do this?

We want to execute well, but what’s that mean? If you can successfully execute these things you end up garnering influence. You’ll have more influence during decision discussions.

Content strategy: identify audience: define biz goals, identify key audiences, and bring marketing, editorial, product and design folks together, along with a C-level executive.

Creating personas:

  • categorization of people with: problems, tasks, goals
  • cluster topics: view competitive research, social media discussions, Google Insights/Trends, notes from conversations

Finding the discussions

  • DoubleClick Ad Planner
  • Alexa
  • Compete
  • Google Discussions

When you’re talking about who you think your audience is, bring in someone from the paid side. You can define a cool audience type, but it may turn out it’s not a converting segment. Your PPC people will know this.

Personas are descriptions of people. What keyword phrases are associated? What assets do we have to help? What do we want Carl to do? Why will he do it?

Know the universe:

  • Identify assets – page types: content types like infographics, tools, resources
  • Find the issues that come up and plan for them (for instance, apply Google Rich Snippet code or identify where to put in a Google+ link). Do these things as they come up.

Eventually you want to get to a point where you can make sense of the universe you’re providing your persona. He shows a Visio diagram of tax content buckets and their relationships. This gives product people info on prioritization. When you find kws you want to go for, you’ll also find nearby terms. Put those into Google Discussions and you’ll find what people have been asking about them recently. Content folks can write targeted content about these topics.

Finally there’s a wireframe for the design team:

  • Determine page functions, SEO items.
  • Identify where you talk to your audience.
  • Identify business goals.

Then, be available for support, offer amendments, explain areas for creativity and then you gather influence.

Virginia Nussey is the director of content marketing at MobileMonkey. Prior to joining this startup in 2018, Virginia was the operations and content manager at Bruce Clay Inc., having joined the company in 2008 as a writer and blogger.

