It's important, to some degree, to understand the various ways that search engines may index and interpret the data on a webpage or website.
This week, SEOMoz released a new tool to calculate LDA (Latent Dirichlet Allocation), a concept currently so obscure that I couldn't find a proper definition of "Dirichlet" (it is, it would seem, a mathematician, a crater on the moon, and something to do with 'boundary conditions' in a mathematical equation).
For the purposes of IR (information retrieval) and how search engines work, however, LDA refers to yet another way to understand what a webpage is about: calculating the probable topics covered on the page based on the words used and the relationships between those words. As SEOMoz explains, using the LDA model, theoretically you "can compute the similarity between any word or groups of words and the topics it's created."
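To make "similarity between words and topics" a little more concrete, here is a minimal sketch of one common way to compare two topic mixtures: cosine similarity. This is an illustration only. The topic names and weights are made up for the example, and cosine similarity is my assumption, not necessarily the measure SEOMoz's tool or Google uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {topic: weight} dictionaries."""
    topics = set(a) | set(b)
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in topics)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty mixture is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical topic mixtures (invented weights, not real LDA output).
query = {"music": 0.9, "film": 0.1}
page_about_band = {"music": 0.8, "film": 0.2}
page_about_shopping = {"music": 0.3, "retail": 0.7}

band_score = cosine_similarity(query, page_about_band)
shopping_score = cosine_similarity(query, page_about_shopping)
print(band_score > shopping_score)
```

In a real LDA model the mixtures would be learned from a large collection of documents rather than typed in by hand.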
LDA = Topical Relevance
What the SEOMoz research demonstrated, to a simplistic degree, is that far from looking just at repeated mentions of a keyword and a few synonyms, the Google algorithm is actually calculating the degree to which a page appears to be about a specific topic. It bases this both on the number of words on the page that the search engine identifies with that topic and on the number of words that it identifies as being about RELATED topics - topics you would expect to find covered at the same time.
For example, if I produced a page of content about the new Grinderman album I might mention:
- Nick Cave
- The Bad Seeds
- garage rock
- Warren Ellis
- The Dirty Three
- Jim Sclavunos
- The Birthday Party
- Heathen Child
- music videos
- John Hillcoat
- laser beams
- Roman Centurions
You could then classify the words into topics in the following way:
- Grinderman - Nick Cave, Warren Ellis, garage rock, Heathen Child, guitars, gigs, Australia
- Nick Cave - Bad Seeds, Birthday Party, guitars, gigs, Australia
- Warren Ellis - Bad Seeds, Dirty Three, Australia, gigs
- Heathen Child - Grinderman, laser beams, Roman Centurions, music videos, John Hillcoat
You can then see that a page about "Grinderman" is also likely to cover topics relating to Nick Cave, the Bad Seeds, Australian bands, Warren Ellis, etc. So a page that mentions Nick Cave, Warren Ellis and Heathen Child but never mentions Grinderman may be as relevant to a search for "Grinderman" as a page that mentions "Grinderman" 7 times alongside "top 40", "album chart", "airplay" and "Rihanna". The second page doesn't cover the same subset of topics, so even though both may be relevant to "Grinderman", the pages themselves indicate very different searcher intent.
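The Grinderman example can be sketched in a few lines of Python. To be clear, this is not LDA: a real LDA model learns its topics statistically from a large corpus, whereas this toy uses the hand-written word lists from above. The two page texts and the function names are invented for illustration.

```python
# Hand-built topic vocabularies taken from the lists above (lower-cased).
# A real LDA model would learn these from a corpus instead.
TOPICS = {
    "grinderman": {"nick cave", "warren ellis", "garage rock",
                   "heathen child", "guitars", "gigs", "australia"},
    "nick cave": {"bad seeds", "birthday party", "guitars", "gigs",
                  "australia"},
    "warren ellis": {"bad seeds", "dirty three", "australia", "gigs"},
    "heathen child": {"grinderman", "laser beams", "roman centurions",
                      "music videos", "john hillcoat"},
}


def topic_coverage(page_text, topic):
    """Fraction of a topic's related terms that appear in the page."""
    text = page_text.lower()
    terms = TOPICS[topic]
    hits = [term for term in terms if term in text]
    return len(hits) / len(terms)


def keyword_count(page_text, keyword):
    """Plain keyword counting, the approach LDA is contrasted with."""
    return page_text.lower().count(keyword.lower())


# Two invented pages: one never names the band, one repeats it 7 times.
page_a = ("Nick Cave and Warren Ellis return with Heathen Child, a garage "
          "rock single, ahead of new gigs in Australia.")
page_b = ("Grinderman " * 7 +
          "enters the top 40 album chart after heavy airplay, "
          "just behind Rihanna.")

for name, page in (("page_a", page_a), ("page_b", page_b)):
    print(name,
          "keyword mentions:", keyword_count(page, "grinderman"),
          "topic coverage:", round(topic_coverage(page, "grinderman"), 2))
```

Counting keywords alone favours the second page, while the coverage score favours the first, which is the gap between keyword matching and topical relevance that the example is meant to show.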
Why It's Important to Understand LDA
Whether Google is using LDA, and how, is perhaps less important than the fact that the research demonstrates yet another step by search engines to determine searcher intent and deliver the most relevant content, which should, quite rightly, be based on the overarching topics or themes of the page rather than one specific keyword phrase. Google has demonstrated something similar recently with changes to the blog ranking algorithm. According to a recent blog post on SEO by the Sea, Google is using a variety of new indicators not only to deliver relevant blog posts, but also to understand the niches of blogs so that it can deliver a list of relevant blogs that regularly cover topics related to the searcher's query.
It's Still Links as Much as Content
I think, however, it may be premature to completely rethink the way we approach search campaigns just because, in SEOMoz's testing, inbound links from different IP addresses appear to have less relevance than the LDA calculation.
Even today, when Google has outlawed paid links and has spent the best part of a decade pushing the notion that "content is king" to SEOs, it's blatantly obvious that content is only king if your competitors haven't found a way to game the link part of the algorithm.
Just do a search for 'leather jackets' in Google.co.uk and check out the backlink profile of the site at #1.
Low quality, paid, irrelevant links still appear to work in volume. This is one example of many: again and again in the last 12 months, we've seen sites shoot into top 5 listings for competitive keywords, and stay there, on the back of link building alone.
LDA is a Mirror into What Google Wants to Achieve
What this new metric may reveal is the best intentions of Google, or what should happen in an ideal search environment. It's still quite clear that if Google is using it, it's not using it terribly effectively, and it's still just one factor amongst many.
Focusing your content on delivering information that will help searchers, targeting keywords that searchers use when looking for the information on your pages, and finding links from relevant sites back to your content is still the best formula for SEO success, and that's not likely to change any time soon.