Content-based ranking: Essential ways search engines calculate keyword relevance
The field of search engine optimisation is constantly changing, and marketers and businesses strive to stay abreast of these developments. Some argue that topics (and intents) are on the rise and keywords are on the way out. Regardless of these debates, keywords are still at the heart of search engines, and webmasters still need to ensure that website content is optimised for targeted keywords. With the advent of machine learning and advanced search algorithms such as Google’s RankBrain and Bing’s RankNet, search engine crawling, indexing and information retrieval have become a lot smarter. This is where content-based ranking comes into play. So what, then, is content-based ranking?
Without a content-based ranking system, search engines would return pages in the order in which they were crawled. Let’s assume you have a service page that contains the keyword ‘sports science’ and a customer testimonial page that also contains an instance of ‘sports science’. Your aim is to have Google return the service page, not the testimonial page, for a ‘sports science’ query. If results simply followed crawl order and Google crawled your customer testimonial page first, it would be more likely to return that page for a ‘sports science’ query than your service page. Hence the need for content-based ranking, a process in which search engines award web pages scores for given search terms and return the pages with the highest scores for those terms. So how is this keyword relevance calculated?
How keyword relevance on a web page is calculated:
The scoring methods most search engines adopt in calculating the keyword significance on web pages are:
1) Word Frequency: The first way search engines determine the core keyword of a web page is by the number of times it appears in the copy. Search engines are more intelligent than they were in the past. A few years ago, many website owners got away with keyword stuffing, the unnatural repetition of keywords. That approach no longer works, because semantic search algorithms like Google Hummingbird have led to a more robust way of calculating word frequency. TF-IDF (term frequency-inverse document frequency) is a popular term-weighting method used by search engines: it weights a term by how often it occurs in a document, offset by how common the term is across all documents. Search engines are also adopting sophisticated ways of converting keywords to vectors (numbers), so that synonyms and closely related terms are treated as the same word and given the same, or a very similar, vector. For example, I ran a search for the query ‘MacBook air maintenance’ and the top four results were pages with frequent Mac-related keywords, such as ‘Mac maintenance,’ ‘Mac software updated,’ ‘Mac app store’ and ‘Mac and OS X Install’. Search engines view word frequency at a semantic level rather than a verbatim one. If you have a page on MacBook maintenance that contains keywords such as those above, it can be viewed as meeting the word frequency criterion.
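To make the term-weighting idea concrete, here is a minimal TF-IDF sketch in plain Python. It is only an illustration of the textbook formula, not how any particular search engine computes it: the function name `tf_idf`, the three-page corpus and the unsmoothed logarithm are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Return a basic TF-IDF weight for `term` in one tokenised document."""
    # Term frequency: share of the document's tokens that are `term`.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: how many documents in the corpus contain `term`.
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    # Inverse document frequency: a term found in every document scores 0.
    idf = math.log(len(corpus) / df)
    return tf * idf


# Made-up three-page corpus, for illustration only.
corpus = [
    "keep the mac updated with mac maintenance".split(),
    "the best running shoes".split(),
    "the guide to garden care".split(),
]

mac_weight = tf_idf("mac", corpus[0], corpus)  # frequent here, rare elsewhere
the_weight = tf_idf("the", corpus[0], corpus)  # appears on every page
```

In this toy corpus ‘mac’ gets a positive weight on the first page, while ‘the’ gets a weight of zero because it appears on every page. That is the point of the inverse document frequency part: repetition alone does not make a word significant.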
All of these related words will be assigned a similar vector or number. As a simplified example, take a value of 2 for all MacBook-related keywords, and assume you have two pages that contain them. Page A has 10 occurrences of MacBook-related keywords and ends up with a relevance score of 20 (2 x 10), while page B has 5 occurrences and a relevance score of 10 (2 x 5). Google and most search engines will be more likely to return page A when Mac maintenance-related searches are run. Content that ranks well is written at a topical, semantic and contextual level. Repeating the keyword ‘MacBook air maintenance’ on a page is futile and could be considered stuffing. Crafting the content on the page using closely related keywords is more productive.
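The page A versus page B arithmetic can be sketched in a few lines of Python. This is a deliberately simplified model: the phrase list, the shared weight of 2 and the `relevance_score` function are hypothetical and simply mirror the example numbers, whereas real engines work with learned vectors rather than a hand-written table.

```python
# Hypothetical table: every Mac-related phrase shares the same weight of 2.
CONCEPT_WEIGHTS = {
    "mac maintenance": 2,
    "mac software update": 2,
    "mac app store": 2,
}


def relevance_score(text, weights=CONCEPT_WEIGHTS):
    """Sum weight x occurrences for every related phrase found in `text`."""
    text = text.lower()
    return sum(weight * text.count(phrase) for phrase, weight in weights.items())


# Page A: 10 related occurrences; page B: 5 related occurrences.
page_a = " ".join(["mac maintenance"] * 6 + ["mac app store"] * 4)
page_b = " ".join(["mac maintenance"] * 3 + ["mac software update"] * 2)

score_a = relevance_score(page_a)  # 2 x 10
score_b = relevance_score(page_b)  # 2 x 5
```

Note that page A scores higher without repeating a single exact phrase ten times, because the related phrases all contribute to the same total.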
2) Document Location: This is the second criterion used in calculating keyword relevance on most web pages. The location of your keywords determines how much weight or relevance search engines place on them. A quick look at the ‘MacBook air maintenance’ query reveals that most of the top pages have ‘Mac maintenance’-related keywords in the title or first paragraph of the article. When pages are indexed, the title of the page and the locations of keywords are recorded. This helps search algorithms score pages according to where the respective keywords appear.
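A rough way to picture location-based scoring is to give a keyword more credit when it appears in the title or the first paragraph. The sketch below assumes exactly that. The weights (3, 2 and 1), the `location_score` function and the sample pages are invented for illustration and are not taken from any search engine.

```python
def location_score(keyword, title, body,
                   title_weight=3.0, lead_weight=2.0, body_weight=1.0):
    """Score a page by where `keyword` appears (illustrative weights only)."""
    keyword = keyword.lower()
    # Treat blank-line-separated chunks of the body as paragraphs.
    paragraphs = [p for p in body.lower().split("\n\n") if p.strip()]
    score = title_weight * title.lower().count(keyword)
    for index, paragraph in enumerate(paragraphs):
        # The first paragraph counts for more than the rest of the body.
        weight = lead_weight if index == 0 else body_weight
        score += weight * paragraph.count(keyword)
    return score


page_a = location_score(
    "mac maintenance",
    title="Mac maintenance checklist",
    body="Regular Mac maintenance keeps things fast.\n\nOther tips.\n\nMore on Mac maintenance.",
)
page_b = location_score(
    "mac maintenance",
    title="Laptop tips",
    body="General advice.\n\nSome notes on Mac maintenance.",
)
```

Here the first page mentions the keyword in its title, its opening paragraph and once later on, so it outscores the second page, which only mentions it at the end.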
3) Word distance: For longer queries, the proximity or closeness of the keywords determines scoring. In computational linguistics, an n-gram is a contiguous sequence of n items, which may be words, letters, syllables or phonemes. For words, an n-gram of size 1 (unigram) looks at a single word, size 2 (bigram) looks at two words and size 3 (trigram) captures three words. Search engine crawlers determine the size of the n-grams that are added to the keyword index. Going back to the MacBook example, there is clear proximity between the keywords. The search query is ‘MacBook air maintenance’ and the content on one of the web pages shows the core words close together. One such phrase in the content is: ‘MacBook needs the same regular care.’ The word ‘MacBook’ is quite close to the phrase ‘regular care.’ Regular care is synonymous with maintenance, and Google’s crawler can index a much longer n-gram (sequence of words), hence this page ranks high for the search term.
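The n-gram and proximity ideas can be sketched with the phrase from the example. The helper names `ngrams` and `min_distance` are my own, and measuring distance as the gap between word positions is just one simple assumption about how closeness might be quantified.

```python
def ngrams(tokens, n):
    """Return every run of `n` consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def min_distance(tokens, word_a, word_b):
    """Smallest gap in word positions between two words, or None if absent."""
    positions_a = [i for i, token in enumerate(tokens) if token == word_a]
    positions_b = [i for i, token in enumerate(tokens) if token == word_b]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)


tokens = "macbook needs the same regular care".split()

bigrams = ngrams(tokens, 2)    # pairs of adjacent words
trigrams = ngrams(tokens, 3)   # triples of adjacent words
gap = min_distance(tokens, "macbook", "regular")
```

A smaller gap means the words sit closer together in the copy. Note that a six-word n-gram would capture the whole phrase in one piece, which is why a longer n-gram size lets an index keep ‘MacBook’ and ‘regular care’ together.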
Search engine crawlers and algorithms are more advanced these days. Nonetheless, word frequency, document location and word distance remain three good indicators of the keyword relevance of a web page. The days of keyword stuffing and verbatim matching are gone; we are now in an era of semantic, topical and contextual relevance. This is not to discount other keyword relevance signals such as cross-linking and inbound linking. Suffice it to say that these three elements determine keyword relevance scoring to a large degree.