SEO analysis in eCommerce with the use of Big Data. Key ranking factors.

Search engines develop their algorithms using huge financial and technological resources. For several years now, Google engineers have been using elements of artificial intelligence, known under the name RankBrain, to rank results.

How does RankBrain influence positioning strategies?

  1. The use of Machine Learning allows Google to choose a different ranking algorithm depending on the thematic category, the intent of the query, language, location or device.
  2. The ranking factors chosen by RankBrain for a given query are not always intuitive or logical.
  3. The SEO expert’s knowledge and experience should be supported by data analysis.

Using the “Reverse Engineering” approach, i.e. analyzing search results and identifying the features that distinguish the highest-ranked results, we can build precise SEO strategies.

Below we present a fragment of such an analysis of ranking factors in the e-commerce industry.

Analysis stages


  1. The first stage of the analysis is to create a set of thousands of phrases that cover a given subject matter.
  2. Next, for these phrases we download the top 30 search results as well as the source code of all pages in these results.
  3. In the next stage, the collected data is verified and, if necessary, cleaned before further analysis.
  4. In the next stage, we analyze more than 100 potential ranking factors, looking for the most important ones.
  5. In the final stage, we analyze the individual characteristics of a given client and build a set of detailed recommendations specifically designed for this client.


Details of the Analysis

  • We decided to study two categories: online drugstores and computer stores.
  • For each category we selected 10 thousand phrases which by definition represented a purchase intention (e.g. product name).
  • In this study, we decided to narrow down the results to online stores in order to understand how to be the highest ranked among online stores. We did not want the conclusion of the analysis to be “you should be like Ceneo or”
  • We investigated onsite factors (including technical aspects, content, site structure) and external linking (using data from the Majestic tool).
  • In the course of the study we analyzed over 64 GB of source code data.


Selected analysis results

Explanatory notes on the charts:

Solid blue line – median or average value of a given factor by search position, for computer stores.
Blue dashed line – median of values for subpages from the domain which appeared in the ranking during the study.
Blue shadow – the space between the 40th and 60th percentile of the value for a given factor by search position, for computer stores.

Solid red line – median or average value of a given factor by search position, for drugstores.
Red dashed line – median of values for subpages from the domain which appeared in the ranking during the study.
Red shadow – the space between the 40th and 60th percentile of the value for a given factor by search position, for drugstores.
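
As a rough illustration of how such chart series can be derived, the sketch below groups factor values by search position and computes the median together with the 40th and 60th percentiles. It is not the code used in the study: the function name `summarize_by_position`, the input layout (pairs of position and value) and the choice of the "inclusive" percentile method are our assumptions for the example.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_by_position(rows):
    """Group (position, value) pairs and return {position: (p40, median, p60)}.

    `rows` is a hypothetical input format: one pair per search result,
    e.g. (3, 120) for a page at position 3 with a factor value of 120.
    """
    by_position = defaultdict(list)
    for position, value in rows:
        by_position[position].append(value)

    summary = {}
    for position, values in sorted(by_position.items()):
        if len(values) < 2:
            # quantiles() needs at least two data points
            p40 = p60 = values[0]
        else:
            # n=10 yields nine cut points: 10%, 20%, ..., 90%
            cuts = quantiles(values, n=10, method="inclusive")
            p40, p60 = cuts[3], cuts[5]
        summary[position] = (p40, median(values), p60)
    return summary


if __name__ == "__main__":
    # Made-up sample values, for illustration only
    sample = [(1, 10), (1, 20), (1, 30), (1, 40), (1, 50), (2, 7)]
    for position, (p40, med, p60) in summarize_by_position(sample).items():
        print(position, p40, med, p60)
```

The median gives the line in a chart and the two percentiles give the edges of the "shadow"; a wide gap between them signals that the factor varies a lot among pages at the same position.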


Incoming links

Fig 1. Median of the number of linking domains – follow type (Majestic data)


Fig 2. Median of Citation Flow indicator – domain level (Majestic data)

As you can see, the more domains that link to the store, the better the chances of a high position in search engine results. The same is true of the Majestic Citation Flow indicator for the whole domain. The computer category is clearly more competitive than the drugstore category: the number of linking domains characterising the leaders is much higher. Interestingly, the most visible (leading) stores in both of the categories we examined have high values for both indicators. They are undoubtedly strong domains in terms of linking. This could suggest that their low positions in the top 3 of the store visibility ranking may be due to onsite factors. More conclusions can be drawn from the following graphs:

Fig 3. Median of Citation Flow indicator – subpage level (source: Majestic)


When we take the Citation Flow analysis down to the level of the subpages in search results, it is clear that both stores are well below the median for the highest positions. How should this be interpreted? The reason may be that “link juice” (the strength of the domain) does not “spill” appropriately through the internal structure of the stores down to the level of product pages and subcategories. An additional hint is given in the next graph:

Fig 4. Median of the ratio of incoming links to subpages found (Majestic indicator)


We can see here that the number of external links should be proportionate to the size of the site. In the case of the Hebe and Sferis online stores, this proportion is too low.


Page optimization for phrases

Another area that we studied is how well the content of a subpage matches the query. In our analysis we took into account whether the phrase (the whole string of characters) appears in particular elements of the page (meta tags, headers, content). The intuition and experience of SEO experts suggest that using the keyword in the most important elements of the page (Title, H1) increases the chances of a high position. In the case of our study, it looked like this:

Fig 5. Percentage of results containing the key phrase in the Title, by position.



Fig 6. Percentage of results containing the key phrase in the H1, by position.


The first surprise is the very low share of pages containing a so-called “exact match” in the Title tag and H1 header. For drugstores it is over a dozen percent, but in the case of computer stores the share is below 10%!

In addition, in the case of drugstores an exact match to the query is negatively related to the positions occupied, i.e. pages that did not contain the phrase in its exact form appeared more frequently in the leading positions.
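
To make the notion of an exact match concrete, here is a minimal sketch of how one could test whether a whole phrase appears in the Title tag or the H1 header of a page. It relies only on Python's standard `html.parser` module; the class `TitleH1Extractor`, the helper `exact_match` and the normalization rule (lowercasing and collapsing whitespace) are our assumptions for the example, not a description of the tooling used in the study.

```python
from html.parser import HTMLParser


class TitleH1Extractor(HTMLParser):
    """Collects the text found inside <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag we are currently inside, if tracked
        self.text = {"title": [], "h1": []}

    def handle_starttag(self, tag, attrs):
        if tag in self.text:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.text[self._current].append(data)


def normalize(value):
    # Assumed rule: case-insensitive, whitespace collapsed to single spaces
    return " ".join(value.lower().split())


def exact_match(html, phrase):
    """Return {'title': bool, 'h1': bool}: is the whole phrase present?"""
    parser = TitleH1Extractor()
    parser.feed(html)
    parser.close()
    needle = normalize(phrase)
    return {
        tag: needle in normalize(" ".join(parts))
        for tag, parts in parser.text.items()
    }


if __name__ == "__main__":
    # Made-up page, for illustration only
    page = (
        "<html><head><title>Lenovo ThinkPad T480 - Store</title></head>"
        "<body><h1>Laptop <span>Lenovo</span></h1></body></html>"
    )
    print(exact_match(page, "lenovo thinkpad t480"))
```

Averaging the resulting flags over all results at a given position would give a percentage of the kind plotted in Fig 5 and Fig 6.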

What was the situation in the case of the keyword in the content of the website? Were there significant differences between short (one or two words) and long (three or more words) phrases? Write to us to receive the full version of the study.

Use of images

The use of attractive and detailed product images is a potentially important element that influences the purchasing decisions of customers of online stores. One popular SEO recommendation is to enrich the site with multimedia elements. So, we checked how the quantity of photos correlates with positions in search results.

Fig 7. Median of the number of photos on the site.


Drugstore websites with more photos appeared in the top positions more often. In the case of computer stores, this relationship is less clear-cut and the “shadow”, i.e. the spread of the distribution, is much wider.

Of course, the number of images in the code of a page tells us nothing about how many images of the product itself have been placed on it. A high number of images may be the result of a long list of similar products or of manufacturers’ logos in the store menu. Let us also look at another parameter related to photos:

Fig 8. The average ratio of number of words to the number of photos on a given subpage.


This graph shows that a certain proportion between the length of text and the number of photos can increase the probability of high positions in one industry and decrease the probability of high positions in another.
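
For readers who want to reproduce this kind of measurement on their own pages, the sketch below counts `<img>` tags and visible words in a page's source code and returns their ratio. It uses only Python's standard `html.parser`; the names `ImageWordCounter` and `words_per_image`, and the decision to ignore text inside `<script>` and `<style>`, are our assumptions. As noted above, a count taken from the source code includes logos and product-list thumbnails, not just product photos.

```python
from html.parser import HTMLParser


class ImageWordCounter(HTMLParser):
    """Counts <img> tags and whitespace-separated words of text."""

    SKIPPED = {"script", "style"}  # assumed: their content is not page text

    def __init__(self):
        super().__init__()
        self.images = 0
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def words_per_image(html):
    """Return (images, words, ratio); ratio is None when there are no images."""
    counter = ImageWordCounter()
    counter.feed(html)
    counter.close()
    ratio = counter.words / counter.images if counter.images else None
    return counter.images, counter.words, ratio


if __name__ == "__main__":
    # Made-up page, for illustration only
    page = (
        '<body><p>Great laptop for work</p><img src="a.jpg">'
        '<img src="b.jpg"/><script>var x = 1;</script></body>'
    )
    print(words_per_image(page))
```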

Length of content

Without hesitation I can call 2018 the year of Content in SEO. Many sites and online stores focused on enriching their pages with new and unique content. Were pages with more content more likely to be ranked in high positions?

Fig 9. Median of the number of words on the site.

We can see that in the case of computer stores, sites richer in content were more often in the top positions. For internet drugstores this dependence is not so clear.

In our study we verified more than 100 SEO factors and their relations to ranking. Write to us if you would like to know more about the following:

  • How does the lack of SSL affect the visibility of an online store?
  • How much content should be on the website in the case of a drugstore?
  • What is the importance of using tags?
  • What is the most effective URL structure for stores?
  • Are the websites of stores with shorter title tags ranked higher?
  • How does the placement of nofollow outgoing links on the website correlate with the ranking?
  • How many internal links do websites ranked first have?
  • Is it good to have a certain proportion of nofollow links among the links leading to your domain?
  • Are stores that offer free delivery ranked higher?

Prediction models

Our analysis confirms that there is no single website factor or element that determines ranking. The distributions of all factors for particular positions were often wide, e.g. the first position could be achieved by having 100 images on the website or not having any images at all. The presented results are only a simplified visualization of the data. To build recommendations we use much more advanced analyses (e.g. studying similarities of distributions) as well as machine learning to build prediction models on the basis of which we create a set of precise recommendations for our clients.
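
The text above does not say which measure of distribution similarity is used, so the following is only one possible illustration: the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical cumulative distributions of a factor in two groups of results (for example, top positions versus lower positions). The function name `ks_statistic` and the sample numbers are our assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs of
    the two samples: 0.0 for identical distributions, 1.0 when the samples
    do not overlap at all.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is sufficient.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


if __name__ == "__main__":
    # Made-up image counts, for illustration only
    top_positions = [0, 12, 35, 100]
    lower_positions = [1, 10, 40, 90]
    print(ks_statistic(top_positions, lower_positions))
```

A value close to zero would indicate that the factor is distributed almost identically in both groups and so is unlikely to separate high positions from low ones on its own, which is consistent with the observation that no single factor determines ranking.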

The effectiveness of SEO strategies based on Big Data and Reverse Engineering analysis.

We have been using the Data Driven SEO approach for our clients for over 2 years. During this time, we have carried out many campaigns with spectacular results. Below are just 3 examples:

Case Study #1

More than 6 million organic visits per month for a new website within 12 months of launch.


Case Study #2

More than 1 million organic sessions for a new online store within 2 years of launch.

Case Study #3

100% increase in online store traffic within 2 years.


The presented results are only a small part of our approach to SEO, which brings strong growth to our clients every year. It is already clear that without the support of Big Data and Machine Learning even the best SEO specialists will soon be ineffective.