Nadav Harari, Head of SEO at VentureKite, has built a tool that analyzes topics semantically for search engine optimization. He wrote it in Python with help from ChatGPT, and it relies on a machine learning model hosted on Hugging Face. It is a real-world example of machine learning applied to SEO.
Best of all, he is sharing the tool freely. In the conversation summarized below, Harari explains what the tool does and how it works.
If you prefer written instructions for setting up and using the tool, those are included as well.
Summarized Exchange on Our Free AI SEO Tool
In a conversation between Jim Markus, the website portfolio manager at VentureKite, and Nadav Harari, the head of SEO at VentureKite, the limelight is directed towards a tool Harari personally crafted. With a blend of ChatGPT’s insights and Harari’s coding prowess, this creation emerged from the ground up.
Markus: I am Jim Markus, entrusted with managing the website portfolio at VentureKite, and I’m joined by…
Harari: I’m Nadav Harari, the SEO head at VentureKite.
Markus: In this conversation we want to give our audience a look at the tool you built from scratch, combining ChatGPT’s help with your own coding skills.
Harari: Thanks, Jim. My primary aim was to devise a process for content gap analysis. I wanted to move beyond the keyword-level analysis typically offered by well-known SEO tools like Ahrefs and SEMrush and work at a semantic level.
So, envision this – you input a few titles from a competitor’s website. Leveraging a machine learning model specialized for this purpose, you can determine whether you’ve already covered the topic under scrutiny. Furthermore, it provides a similarity score between zero and one, aiding you in gauging whether you’ve addressed the topic adequately.
The model at the heart of this process was trained on a dataset of more than a billion sentence pairs, enabling it to grasp context, intention, synonyms, plurals, and other linguistic intricacies. This, in turn, enables a higher level of content gap analysis. It replaces the manual process many of us rely on, which involves searching Google with the “site:yourdomain.com” query and examining the search results to check whether a topic is already covered on your website.
Markus: You’re absolutely right. Presently, prevalent tools center around keyword analysis. However, as search algorithms grow more sophisticated and Google delves deeper into content understanding, the focus shifts to topics rather than mere keywords. Your creation addresses this shift admirably.
Harari: Indeed, Jim. Soon, I’ll demonstrate how to operate this tool for your advantage. I’d like to underscore that existing SEO tools mainly focus on keyword comparisons and rankings.
Imagine you published an article just yesterday, and Google hasn’t had the chance to crawl, index, or rank it. The conventional SEO tool might flag this as a non-ranking keyword. But with the method I’ll show you, you can input a list of URLs from your site map and compare them against a competitor’s topics. This approach is not reliant on ranking data, making it much more versatile.
Markus: Precisely. This approach circumvents the waiting game tied to ranking data. Instead, it capitalizes on your website’s structure and its immediate availability, regardless of ranking status.
Harari: Exactly, Jim. Let me walk you through the process. We have two main components here: a spreadsheet that you can duplicate and a Python script that leverages the machine learning model I showcased earlier.
Markus: Outstanding. Just to reiterate for clarity, in case there’s someone unfamiliar with the details, the similarity scores in columns F and beyond are rated on a scale of zero to one. If the score falls below 0.45, the field is left blank. This mechanism aids in identifying content gaps effectively.
Harari: Absolutely, these scores offer valuable insights. I’d also like to show you another interesting use case. Until now, we’ve focused on content gap analysis against competitors. But what if you could perform an internal content gap analysis among your own URLs, focusing on the same topics?
For instance, let’s consider row seven. We have titles like “10 best web development frameworks” and “best PHP frameworks for web development.” These are quite similar and could be seen as a content cluster. Similarly, we can cluster “best PHP frameworks,” “top PHP alternatives,” and “best certifications.” The scores indicate their semantic proximity.
Markus: You’re delving into a multi-faceted application here. Beyond competitor analysis, this tool can also identify potential cannibalization of your own traffic and help spot content clusters for consolidation.
Harari: Indeed, Jim. I encourage everyone to give this a shot. Duplicate the spreadsheet, set up a Google Colab account, paste in the provided Python script, and generate your own credentials using Google Cloud. This can be employed on your website or for assessing competitor sites. Dip your toes in and observe its impact firsthand.
Markus: That’s a brilliant suggestion. The tools are readily available for exploration, and I’m sure many will find them incredibly beneficial. We’ll provide comprehensive instructions for those who prefer step-by-step written guidance.
Harari: Absolutely, Jim. I’m here to help. Feel free to explore and reach out if you need assistance.
How to Utilize Our Complimentary SEO Tool: A Detailed Walkthrough
Below, we offer a step-by-step guide on setting up and operating our SEO tool. For a visual tutorial, don’t hesitate to watch our accompanying video.
Unpacking the Semantic Content Gap Analysis
The essence of our semantic content gap analysis stems from Hugging Face’s pre-trained machine learning model, which was honed on an expansive dataset encompassing over a billion sentence pairs. This model empowers you to match each competitor’s article title against every single article title on your website. This process generates the top 50 most semantically akin titles (with scores ranging from 0 to 1) in descending order, all in relation to a given competitor title.
A screenshot demonstrates the model’s adeptness on Hugging Face’s platform. It recognizes synonyms and context, attributing higher scores to related terms.
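To make the 0-to-1 score concrete, here is a minimal Python sketch of how such a score is commonly computed. It assumes the model turns each title into an embedding vector and that two titles are compared by cosine similarity. The article does not name the model or the metric, so the model name below (sentence-transformers/all-MiniLM-L6-v2, one public Hugging Face model trained on over a billion sentence pairs) and both function names are illustrative assumptions, not the tool's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def embed_titles(titles, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Turn titles into embedding vectors.

    Assumption: the model name is a stand-in, not confirmed by the article.
    Requires the third-party sentence-transformers package.
    """
    # Imported lazily so cosine_similarity works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(titles).tolist()
```

With sentence-transformers installed, calling `embed_titles` on two titles and passing the resulting vectors to `cosine_similarity` would score the pair; titles that use synonyms for the same topic should score noticeably higher than unrelated ones.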
Reasoning behind this approach:
Presently, if you wish to determine whether you’ve already covered your competitor’s topics on your website, you resort to the “[site:yourdomain.com ‘topic’]” query on Google and manually browse through results. Alternatively, you might utilize the content gap feature in tools like Ahrefs or SEMrush.
Why these methods fall short:
The “[site:]” operator may overlook recently published articles that haven’t been crawled, indexed, or ranked yet.
Using Google’s “[site:]” operator is cumbersome and impractical when dealing with a large number of competitor topics.
Ahrefs or SEMrush’s content gap feature identifies competitor keywords that you don’t rank for, overlooking the broader topic context. For example, it might flag “Best table tennis paddle” as missing, even if you’ve published content about “Best ping pong paddle.”
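The table tennis example can be illustrated with a small Python sketch. It is not how Ahrefs or SEMrush work internally; it only shows, under the simplifying assumption that a keyword-level check compares the literal words, why two titles about the same topic can look like a gap. The function names are hypothetical.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def keyword_overlap(title_a, title_b):
    """Jaccard overlap of word tokens: shared words / all distinct words."""
    a, b = tokens(title_a), tokens(title_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


competitor = "Best table tennis paddle"
yours = "Best ping pong paddle"

print(competitor.lower() == yours.lower())           # False: no exact match
print(round(keyword_overlap(competitor, yours), 2))  # 0.33: only 2 of 6 words shared
```

A word-level comparison sees mostly different words here, while a semantic model can recognize that "table tennis" and "ping pong" refer to the same thing.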
Prerequisites and Exemplary Outcome
Prerequisites:
Duplicate this Google Sheet.
Create a Google Colab account and paste the Python script I devised with the assistance of ChatGPT.
Example outcome:
Column A contains competitor titles, and the columns to the right hold the most closely aligned titles on your site.
For instance, Cells B2 and C2 show the most analogous titles on “golfspan.com” in relation to the competitor title in A2. B3 and C3 do the same for A3, and so on.
Setting Up & Operating the Google Sheet
Sheet1 Tab:
Enter competitor URLs in Column A and include ALL your website’s URLs in Column F. You can copy these from your XML sitemap.
Use the ChatGPT-generated Apps Script to populate Columns B and C (competitor) and Columns G and H (your website) with the status code and title of each URL. Even URLs that redirect with a 3XX code or return a 404 error are handled so that they yield meaningful titles.
To ensure raw topic/title comparison without external influences, a formula in Column D and Column I cleanses titles of brand names and HTML entities. Column A and Column B in Sheet2 will automatically mirror this cleaned data.
Download the Google sheet as an XLSX file.
Open the XLSX file and click “Enable editing.”
Proceed to the Sheet2 tab, copy Columns A and B, and paste them as values. This step is essential for the Python script’s proper execution.
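The title-cleaning step in Columns D and I is done with a spreadsheet formula that is not reproduced here. As a rough Python equivalent, the sketch below decodes HTML entities and strips brand names so that only the topic remains. The function name, the separator characters, and the brand "ExampleBrand" are assumptions made for illustration.

```python
import html
import re


def clean_title(raw_title, brand_names):
    """Decode HTML entities and strip brand names from a page title."""
    title = html.unescape(raw_title)  # e.g. "&amp;" becomes "&"
    for brand in brand_names:
        # Remove the brand plus an optional leading separator such as " | " or " - ".
        pattern = r"\s*[|:\-]?\s*" + re.escape(brand)
        title = re.sub(pattern, "", title, flags=re.IGNORECASE)
    # Collapse leftover whitespace and trim stray separators at either end.
    title = re.sub(r"\s+", " ", title)
    return title.strip(" |-:")


print(clean_title("Best Ping Pong Paddles &amp; Rackets | ExampleBrand", ["ExampleBrand"]))
# Best Ping Pong Paddles & Rackets
```

Removing the brand matters because a site name repeated in every title would otherwise push all of that site's titles toward each other in the comparison.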
Sheet2 Tab:
This is where the Python script enters the stage. It compares each title in Column A (competitor titles) against all titles in Column B (your titles) and populates results in Column D and adjacent columns. This process covers up to 50 similar titles on your site.
For instance, after the script runs, Column D contains the competitor titles, and Column E and the columns to its right list the matching titles from your site in descending order of similarity.
Python Script for Streamlined Functionality
Access the Python script here.
Approximately 30 seconds after initiating the code, you’ll be prompted to upload the XLSX file from the previous step. Following this, the script proceeds to fill your XLSX file with data, specifically in the Sheet2 tab.
This code:
Facilitates XLSX file upload.
Matches each competitor title (Column A) against ALL titles on your site (Column B). In Sheet2, it fills Column D with the competitor titles and Columns E onward with up to 50 of your titles that score above 0.45. Titles that score lower are considered too dissimilar and are left out.
Disregards identical titles (similarity score = 1). In an internal content gap analysis, where your own titles are compared against each other, every title would otherwise match itself.
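The selection rules listed above can be sketched in a few lines of Python. This is not the author's script; it is a minimal illustration that takes the similarity function as a parameter, so it runs without a model. The function names and the small tolerance used to detect a score of 1 are assumptions; the 0.45 threshold and the limit of 50 come from the description above.

```python
def top_matches(competitor_title, site_titles, similarity, threshold=0.45, limit=50):
    """Return site titles scoring above the threshold, best first, capped at limit."""
    scored = []
    for title in site_titles:
        score = similarity(competitor_title, title)
        # Skip identical titles (score of 1, allowing for float rounding)
        # and anything at or below the threshold.
        if score >= 1.0 - 1e-6 or score <= threshold:
            continue
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:limit]]


def build_rows(competitor_titles, site_titles, similarity, threshold=0.45, limit=50):
    """One row per competitor title: the title (Column D), then its matches (E onward)."""
    return [
        [c] + top_matches(c, site_titles, similarity, threshold, limit)
        for c in competitor_titles
    ]
```

In practice `similarity` would wrap the embedding model; a row that contains only the competitor title signals a likely content gap.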
Closing Remarks
We believe this AI-driven SEO tool brings substantial value to your content gap analysis. If you have questions about using it, don’t hesitate to reach out to the tool’s creator, Nadav Harari.