Digital Marketing

How to Use SEO Tools to Qualify Sites Before the Pitch (for Non-Link Builders)

As a self-taught SEO, I struggled (and failed) for years to understand how to build links to my site and the sites of my clients. I’ve built my agency on writing quality content that ranks in search engines and drives sales, but the one piece of the puzzle I was missing was how to build powerful links to that content.

Like most SEO consultants who don’t offer link building as a core service, I found the entire process of link building at scale overwhelming for a long time, and every link building campaign I launched failed to generate the results I needed.

I would spend hours writing content, testing numerous tools to discover link opportunities, validating each site, and finally reaching out to site owners in a desperate attempt to secure high-quality backlinks. But nothing seemed to work, and as my success rates dropped, so did my confidence in myself as an SEO. 

It wasn’t until I started to look through my entire link building process that I realized I needed to spend more time qualifying sites to ensure I didn’t waste time on low-quality sites or irrelevant content.

Over the course of a few years, I slowly developed a system to help me discover, prospect, and secure powerful links for myself and my clients. I designed it around being the only person doing the work, so I had to find ways to minimize wasted time and resources along the way.

A quick note for readers

I’m not a professional link builder, and I’ve found that this process to qualify potential sites works for me and my needs. This process is by no means optimal, and since link building is a powerful SEO tool, you should be sure to do a lot of research to determine the best approach for your specific needs. What works for me might not work for you, so I highly recommend you look at resources like Moz’s Beginner’s Guide to Link Building, or pick up The Ultimate Guide To Link Building by Garrett French and Eric Ward.

So again, before we go through my qualifying process in the pre-pitch phase of link building, I just want to reiterate that this process is not perfect, it won’t work for all types of link building campaigns, and it will continue to be improved upon. I created this process based on my needs and goals, and it works on a few assumptions:

  • You are a solo or small team, and need to maximize your time throughout the process.

  • You are looking for broken link building and guest post opportunities. This will not work for local link building or other related strategies.

  • You have access to various tools like Moz, Ahrefs, and Majestic, and you know how to pull data from those resources.

  • You are more concerned with maximizing your time than you are about finding every site available.

With that said, I hope it helps other SEOs shave some time off their link building process and combine it with other approaches for the best results possible!

Qualification & audit in the pre-pitch phase

No one will deny that link building is one of the most important pieces of any SEO strategy. While you may have an impeccable technical setup and the best content on the internet, the truth is that Google will not reward your efforts if you don’t have the types of links to your site that signal authority.

Since all link building boils down to outreach, I needed to have amazing content to offer the right people to land links from the right sites. Whether I was performing broken link building, resource page link building, or reaching out to powerful sites for guest posting, I needed to make sure I limited the amount of time and resources wasted on irrelevant sites.

The first step of any successful link building campaign is to make sure that you have the right content for the desired audience. At this point, let’s assume that you have a great piece of content that’s relevant for a long list of potential sites. For me, the most important aspect to consider is my time, so this is where pre-qualifying sites is crucial. I have to cut out as many sites as possible as quickly as possible, and focus on the sites out there with the best fit.

Step 1: Bulk disqualifications

Once you know that your content will solve a problem, you can run various footprints through a tool like Scrapebox, NinjaOutreach, or Pitchbox to develop a large group of potential sites to reach out to.
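If your discovery tool accepts a list of search queries, building that list is easy to script. The sketch below simply pairs every topic keyword with every footprint. The keywords and footprints shown are illustrative placeholders, not a recommended set, and `build_footprint_queries` is a hypothetical helper, not part of Scrapebox, NinjaOutreach, or Pitchbox.

```python
from itertools import product

# Hypothetical example inputs; swap in your own topic keywords and footprints.
KEYWORDS = ["long tail seo", "keyword research"]
FOOTPRINTS = ['intitle:"resources"', "inurl:links", '"useful links"']


def build_footprint_queries(keywords, footprints):
    """Pair every keyword with every footprint to produce one search query each."""
    return [f'"{kw}" {fp}' for kw, fp in product(keywords, footprints)]


for query in build_footprint_queries(KEYWORDS, FOOTPRINTS):
    print(query)
```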

Depending on the industry and footprints used in the discovery phase, you might end up with a list of a few thousand potential sites. While it’s exciting to see that many, you can also lose a lot of time by reaching out to sites that are irrelevant or low-quality.

Disqualify various URL parameters

Before I look at metrics or other aspects of a site, I’ll prune my initial list of sites based on specific words in their URL that I think will yield poor results for my outreach efforts. I do this with simple commands in Excel or a Google Sheets document to search for and remove each row with a URL that includes footprints like “wiki”, “forum”, and “news”.

While this process isn’t perfect, I’ve found that these types of sites usually offer a low-quality link on a generic page buried deep in their content archive.
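If your prospect list lives in a CSV instead of a spreadsheet, the same “contains” filter takes a few lines of Python. This is a minimal sketch. `prune_urls` is a hypothetical helper, and the sample URLs are made up.

```python
# Footprints from the post; extend the tuple to suit your niche.
EXCLUDE_FOOTPRINTS = ("wiki", "forum", "news")


def prune_urls(urls, footprints=EXCLUDE_FOOTPRINTS):
    """Drop any URL that contains one of the footprints (case-insensitive)."""
    kept = []
    for url in urls:
        lowered = url.lower()
        if not any(fp in lowered for fp in footprints):
            kept.append(url)
    return kept


prospects = [
    "https://example-blog.com/seo-resources",
    "https://en.wikipedia.org/wiki/Search_engine_optimization",
    "https://seoforum.example.net/thread/123",
    "https://example-news.com/2021/link-building",
]
print(prune_urls(prospects))  # ['https://example-blog.com/seo-resources']
```

Like a spreadsheet “contains” filter, this is a plain substring match, so “news” also catches URLs such as /newsletter. That bluntness is usually acceptable for a first pass.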

Remove blatant guest posting sites

Now that we’ve removed sites with specific words in the URL, I like to remove sites that are obviously made for guest bloggers. While guest blogging has been a good strategy for me, sites that appear to be built around guest posts are usually unscrupulous sites that I don’t want a link from. While not always the case, I’ve found that these sites are likely part of a Private Blog Network (PBN) and are unlikely to have much impact on my link building efforts.

To prune out these types of sites, I will pre-qualify sites like I did in the previous step by taking out sites with “submit”, “write for us”, or “guest post” in the URL and move them to my “junk” spreadsheet that I keep and examine later on.
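The junk sheet can be built the same way, by splitting the list instead of discarding rows. In this sketch (hypothetical helper, made-up URLs), hyphens and underscores are treated as spaces, on the assumption that “write for us” usually appears in a URL as a slug like `write-for-us`.

```python
GUEST_POST_FOOTPRINTS = ("submit", "write for us", "guest post")


def split_guest_post_sites(urls, footprints=GUEST_POST_FOOTPRINTS):
    """Return (keep, junk); junk holds URLs that look built for guest bloggers."""
    keep, junk = [], []
    for url in urls:
        # Treat hyphens and underscores as spaces so "write-for-us" matches "write for us".
        normalized = url.lower().replace("-", " ").replace("_", " ")
        if any(fp in normalized for fp in footprints):
            junk.append(url)
        else:
            keep.append(url)
    return keep, junk


keep, junk = split_guest_post_sites([
    "https://example.com/write-for-us",
    "https://example.org/blog/link-building-tips",
    "https://example.net/submit-guest-post",
])
print(keep)  # ['https://example.org/blog/link-building-tips']
print(junk)  # ['https://example.com/write-for-us', 'https://example.net/submit-guest-post']
```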

Step 2: Use tools to identify powerful sites

At this stage, I’ve removed quite a few sites from the initial list based on their URL. Now I can assume that the sites I have in my list aren’t trying to generate guest posts, and my efforts won’t result in a link buried deep within a wiki page.

It’s important to note that the exact metrics I consider acceptable will vary based on industry, client goals, and whether I’m performing local link building campaigns or national outreach efforts. But to simplify things, I’ll use the general baseline with the metrics below when evaluating a typical client for authoritative outreach campaigns.

Strong metrics don’t guarantee that a site stays on the list, though: if a site has high metrics but turns out to be low quality on closer examination, I assume it was built only for rankings and disqualify it from my target list.

Majestic website metrics

The most important factor to consider in any outreach campaign is the topical relevance and authority of a site based on the industry that you’re working in. It’s important to ensure that all backlinks are relevant to the target page from a topical and contextual perspective.

Since topical authority and relevance are so important for outreach efforts, I run my list of sites through Majestic SEO so that every site in my prospect spreadsheet is related by topic and context to the piece of content I want to point links to.

Once I have a list of topically relevant sites, I will run that list through Majestic and only keep those sites that return CF/TF of 12 or above. I may adjust this baseline depending on the number of results, but I have found that sites with CF/TF below 12 tend to be weaker sites that won’t move the needle.

I also only keep sites where the lower of the two scores is at least 50% of the higher one. For example, I will not consider a site with a CF of 50 but a TF of 10.

This step will whittle down my initial list and usually leave me with about 20-30% of it. I take all sites that aren’t relevant to the destination site and place them in a separate spreadsheet to review later.
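For anyone working from a Majestic export, here is one way to encode both rules. It assumes “CF/TF of 12 or above” means both scores must clear 12, and it reads “at least 50% of each other” as the lower score being at least half the higher one. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MajesticMetrics:
    domain: str
    citation_flow: int
    trust_flow: int


def passes_majestic(m, minimum=12, min_ratio=0.5):
    """Both scores >= minimum, and the lower score is at least min_ratio of the higher."""
    if m.citation_flow < minimum or m.trust_flow < minimum:
        return False
    low, high = sorted((m.citation_flow, m.trust_flow))
    return low / high >= min_ratio


sites = [
    MajesticMetrics("balanced-example.com", 30, 24),
    MajesticMetrics("lopsided-example.com", 50, 10),
    MajesticMetrics("weak-example.com", 8, 7),
]
print([s.domain for s in sites if passes_majestic(s)])  # ['balanced-example.com']
```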

Ahrefs website metrics

Now that I have a list of topically relevant sites that also meet a minimum threshold in Majestic SEO, I move on to Ahrefs. I copy and paste the remaining sites into the Batch Analysis tool to find sites with at least 500 in monthly organic traffic and a DR of 15 or above.

This step helps me identify “real” sites that generate traffic before I manually review the site.
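The Ahrefs cut-off can be applied to an exported CSV in the same spirit. The column names below (`domain`, `domain_rating`, `organic_traffic`) are placeholders. Match them to whatever your actual export contains.

```python
import csv
import io

# Hypothetical export; real column names depend on the tool and export settings.
SAMPLE_EXPORT = """domain,domain_rating,organic_traffic
real-example.com,34,2100
thin-example.com,22,40
new-example.com,9,800
"""


def passes_ahrefs(row, min_traffic=500, min_dr=15):
    """Keep sites with enough monthly traffic and a high enough DR."""
    return int(row["organic_traffic"]) >= min_traffic and float(row["domain_rating"]) >= min_dr


rows = csv.DictReader(io.StringIO(SAMPLE_EXPORT))
print([r["domain"] for r in rows if passes_ahrefs(r)])  # ['real-example.com']
```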

Moz website metrics

Finally, I run the list of sites that are topically relevant and have strong baseline metrics through Moz Pro. Since I can’t justify the cost of the Moz API for my small team and limited use case, I need to check URLs manually at this stage, so it’s important to do everything I can in previous steps to ensure I only work with sites that show good potential.

I check my list of sites in Moz through their Link Research tool to understand the strength of a root domain and quickly identify any spam sites that might have survived previous steps. I also look at the Moz Spam Score to determine whether a site requires more manual review.

Depending on the scope of my link building campaign, the industry I’m targeting, and geographic region (among other factors), I usually only reach out to sites with a DA of 10 or above. I’ve found the Moz DA tool is pretty accurate when evaluating the “realness” factor of a site, and anything below a 10 DA is likely a PBN site.

My final step in evaluating a site with SEO tools is to check the Spam Score to catch any leftover low-quality sites that may have passed the other checks.

As with most tools, false positives happen, since it’s pretty easy to stand up a site just to generate “good” SEO metrics. For this reason, I like to take the final step of manually reviewing websites before I reach out to their owners.
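Because the Moz checks are done by hand, it can help to record DA and Spam Score in the sheet and sort sites into buckets before the manual review. Here is a sketch of that triage. The post doesn’t name a Spam Score threshold, so `spam_review_at` is a hypothetical cutoff to tune to your own tolerance.

```python
def triage_moz(domain_authority, spam_score, min_da=10, spam_review_at=30):
    """Return 'reject', 'extra review', or 'standard review' for one site.

    spam_review_at is a hypothetical cutoff, not a Moz recommendation.
    """
    if domain_authority < min_da:
        return "reject"
    if spam_score >= spam_review_at:
        return "extra review"
    return "standard review"


print(triage_moz(8, 2))    # reject
print(triage_moz(25, 45))  # extra review
print(triage_moz(25, 3))   # standard review
```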

Step 3: Manual review

Now that I have a small list (usually 10-20% of the original list that I started with) of sites that meet benchmarks set in each tool, I’ll begin the manual process of reviewing the remaining sites.

I think it’s important to manually check sites before reaching out to them, because a site’s design and functionality usually reveal whether it is part of a PBN or was built just to sell links.

As I review these sites, I keep an eye out for obvious signals of a poor site. At this stage, I almost always disqualify a site with excessive advertising, because I can assume it exists to increase its sales commissions rather than to offer quality content for real people.

Use SEO tools to save time during the link prospecting phase

No matter the scope of your outreach or the industry you work in, all outreach campaigns take a lot of time and resources. Most SEOs know that bad link building can result in a whole host of problems, and as the only person in our agency who performs outreach, I need to protect my time.

The balance between scalability, quality, and efficiency is made or broken during the prospecting phase of any link building campaign. I use various SEO tools to help me save time and determine the best sites for my outreach efforts. Not only does this stack of SEO tools help me identify those sites, it also means that I’m more likely to successfully communicate with a real person at a real site to build links with.

Feel free to test out this process for yourself, and I’d love your thoughts on how to improve it in the comments below!



Google Advanced Search Operators for Competitive Content Research


The excitement of finishing a competitive keyword research project often gives way to the panic of fleeing from an avalanche of opportunities. Without an organizing principle, a spreadsheet full of keywords is a bottomless to-do list. It’s not enough to know what your competitors are ranking for — you need to know what content is powering those rankings and how you’re currently competing with that content. You need a blueprint to craft those keywords into a compelling structure.

Keyword research Google search with search operators.

Recently, I wrote a post about the current state of long-tail SEO. While I had an angle for the piece in mind, I also knew it was a topic Moz and others had covered many times. I needed to understand the competitive landscape and make sure I wasn’t cannibalizing our own content.

This post covers one method to perform that competitive content research, using Google’s advanced search operators. For simplicity’s sake, we’ll pare down the keyword research and start our journey with just one phrase: “long tail seo.”

Find your best content (site:)

long tail seo

“long tail seo”

First, what has Moz already published on the subject? By pairing your target keywords with the [site:] operator, you can search for matching content only on your own site. I usually start with a broad-match search, but if your target phrases are made up of common words, you could also use quotation marks and exact-match search. Here’s the first piece of content I see:

Google search result for long tail SEO search

Our best match on the subject is a Whiteboard Friday from five years ago. If I had nothing new to add to the subject and/or I was considering doing a video, this might end my journey. I don’t really want to compete with my own content that’s already performing well. In this case, I decide that I’ve got a fresh take, and I move forward.
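If you run these checks repeatedly, a small helper keeps the queries consistent and produces shareable search URLs. This is a sketch, with `example.com` standing in for your own domain and hypothetical function names. It only builds query strings and does not fetch or scrape Google results.

```python
from urllib.parse import urlencode


def site_query(phrase, domain, exact=False):
    """Restrict a search to one domain, optionally as an exact-match phrase."""
    term = f'"{phrase}"' if exact else phrase
    return f"{term} site:{domain}"


def google_search_url(query):
    """Turn a query string into a Google search URL you can open or share."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q = site_query("long tail seo", "example.com")
print(q)                     # long tail seo site:example.com
print(google_search_url(q))  # https://www.google.com/search?q=long+tail+seo+site%3Aexample.com
```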

Target a specific folder (inurl:)

long tail seo inurl:learn

long tail seo

For larger sites, you might want to focus on a specific section, like the blog, or in Moz’s case, our Learning Center. You have a couple of options here. You could use the [inurl:] operator with the folder name, but that may result in false alarms, because [inurl:] matches the word anywhere in the URL, not just in the folder path.

This may be useful in some cases, but when you need to focus specifically on a sub-folder, just add that sub-folder to the [site:] operator. The handy thing about the [site:] operator is that anything left off is essentially a wild card, so a [site:] query that ends in the /learn path will return anything in the /learn folder.
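To make the difference concrete, here are both versions side by side for a hypothetical `example.com` with a `/learn` folder (`folder_queries` is a hypothetical helper):

```python
def folder_queries(phrase, domain, folder):
    """Contrast the loose [inurl:] filter with a [site:] path restriction."""
    return {
        # inurl: matches the folder name anywhere in a URL on the domain,
        # e.g. a blog post whose slug happens to contain "learn".
        "inurl": f"{phrase} site:{domain} inurl:{folder}",
        # A path after the domain acts as a prefix: only URLs under that folder.
        "site_path": f"{phrase} site:{domain}/{folder}",
    }


for name, query in folder_queries("long tail seo", "example.com", "learn").items():
    print(f"{name}: {query}")
```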

Find all competing pages (-site:)

long tail seo

Now that you have a sense of your own currently ranking content, you can start to dig into the competition. I like to start broad, simply using the negative match [-site:] to remove my own site from the list. I get back something like this:


This is great for a big-picture view, but you’re probably going to want to focus in on just a couple or a handful of known competitors. So, let’s narrow down the results …

Explore key competitors (site: OR site:)

long tail seo ( OR

By using the [OR] operator with [site:] and putting the result in parentheses, you can target a specific group of competitors. Now, I get back something like this:


Is this really different from targeting one competitor at a time? Yes, in one important way: now I can see how these competitors rank against each other.
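Both the broad [-site:] search and the OR-grouped competitor search are simple string patterns. In this sketch the domains are placeholders, and the function names are hypothetical.

```python
def exclude_own_site(phrase, own_domain):
    """The broad starting point: every result except your own site."""
    return f"{phrase} -site:{own_domain}"


def competitor_query(phrase, competitors):
    """Group several [site:] filters with OR inside parentheses."""
    if not competitors:
        raise ValueError("need at least one competitor domain")
    sites = " OR ".join(f"site:{domain}" for domain in competitors)
    return f"{phrase} ({sites})"


print(exclude_own_site("long tail seo", "example.com"))
# long tail seo -site:example.com
print(competitor_query("long tail seo", ["competitor-a.com", "competitor-b.com"]))
# long tail seo (site:competitor-a.com OR site:competitor-b.com)
```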

Explore related content #1 (-“phrase”)

long tail seo -“long tail seo”

As you get into longer, more targeted phrases, it’s possible to miss relevant or related content. Hopefully, you’ve done a thorough job on your initial keyword research, but it’s still worth checking for gaps. One approach I use is to search for the main phrase with broad match while excluding the exact-match phrase. This leaves results like:


Just glancing at page one of results, I can see multiple mentions of “long tail keywords” (as well as “long-tail” with a hyphen), and other variants like “long tail keyword research” and “long tail organic traffic.” Even if you’ve turned these up in your initial keyword research, this combination of Google search operators gives you a quick way to cover a lot of variants and potentially relevant content.

Explore related content #2 (intext: -intitle:)

intext:“long tail seo” -intitle:“long tail seo”

Another handy trick is to use the [intext:] operator to target your phrase in the body of the content, but then use [-intitle:] to exclude results with the exact-match phrase in the title. While the results will overlap with the previous trick, you can sometimes turn up some interesting side discussions and related topics. Of course, you can also use [intitle:] to laser-target your search on content titles.
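The two related-content tricks reduce to these patterns. The helpers are hypothetical, and the code uses straight quotes, which is what you would type into a search box.

```python
def broad_minus_exact(phrase):
    """Related content #1: broad match on the words, minus the exact phrase."""
    return f'{phrase} -"{phrase}"'


def body_not_title(phrase):
    """Related content #2: phrase in the body text, but not in the title."""
    return f'intext:"{phrase}" -intitle:"{phrase}"'


print(broad_minus_exact("long tail seo"))  # long tail seo -"long tail seo"
print(body_not_title("long tail seo"))     # intext:"long tail seo" -intitle:"long tail seo"
```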

Find pages by dates (####..####)

long tail seo 2010..2015

In some cases, you might want to target your search on a date-range. You can combine the four-digit years with the range operator [..] to target a time period. Note that this will search for the years as numbers anywhere in the content. While the [daterange:] operator is theoretically your most precise option, it relies on Google being able to correctly identify the publication date of a piece, and I’ve found it difficult to use and a bit unpredictable. The range operator usually does the job.

Find top X lists (intitle:”#..#”)

intitle:”top 11..15″ long tail seo

This can get a little silly, but I just want to illustrate the power of combining operators. Let’s say you’re working on a top X list about long-tail SEO, but want to make sure there isn’t too much competition for the 11-15 item range you’re landing in. Using a combo of [intitle:] plus the range operator [..], you might get something like this:


Note that operator combos can get weird, and results may vary depending on the order of the operators. Some operators can’t be used in combination (or at least the results are highly suspicious), so always gut-check what you see.
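The two range-based searches above follow the same pattern. This is a sketch with hypothetical helper names. As noted above, operator combos can behave unpredictably, so check the results by eye.

```python
def year_range_query(phrase, start_year, end_year):
    """Range operator: per the post, any number in the range anywhere in the content."""
    if start_year > end_year:
        raise ValueError("start_year must not be after end_year")
    return f"{phrase} {start_year}..{end_year}"


def top_x_title_query(phrase, low, high):
    """Combine [intitle:] with the range operator to find "top X" list titles."""
    return f'intitle:"top {low}..{high}" {phrase}'


print(year_range_query("long tail seo", 2010, 2015))  # long tail seo 2010..2015
print(top_x_title_query("long tail seo", 11, 15))     # intitle:"top 11..15" long tail seo
```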

Putting all of the data to work

If you approach this process in an organized way (if I can do it, you can do it, because, frankly, I’m not that organized), what you should end up with is a list of relevant topics you might have missed, a list of your currently top-performing pages, a list of your relevant competitors, and a list of your competitors’ top-performing pages. With this bundle of related data, you can answer questions like the following:

  • Are you at risk of competing with your own relevant content?

  • Should you create new content or improve on existing content?

  • Is there outdated content you should remove or 301-redirect?

  • What competitors are most relevant in this content space?

  • What effort/cost will it take to clear the competitive bar?

  • What niches haven’t been covered by your competitors?

No tool will magically answer these questions, but by using your existing keyword research tools and Google’s advanced search operators methodically, you should be able to put your human intelligence to work and create a specific and actionable content strategy around your chosen topic.

If you’d like to learn more about Google’s advanced search operators, check out our comprehensive Learning Center page or my post with 67 search operator tricks. I’d love to hear more about how you put these tools to work in your own competitive research.

Life rushed back into Jayda’s lungs, sharp and unforgiving. To her left, shards of a thousand synonyms. To her right, the crumbling remains of a mountain of long-tail keywords. As the air filled her lungs, the memories came rushing back, and with them the crushing realization that her team was buried beneath the debris. After months of effort, they had finally finished their competitive keyword research, but at what cost?



Convince Your Boss to Send You to MozCon Virtual 2021 [Plus Bonus Letter Template!]

It’s time to get down to business and convince your boss that you HAVE to go to MozCon Virtual 2021.

You’re already well acquainted with the benefits of MozCon. Maybe you’re a MozCon alumnus, or maybe you’ve lurked on the hashtag once or twice for inside tips. You’ve likely followed the work of some of the speakers for a while. But how are you going to relay that to your boss in a way that sells? Don’t worry, we’ve got a plan.

(And if you want to skip ahead to the letter template, here it is!)

Copy the template

Step #1 – Gather evidence

Alright, so just going in and saying “Have you seen any of Britney Muller’s Whiteboard Fridays lately?!” probably won’t do the trick — we need some cold, hard facts that you can present.

MozCon delivers actionable insights

It’s easy to say that MozCon provides actionable insights, but how do you prove it? A quick scroll through our Facebook Group can prove to anyone that not only is MozCon a gathering of the greatest minds in search, but it also acts as an incubator and facilitator for SEO strategies.

If you can’t get your boss on Facebook, just direct them to the blog post written by Croud: Four things I changed immediately after attending MozCon. Talk about actionable! A quick Google (or LinkedIn) search will return dozens of similar recaps. Gather a few of these to have in your tool belt just in case.

Or, if you have the time, pick out some of the event tweets from previous years that relate most to your company. The MozCon hashtag (#MozCon) has plenty of tweets to choose from — things like research findings, workflows, and useful tools are all covered. 

The networking is unbeatable

The potential knowledge gain doesn’t end with keynote speeches. Many of our speakers stick around for the entire conference and host niche- and vertical-specific Birds of a Feather sessions. If you find yourself with questions about their strategies, you’ll often have the ability to ask them directly.

Lastly, your peers! There’s no better way to learn than from those who overcome the same obstacles as you. Opportunities for collaboration and peer-to-peer learning are often invaluable, and can lead to better workflows, new business, and even exciting partnerships.

Step #2 – Break down the costs

This is where the majority of the conversation will be focused, but fear not, Roger has already done most of the heavy lifting. So let’s cut to the chase. The goal of MozCon isn’t to make money; the goal is to break even and lift up our friends in search. Plus, since it’s a virtual conference, the price is unbeatable! If you purchase a ticket before May 31, 2021, you’ll get access to Early Bird pricing, and if you’re a Moz subscriber, you get a $20 discount off General Admission!

You’ll also have the option to save 15% if you bundle the ticket with either of Moz Academy’s SEO certifications: Technical SEO or SEO Essentials.

Top-of-the-line speakers

Every year we work with our speakers to bring cutting-edge content to the stage. You can be sure that the content you’ll be exposed to will set you up for a year of success.

Videos for everyone

While your coworkers won’t be able to enjoy the live sessions, they will be able to see all of the talks via professional video and audio. Your ticket to MozCon includes a professional video package which allows you (and your whole team) to watch every single talk post-conference, for free. 

Step #3 – Be prepared to prove value

It’s important to go into the conference with a plan to bring back value. It’s easy to come to any conference and just enjoy the presentations and events, but it’s harder to take the information gained and implement change.

Make a plan

Before approaching your boss, make sure you have a plan on how you’re going to show off all of the insights you gather at MozCon! Obviously, you’ll be taking notes — whether it’s to the tune of live tweets, bullet journals, or doodles, those notes are most valuable when they’re backed up by action.

Putting it into action

Set expectations with your boss. “After each day, I’ll select three takeaways and create a plan on how to execute them.” Who could turn down nine potential business-changing strategies?!

And it really isn’t that hard! Especially not with the content that you’ll have access to. At the close of each day, we recommend you look back over your notes and do a brain-dump. 

  • How did today’s content relate to your business? 
  • Which sessions resonated and would bring the most value to your team? 
  • Which strategies can easily be executed? 
  • Which would make the biggest impact?

After you identify those strategies, create a plan of action that will get you on track for implementing change.

Client briefs

If you have clients on retainer, ongoing training for employees is something those clients should appreciate — it ensures you’re staying ahead of the game. Offer to not only debrief your in-house SEO team, but to also present to your clients. This sort of presentation is a value add that many clients don’t get and can set your business apart.

These presentations can be short blurbs at the beginning of a regular meeting or a chance to gather up all of your clients and enjoy a bit of networking and education.

Still not enough?

Give the boss a taste of MozCon by having them check out some videos from years past to get a sense of the caliber of our speakers.

Lastly, the reviews speak for themselves. MozCon is perfect for SEOs of any level, no matter where they’re located! 

Our fingers are crossed!

Alright, friend, now is your time to shine. We’ve equipped you with some super-persuasive tools and we’ll be crossing our fingers that the boss gives you the “okay!” Be sure to grab the letter template and make your case the easy way:

Copy the template

We hope to see your smiling face at MozCon Virtual 2021!



Page Level Query Analysis at Scale with Google Colab, Python, & the GSC API [Video Instructions Included]

The YouTube playlist referenced throughout this blog post can be found here: 6-Part YouTube Series [Setting Up & Using the Query Optimization Checker]

Anyone who does SEO as part of their job knows that there’s a lot of value in analyzing which queries are and are not sending traffic to specific pages on a site.

The most common uses for these datasets are to align on-page optimizations with existing rankings and traffic, and to identify gaps in ranking keywords.

However, working with this data is extremely tedious because it’s only available in the Google Search Console interface, and you have to look at only one page at a time.

On top of that, to get information on the text included in the ranking page, you either need to manually review it or extract it with a tool like Screaming Frog.

You need this kind of view:

Example pivot table for traffic data.

…but even the above view would only be viable one page at a time, and as mentioned, the actual text extraction would have had to be separate as well.

Given these apparent issues with the readily available data at the SEO community’s disposal, the data engineering team at Inseev Interactive has been spending a lot of time thinking about how we can improve these processes at scale.

One specific example that we’ll be reviewing in this post is a simple script that allows you to get the above data in a flexible format for many great analytical views.

Better yet, all of this is available by filling in only a few input variables.

A quick rundown of tool functionality

The tool automatically compares the text on-page to the Google Search Console top queries at the page-level to let you know which queries are on-page as well as how many times they appear on the page. An optional XPath variable also allows you to specify the part of the page you want to analyze text on.

This means you’ll know exactly what queries are driving clicks/impressions that are not in your <title>, <h1>, or even something as specific as the first paragraph within the main content (MC). The sky’s the limit.

For those of you not familiar, we’ve also provided some quick XPath expressions you can use, as well as how to create site-specific XPath expressions within the “Input Variables” section of the post.
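The tool’s actual matching code isn’t reproduced here, but the core idea can be sketched in a few lines. This version assumes a case-insensitive, whole-phrase count after collapsing whitespace. The real script may normalize text differently, and `query_occurrences` is a hypothetical name.

```python
import re


def query_occurrences(page_text, queries):
    """Count case-insensitive, whole-phrase occurrences of each query in the text."""
    text = " ".join(page_text.lower().split())  # collapse runs of whitespace
    counts = {}
    for query in queries:
        normalized = " ".join(query.lower().split())
        # \b on both sides stops "seo" from matching inside "seos".
        pattern = r"\b" + re.escape(normalized) + r"\b"
        counts[query] = len(re.findall(pattern, text))
    return counts


title = "Long Tail SEO: A Guide to Long Tail Keywords"
print(query_occurrences(title, ["long tail seo", "long tail keywords", "seo tools"]))
# {'long tail seo': 1, 'long tail keywords': 1, 'seo tools': 0}
```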

Post setup usage & datasets

Once the process is set up, all that’s required is filling out a short list of variables and the rest is automated for you.

The output includes multiple automated CSV datasets, as well as a structured file format to keep things organized. A simple pivot of the core analysis CSV can provide you with the below dataset and many other useful layouts.

A simple pivot table of the core analysis automated CSV.
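As an illustration of this kind of pivot, the sketch below sums clicks per page and splits them by whether the query appears on the page. The row shape and column names are hypothetical, not the tool’s actual CSV schema. The “missing” totals are clicks from queries that don’t appear in the analyzed text, which is where optimization opportunities tend to sit.

```python
from collections import defaultdict

# Hypothetical rows shaped like a page/query export; the real CSV columns may differ.
rows = [
    {"page": "/guide", "query": "long tail seo", "clicks": 40, "on_page": 3},
    {"page": "/guide", "query": "long tail keywords", "clicks": 25, "on_page": 0},
    {"page": "/guide", "query": "what is long tail seo", "clicks": 5, "on_page": 0},
    {"page": "/tools", "query": "seo tools", "clicks": 12, "on_page": 2},
]


def clicks_by_page_and_presence(rows):
    """Sum clicks per page, split by whether the query appears on the page."""
    pivot = defaultdict(lambda: {"on_page": 0, "missing": 0})
    for row in rows:
        bucket = "on_page" if row["on_page"] > 0 else "missing"
        pivot[row["page"]][bucket] += row["clicks"]
    return dict(pivot)


print(clicks_by_page_and_presence(rows))
# {'/guide': {'on_page': 40, 'missing': 30}, '/tools': {'on_page': 12, 'missing': 0}}
```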

… Even some “new metrics”?

Okay, not technically “new,” but if you exclusively use the Google Search Console user interface, you likely haven’t had access to metrics like these before: “Max Position,” “Min Position,” and “Count Position” for the specified date range – all of which are explained in the “Running your first analysis” section of the post.
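The post defines these metrics in a later section. As a rough sketch, here is one plausible way to derive them, assuming the data is pulled by day so each page/query pair has one average position per date. These definitions are an assumption, not necessarily the ones the tool uses. Since a lower position is better, “min” is the best day and “max” the worst.

```python
from collections import defaultdict


def position_stats(daily_rows):
    """Group daily rows by (page, query) and summarize their positions.

    Assumed definitions: min/max are the best/worst daily average position,
    count is the number of days the pair appeared in the date range.
    """
    positions = defaultdict(list)
    for row in daily_rows:
        positions[(row["page"], row["query"])].append(row["position"])
    return {
        key: {"min": min(vals), "max": max(vals), "count": len(vals)}
        for key, vals in positions.items()
    }


daily = [
    {"date": "2021-05-01", "page": "/guide", "query": "long tail seo", "position": 4.2},
    {"date": "2021-05-02", "page": "/guide", "query": "long tail seo", "position": 6.0},
    {"date": "2021-05-03", "page": "/guide", "query": "long tail seo", "position": 3.1},
]
print(position_stats(daily))
# {('/guide', 'long tail seo'): {'min': 3.1, 'max': 6.0, 'count': 3}}
```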


To really demonstrate the impact and usefulness of this dataset, in the video below we use the Colab tool to:

  1. [3 Minutes] — Find non-brand <title> optimization opportunities for (around 30 pages in video, but you could do any number of pages)

  2. [3 Minutes] — Convert the CSV to a more usable format

  3. [1 Minute] – Optimize the first title with the resulting dataset

Okay, you’re all set for the initial rundown. Hopefully we were able to get you excited before moving into the somewhat dull setup process.

Keep in mind that at the end of the post, there is also a section including a few helpful use cases and an example template!

[Quick Consideration #1] — The web scraper built into the tool DOES NOT support JavaScript rendering. If your website uses client-side rendering, the full functionality of the tool unfortunately will not work.

[Quick Consideration #2] — This tool has been heavily tested by the members of the Inseev team. Most bugs [specifically with the web scraper] have been found and fixed, but like any other program, it is possible that other issues may come up.

  • If you encounter any errors, feel free to reach out to us directly at or, and either I or one of the other members of the data engineering team at Inseev will be happy to help you out.

  • If new errors are encountered and fixed, we will always upload the updated script to the code repository linked in the sections below so the most up-to-date code can be utilized by all!

One-time setup of the script in Google Colab (in less than 20 minutes)

Things you’ll need:

  1. Google Drive

  2. Google Cloud Platform account

  3. Google Search Console access

Video walkthrough: tool setup process

Below you’ll find step-by-step editorial instructions in order to set up the entire process. However, if following editorial instructions isn’t your preferred method, we recorded a video of the setup process as well.

As you’ll see, we start with a brand-new Gmail account and set up the entire process in approximately 12 minutes, and the output is completely worth the time.

Keep in mind that the setup is one-off, and once set up, the tool should work on command from there on!

Editorial walkthrough: tool setup process

Four-part process:

  1. Download the files from Github and set up in Google Drive

  2. Set up a Google Cloud Platform (GCP) Project (skip if you already have an account)

  3. Create the OAuth 2.0 client ID for the Google Search Console (GSC) API (skip if you already have an OAuth client ID with the Search Console API enabled)

  4. Add the OAuth 2.0 credentials to the file
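For context on what the OAuth credentials are eventually used for, here is a minimal sketch of a Search Analytics query with Google’s Python client libraries. It is not the Colab script itself. It assumes `google-auth-oauthlib` and `google-api-python-client` are installed and that the OAuth client is a desktop-type client downloaded as a JSON file (the file name and property URL are hypothetical). It also uses `run_local_server`, which suits running on your own machine; the Colab notebook may handle sign-in differently.

```python
from datetime import date

# Read-only Search Console scope; enough for Search Analytics queries.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]


def build_query_body(start, end, row_limit=25000):
    """Request body for a page + query breakdown over a date range."""
    return {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "dimensions": ["page", "query"],
        "rowLimit": row_limit,  # 25,000 is the API's per-request maximum
    }


def fetch_rows(client_secrets_path, site_url, body):
    """Sign in with an OAuth client ID and run one Search Analytics query.

    The Google libraries are imported here so build_query_body can be used
    (and tested) without them installed.
    """
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, SCOPES)
    credentials = flow.run_local_server(port=0)  # opens a browser for consent
    service = build("searchconsole", "v1", credentials=credentials)
    response = (
        service.searchanalytics()
        .query(siteUrl=site_url, body=body)
        .execute()
    )
    # Each row holds "keys" (page, query), "clicks", "impressions", "ctr", "position".
    return response.get("rows", [])


body = build_query_body(date(2021, 5, 1), date(2021, 5, 28))
print(body)
# Hypothetical file name and property; requires real credentials and access:
# rows = fetch_rows("client_secrets.json", "https://www.example.com/", body)
```

Only `build_query_body` runs without network access; `fetch_rows` needs real credentials and access to a verified Search Console property.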

Part one: Download the files from Github and set up in Google Drive

Download source files (no code required)

1. Navigate here.

2. Select “Code” > “Download Zip”

*You can also use ‘git clone’ if you’re more comfortable using the command prompt.

Select Code then Download Zip

Initiate Google Colab in Google Drive

If you already have a Google Colaboratory setup in your Google Drive, feel free to skip this step.

1. Navigate here.

2. Click “New” > “More” > “Connect more apps”.

Click New then More then Connect more apps

3. Search “Colaboratory” > Click into the application page.

4. Click “Install” > “Continue” > Sign in with OAuth.

5. Click “OK” with the prompt checked so Google Drive automatically sets appropriate files to open with Google Colab (optional).

Import the downloaded folder to Google Drive & open in Colab

1. Navigate to Google Drive and create a folder called “Colab Notebooks”.

IMPORTANT: The folder needs to be called “Colab Notebooks” as the script is configured to look for the “api” folder from within “Colab Notebooks”.

Error resulting in improper folder naming.
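Because the script looks for the “api” folder inside “Colab Notebooks”, a quick way to catch a naming mistake before running anything is to check that the path exists. The sketch below assumes Google Drive is already mounted at /content/drive (in Colab, `from google.colab import drive` followed by `drive.mount('/content/drive')`), which matches the “/content/drive/My Drive/Colab Notebooks/” path used by the script’s colab_path variable. The helper name `api_folder_exists` is hypothetical and is not part of the Inseev script.

```python
import os

# Default Drive location that the script's colab_path variable should point to.
COLAB_PATH = "/content/drive/My Drive/Colab Notebooks/"


def api_folder_exists(colab_path: str = COLAB_PATH) -> bool:
    """Return True if an 'api' folder sits directly inside colab_path.

    Hypothetical helper: it only mirrors the folder-naming requirement
    described above and is not part of the Inseev script.
    """
    return os.path.isdir(os.path.join(colab_path, "api"))


if __name__ == "__main__":
    if api_folder_exists():
        print("Found the 'api' folder inside 'Colab Notebooks'.")
    else:
        print("No 'api' folder found: check that the Drive folder is named "
              "exactly 'Colab Notebooks' and that Drive is mounted.")
```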

2. Import the folder downloaded from GitHub into Google Drive.

At the end of this step, you should have a folder in your Google Drive that contains the below items:

The folder should contain the query optimization checker and the README.MD

Part two: Set up a Google Cloud Platform (GCP) project

If you already have a Google Cloud Platform (GCP) account, feel free to skip this part.

1. Navigate to the Google Cloud page.

2. Click on the “Get started for free” CTA (CTA text may change over time).

3. Sign in with the Google account of your choice. Any Gmail address will work.

4. Follow the prompts to sign up for your GCP account.

You’ll be asked to supply a credit card to sign up, but there is currently a $300 free trial and Google notes that they won’t charge you until you upgrade your account.

Part three: Create an OAuth 2.0 client ID for the Google Search Console (GSC) API

1. Navigate here.

2. After you log in to your desired Google Cloud account, click “ENABLE”.

3. Configure the consent screen.

  • In the consent screen creation process, select “External,” then continue on to the “App information” step.

Below is an example of the minimum requirements:

App information window for the consent screen.
Developer contact information section of consent screen.
  • Skip “Scopes”
  • Add the email(s) you’ll use for Search Console API authentication under “Test Users”. These can include emails other than the one that owns the Google Drive, for example, a client’s email that you use to view their KPIs in the Google Search Console UI.

4. In the left-rail navigation, click into “Credentials” > “CREATE CREDENTIALS” > “OAuth Client ID” (Not in image).

5. Within the “Create OAuth client ID” form, fill in:

  • Application Type = Desktop app

  • Name = Google Colab

  • Click “CREATE”

6. Save the “Client ID” and “Client Secret”, as these will be added to a file in the “api” folder from the GitHub files we downloaded.

  • These should appear in a popup after you click “CREATE”.

  • The “Client Secret” is functionally the password to your Google Cloud OAuth client (DO NOT post it publicly or share it online).
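For context on what these two values are used for: a Desktop app OAuth client is usually described by an “installed” client configuration, the same structure as the JSON file the Google Cloud console lets you download for that client. The sketch below assembles that dictionary from the two saved values. Whether the Inseev script builds it this way internally is an assumption, and `build_installed_client_config` is a hypothetical helper. With the google-auth-oauthlib package installed, a dictionary like this can be passed to `InstalledAppFlow.from_client_config(config, scopes=[...])`, for example with the read-only Search Console scope `https://www.googleapis.com/auth/webmasters.readonly`.

```python
def build_installed_client_config(client_id: str, client_secret: str) -> dict:
    """Assemble an 'installed' (Desktop app) OAuth client configuration.

    Hypothetical helper: field names follow the client secret JSON format
    Google issues for Desktop app clients; the URLs are Google's standard
    OAuth 2.0 authorization and token endpoints.
    """
    if not client_id or not client_secret:
        raise ValueError("Both the Client ID and the Client Secret are required.")
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "redirect_uris": ["http://localhost"],
        }
    }


if __name__ == "__main__":
    # Placeholder values only; never paste a real secret into shared code.
    config = build_installed_client_config(
        "1234-example.apps.googleusercontent.com", "example-secret"
    )
    print(sorted(config["installed"]))
```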

Part four: Add the OAuth 2.0 credentials to the file

1. Return to Google Drive and navigate into the “api” folder.

2. Click into

3. Choose to open with “Text Editor” (or another app of your choice) to modify the file.

4. Update the three areas highlighted below with your:

  • CLIENT_ID: From the OAuth 2.0 client ID setup process

  • CLIENT_SECRET: From the OAuth 2.0 client ID setup process

  • GOOGLE_CREDENTIALS: Email that corresponds with your CLIENT_ID & CLIENT_SECRET

5. Save the file once updated!
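Because a typo in any of these three values usually surfaces only later as an authentication error, a small sanity check can help. The sketch below is hypothetical: it assumes you have the three values as plain strings (how the script actually reads the file isn’t covered here) and relies only on two facts, namely that Google OAuth client IDs end in “.apps.googleusercontent.com” and that GOOGLE_CREDENTIALS is an email address. The helper name `check_credentials` is an illustration, not part of the script.

```python
import re

# Loose email pattern: enough to catch an empty or obviously malformed value.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_credentials(client_id: str, client_secret: str, google_credentials: str) -> list:
    """Return a list of problems found in the three values (empty list = looks OK)."""
    problems = []
    if not client_id.strip().endswith(".apps.googleusercontent.com"):
        problems.append("CLIENT_ID should end with '.apps.googleusercontent.com'")
    if not client_secret.strip():
        problems.append("CLIENT_SECRET is empty")
    if not EMAIL_PATTERN.match(google_credentials.strip()):
        problems.append("GOOGLE_CREDENTIALS should be an email address")
    return problems


if __name__ == "__main__":
    issues = check_credentials(
        "1234-example.apps.googleusercontent.com", "example-secret", "you@example.com"
    )
    print(issues or "All three values look plausible.")
```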

Congratulations, the boring stuff is over. You are now ready to start using the Google Colab file!

Running your first analysis

Running your first analysis may be a little intimidating, but stick with it and it will quickly get easier.

Below, we’ve provided details regarding the required input variables, as well as notes on things to keep in mind when running the script and analyzing the resulting dataset.

After we walk through these items, we also share a few example projects and video walkthroughs showcasing ways to use these datasets for client deliverables.

Setting up the input variables

XPath extraction with the “xpath_selector” variable

Have you ever wanted to know every query driving clicks and impressions to a webpage that isn’t in your <title> or <h1> tag? This parameter lets you do just that.

While optional, using it is highly encouraged, and we feel it “supercharges” the analysis. Simply define site sections with XPaths and the script will do the rest.

In the above video, you’ll find examples of how to create site-specific extractions. In addition, below are some universal extractions that should work on almost any site on the web:

  • ‘//title’ # Identifies a <title> tag

  • ‘//h1’ # Identifies an <h1> tag

  • ‘//h2’ # Identifies an <h2> tag

Site-specific: How to scrape only the main content (MC)?

Chaining XPaths – Add a “|” Between XPaths

  • ‘//title | //h1’ # Gets you both the <title> and <h1> tags in 1 run

  • ‘//h1 | //h2 | //h3’ # Gets you the <h1>, <h2>, and <h3> tags in 1 run
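To see what a chained selector returns, the sketch below runs one against a small HTML snippet using the lxml library. The tooling is an assumption: lxml is one common way to evaluate XPath in Python (install it with `pip install lxml` if it isn’t already available), and the script may extract elements differently. The sample page and the helper `extract_text` are made up for illustration. In XPath 1.0, the “|” union returns the matched elements in document order.

```python
from lxml import html

# Made-up page used only to demonstrate the selectors.
SAMPLE_PAGE = """
<html>
  <head><title>Same-Day Flower Delivery | Example Florist</title></head>
  <body>
    <h1>Flower Delivery</h1>
    <h2>Roses</h2>
    <h2>Tulips</h2>
  </body>
</html>
"""


def extract_text(page_html: str, xpath_selector: str) -> list:
    """Return the stripped text of every element matched by xpath_selector."""
    tree = html.fromstring(page_html)
    return [element.text_content().strip() for element in tree.xpath(xpath_selector)]


if __name__ == "__main__":
    # A single selector, then a chained ("|") selector in one run.
    print(extract_text(SAMPLE_PAGE, "//h1"))
    print(extract_text(SAMPLE_PAGE, "//title | //h1"))
```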

Other variables

Here’s a video overview of the other variables with a short description of each.

‘colab_path’ [Required] – The path in which the Colab file lives. This should be “/content/drive/My Drive/Colab Notebooks/”.

‘domain_lookup’ [Required] – Homepage of the website used for analysis.

‘startdate’ & ‘enddate’ [Required] – Date range for the analysis period.

‘gsc_sorting_field’ [Required] – The tool pulls the top N pages as defined by the user. The “top” is defined by either “clicks_sum” or “impressions_sum.” Please review the video for a more detailed description.

‘gsc_limit_pages_number’ [Required] – Numeric value that represents the number of resulting pages you’d like within the dataset.

‘brand_exclusions’ [Optional] – The string sequence(s) that commonly result in branded queries (e.g., anything containing “inseev” will be branded queries for “Inseev Interactive”).

‘impressions_exclusion’ [Optional] – Numeric value used to exclude queries that are potentially irrelevant due to the lack of pre-existing impressions. This is primarily relevant for domains with strong pre-existing rankings on a large number of pages.

‘page_inclusions’ [Optional] – The string sequence(s) that are found within the desired analysis page type. If you’d like to analyze the entire domain, leave this section blank.
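To make the interplay of these variables concrete, here is a simplified sketch of how they could be applied to query rows pulled from GSC. It is not the script’s code, and several details are assumptions: the row layout (dicts with page, query, clicks, and impressions), case-insensitive substring matching, treating impressions_exclusion as a minimum-impressions threshold, and applying the filters before ranking pages. The function name `select_rows` is hypothetical; its parameters simply reuse the variable names above.

```python
from collections import defaultdict


def select_rows(rows, gsc_sorting_field="clicks_sum", gsc_limit_pages_number=10,
                brand_exclusions=(), impressions_exclusion=0, page_inclusions=()):
    """Filter GSC query rows, then keep only the rows for the top N pages.

    rows: iterable of dicts with 'page', 'query', 'clicks', and 'impressions' keys.
    """
    if gsc_sorting_field not in ("clicks_sum", "impressions_sum"):
        raise ValueError("gsc_sorting_field must be 'clicks_sum' or 'impressions_sum'")
    metric = "clicks" if gsc_sorting_field == "clicks_sum" else "impressions"

    kept = []
    for row in rows:
        page, query = row["page"].lower(), row["query"].lower()
        # page_inclusions: an empty value means "analyze the entire domain".
        if page_inclusions and not any(s.lower() in page for s in page_inclusions):
            continue
        # brand_exclusions: drop queries containing any branded string.
        if any(s.lower() in query for s in brand_exclusions):
            continue
        # impressions_exclusion (assumed to be a minimum-impressions threshold).
        if row["impressions"] < impressions_exclusion:
            continue
        kept.append(row)

    # Sum the chosen metric per page and keep the N highest-ranked pages.
    totals = defaultdict(int)
    for row in kept:
        totals[row["page"]] += row[metric]
    ranked = sorted(totals, key=totals.get, reverse=True)
    top_pages = set(ranked[:gsc_limit_pages_number])
    return [row for row in kept if row["page"] in top_pages]


if __name__ == "__main__":
    sample_rows = [
        {"page": "https://www.example.com/blog/a", "query": "flower delivery", "clicks": 30, "impressions": 900},
        {"page": "https://www.example.com/blog/a", "query": "example florist", "clicks": 50, "impressions": 400},
        {"page": "https://www.example.com/blog/b", "query": "tulip care", "clicks": 10, "impressions": 5},
        {"page": "https://www.example.com/shop/c", "query": "rose bouquet", "clicks": 120, "impressions": 2000},
    ]
    result = select_rows(sample_rows, gsc_limit_pages_number=1, brand_exclusions=["example"],
                         impressions_exclusion=10, page_inclusions=["/blog/"])
    print([row["query"] for row in result])  # ['flower delivery']
```

In the usage example, the shop page is dropped by page_inclusions, the branded “example florist” query by brand_exclusions, and the 5-impression query by impressions_exclusion, leaving a single blog page as the top result.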

Running the script

Keep in mind that once the script finishes running, you’re generally going to use the “step3_query-optimizer_domain-YYYY-MM-DD.csv” file for analysis, but the other files containing the raw datasets are also available to browse.

Practical use cases for the “step3_query-optimizer_domain-YYYY-MM-DD.csv” file can be found in the “Practical use cases and templates” section.

That said, there are a few important things to note while testing things out:

1. No JavaScript Crawling: As mentioned at the start of the post, this script is NOT set up for JavaScript crawling, so if your target website uses a JS frontend with client-side rendering to populate the main content (MC), the scrape will not be useful. However, the basic functionality of quickly getting the top XX (user-defined) queries and pages can still be useful by itself.

2. Google Drive / GSC API Auth: The first time you run the script in each new session, it will prompt you to authenticate both the Google Drive and the Google Search Console credentials.

  • Google Drive authentication: Authenticate with whichever email is associated with the Google Drive that holds the script.

  • GSC authentication: Authenticate with whichever email has permission to use the desired Google Search Console account.
    • If you attempt to authenticate and get an error like the one below, please revisit Part three, step 3 above (configuring the consent screen) and make sure the email is listed under “Test Users”.

Quick tip: The Google Drive account and the GSC authentication DO NOT have to use the same email, but they do require separate OAuth authentications.

3. Running the script: Either navigate to “Runtime” > “Restart and Run All” or use the keyboard shortcut Ctrl + F9 (Colab’s “Run all” shortcut) to start running the script.

4. Populated datasets/folder structure: There are three CSVs populated by the script, all nested within a folder structure based on the “domain_lookup” input variable.

  • Automated Organization [Folders]: Each time you rerun the script on a new domain, it will create a new folder structure to keep things organized.

  • Automated Organization [File Naming]: The CSVs include the date of the export appended to the end, so you’ll always know when the process ran as well as the date range for the dataset.
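The exact naming logic lives in the script, but the idea of a per-domain folder plus a date-stamped file name can be sketched as follows. Only the file-name pattern “step3_query-optimizer_domain-YYYY-MM-DD.csv” comes from this post; the folder layout (one folder per domain under colab_path), replacing “domain” with the host name from domain_lookup, and the helper name `output_csv_path` are all hypothetical.

```python
import os
from datetime import date
from typing import Optional
from urllib.parse import urlparse


def output_csv_path(colab_path: str, domain_lookup: str, step: int = 3,
                    export_date: Optional[date] = None) -> str:
    """Build a per-domain folder path plus a date-stamped CSV file name (hypothetical layout)."""
    export_date = export_date or date.today()
    # "https://www.example.com/" -> "www.example.com"; a bare domain is used as-is.
    domain = urlparse(domain_lookup).netloc or domain_lookup.strip("/")
    file_name = f"step{step}_query-optimizer_{domain}-{export_date.isoformat()}.csv"
    return os.path.join(colab_path, domain, file_name)


if __name__ == "__main__":
    print(output_csv_path("/content/drive/My Drive/Colab Notebooks/",
                          "https://www.example.com/", export_date=date(2021, 3, 1)))
```

Keying the folder on the domain is what keeps reruns on different sites separate, while the date suffix keeps reruns on the same site from overwriting each other on different days.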

5. Date range for dataset: Inside the dataset, there is a generated “gsc_datasetID” column, which includes the date range of the extraction.

6. Unfamiliar metrics: The resulting dataset has all the KPIs we know and love (e.g., clicks, impressions, average (mean) position), but there are also a few you cannot get directly from the GSC UI:

  • ‘count_instances_gsc’: the number of instances in which the query got at least 1 impression during the specified date range. Scenario example: GSC tells you that you held an average position of 6 for a large keyword like “flower delivery”, yet you only received 20 impressions in a 30-day date range. It doesn’t seem possible that you were really in position 6, right? Now you can see that this was potentially because you only showed up on one day in that 30-day range (i.e., count_instances_gsc = 1).

  • ‘max_position’ & ‘min_position’: the MAXIMUM and MINIMUM ranking positions at which the identified page showed up in Google Search within the specified date range.

Quick tip #1: A large gap between max and min may tell you that your keyword has been fluctuating heavily.

Quick tip #2: These KPIs, in conjunction with “count_instances_gsc”, can greatly deepen your understanding of query performance and opportunity.
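These extra metrics are straightforward to reproduce from day-level GSC rows, which can help when sanity-checking the output. The sketch assumes one row per query per day with query, date, impressions, and position fields; those field names and the helper `summarize_queries` are hypothetical, and for simplicity it leaves out the average position.

```python
from collections import defaultdict


def summarize_queries(daily_rows):
    """Per query: count_instances_gsc, min_position, max_position, impressions_sum.

    daily_rows: iterable of dicts with 'query', 'date', 'impressions', 'position'.
    """
    days = defaultdict(set)
    positions = defaultdict(list)
    impressions = defaultdict(int)
    for row in daily_rows:
        if row["impressions"] < 1:
            continue  # only days with at least 1 impression count as an instance
        query = row["query"]
        days[query].add(row["date"])
        positions[query].append(row["position"])
        impressions[query] += row["impressions"]
    return {
        query: {
            "count_instances_gsc": len(days[query]),
            "min_position": min(positions[query]),
            "max_position": max(positions[query]),
            "impressions_sum": impressions[query],
        }
        for query in days
    }


if __name__ == "__main__":
    rows = [
        # The "flower delivery" scenario: one day, 20 impressions, position 6.
        {"query": "flower delivery", "date": "2021-03-01", "impressions": 20, "position": 6.0},
        {"query": "rose bouquet", "date": "2021-03-01", "impressions": 5, "position": 4.0},
        {"query": "rose bouquet", "date": "2021-03-02", "impressions": 3, "position": 15.0},
        {"query": "rose bouquet", "date": "2021-03-03", "impressions": 2, "position": 9.0},
    ]
    for query, stats in summarize_queries(rows).items():
        print(query, stats)
```

Here “flower delivery” gets count_instances_gsc = 1, matching the scenario above, while “rose bouquet” shows up on 3 days with positions ranging from 4 to 15, the kind of spread Quick tip #1 refers to.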

Practical use cases and templates

Access the recommended multi-use template.

Recommended use: Download the file and use it with Excel. Subjectively speaking, I believe Excel’s pivot table functionality, which is critical for using this template, is much more user-friendly than Google Sheets’.

Alternative use: If you do not have Microsoft Excel or you prefer a different tool, you can use most spreadsheet apps that contain pivot functionality.

For those who opt for an alternative spreadsheet software/app:

1. Below are the pivot fields to mimic upon setup.

2. You may have to adjust the VLOOKUP functions found on the “Step 3 _ Analysis Final Doc” tab, depending on whether your updated pivot columns align with the current pivot I’ve supplied.

Pivot fields to mimic upon setup.

Project example: Title & H1 re-optimizations (video walkthrough)

Project description: Locate keywords that are driving clicks and impressions to high-value pages but do not exist within the <title> and <h1> tags by reviewing GSC query KPIs vs. current page elements. Use the resulting findings to re-optimize both the <title> and <h1> tags for pre-existing pages.

Project assumptions: This process assumes that inserting keywords into both the <title> and <h1> tags is a strong SEO practice for relevancy optimization, and that it’s important to include related keyword variants in these areas (e.g., non-exact match keywords with matching SERP intent).
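The core check behind this project can be sketched in a few lines: given the text of a page’s <title> and <h1> and the queries GSC attributes to that page, list the queries that appear in neither element. This is a deliberately simple, hypothetical version (the helper `missing_queries` and the sample data are made up) that uses case-insensitive exact-phrase matching, so it will not recognize the related, non-exact-match variants mentioned above; those still need a human review.

```python
def missing_queries(title: str, h1: str, query_clicks: dict) -> list:
    """Return (query, clicks) pairs found in neither the <title> nor the <h1>.

    Matching is case-insensitive and exact-phrase; results are sorted by clicks.
    """
    # Join with a newline so a query can't match across the two elements.
    page_text = f"{title}\n{h1}".lower()
    missing = [(query, clicks) for query, clicks in query_clicks.items()
               if query.lower() not in page_text]
    return sorted(missing, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    queries = {"flower delivery": 120, "send flowers today": 45,
               "same-day flower delivery": 30, "florist near me": 60}
    print(missing_queries("Same-Day Flower Delivery | Example Florist",
                          "Flower Delivery", queries))
    # [('florist near me', 60), ('send flowers today', 45)]
```

Sorting by clicks puts the highest-value gaps first, which matches the goal of prioritizing re-optimizations on pages that already earn traffic.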

Project example: On-page text refresh/re-optimization

Project description: Locate keywords that are driving clicks and impressions to editorial pieces of content but DO NOT appear within the first paragraph of the body of the main content (MC). Perform an on-page refresh of introductory content within editorial pages to include high-value keyword opportunities.

Project assumptions: This process assumes that inserting keywords into the first several sentences of a piece of content is a strong SEO practice for relevancy optimization, and that it’s important to include related keyword variants in these areas (e.g., non-exact match keywords with matching SERP intent).
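A similar sketch works for the introductory-copy check, assuming you have already extracted the body text of the main content (for example, with a site-specific XPath for the MC). The naive sentence split after “.”, “!”, or “?” and the helper names `first_sentences` and `queries_missing_from_intro` are assumptions made for illustration; real copy containing abbreviations or decimals would need a sturdier splitter.

```python
import re


def first_sentences(text: str, n: int = 3) -> str:
    """Return roughly the first n sentences, splitting naively after '.', '!' or '?'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


def queries_missing_from_intro(body_text: str, queries, n: int = 3) -> list:
    """Return the queries that don't appear (case-insensitively) in the first n sentences."""
    intro = first_sentences(body_text, n).lower()
    return [query for query in queries if query.lower() not in intro]


if __name__ == "__main__":
    body = ("Roses are a classic gift. Our florists arrange them daily. "
            "Order before noon for same-day delivery. We also sell tulips.")
    print(queries_missing_from_intro(body, ["rose bouquet", "same-day delivery", "tulips"]))
    # ['rose bouquet', 'tulips']
```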

Final thoughts

We hope this post has been helpful and opened you up to the idea of using Python and Google Colab to supercharge your relevancy optimization strategy.

As mentioned throughout the post, keep the following in mind:

1. The GitHub repository will be updated with any changes we make in the future.

2. There is the possibility of undiscovered errors. If these occur, Inseev is happy to help! In fact, we would appreciate you reaching out so we can investigate and fix any errors that do appear. That way, others won’t run into the same problems.

Other than the above, if you have any ideas on ways to Colab (pun intended) on data analytics projects, feel free to reach out.

