The Beginner’s Guide to SEO Competitor Analysis
Thu, 03 Oct 2019 07:45:57 +0000

The post The Beginner’s Guide to SEO Competitor Analysis appeared first on Screaming Frog.

Unless you’re lucky enough to operate in a monopoly, competition will be an everyday part of your business. Online is no different. Knowing who you’re fighting for visibility in SERPs is the first step to building a watertight SEO strategy.

This post will explore how to find your competitors and analyse their backlink profiles. If all goes to plan, you’ll find weaknesses you can exploit and strengths you can use to inspire future campaigns.

To illustrate the process, we’ll be using an imaginary chemistry facts website as our ‘new’ site in need of a strategy. This site wants to rank for keywords including [chemistry facts], [chemistry revision] and [learn chemistry].

Finding the Competition

The first step in any competitor analysis is finding competitors to analyse. The easiest way to do this is to run a Google search for your target keywords.

SERP for chemistry facts

From these searches, we can identify our main SEO rivals. While these competitors may not necessarily offer the exact same things as our site, they all compete for the same key search phrases.

This is an important point: SEO competitors may not overlap with direct business competitors. Being aware of this will save you a lot of headaches further down the line.

Once you have your list of competitors, I recommend narrowing down to ten or fewer. This keeps the analysis manageable while still being in-depth enough to provide insight.

Competitor Visibility

Now you need to see which of these competitors are within reach and which are currently dominating. Splitting your competitors into two tiers can be useful: realistic ones you can target over the short to medium term, and ambitious competitors that’ll take sustained investment and effort to overtake.

The way to assess this is by looking at your competitors’ visibility in SERPs. SISTRIX is one tool that measures this metric, and the way it is calculated is as follows.

First, SISTRIX takes a sample of the top 100 search positions for one million keywords or keyword combinations. (As a comparison, the Oxford English Dictionary contains about 120,000 words). It then weights the results according to position and search volume for a keyword. (See here for more detail about the visibility index).
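To make the weighting idea concrete, here’s a minimal sketch of a position- and volume-weighted visibility score. SISTRIX’s actual formula is proprietary; the weights and keyword sample below are entirely illustrative.

```python
# Hypothetical visibility score: each ranking contributes its keyword's search
# volume, scaled by a weight for its position. These weights are illustrative,
# not SISTRIX's real values.
POSITION_WEIGHT = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}

def visibility(rankings):
    """rankings: iterable of (position, monthly_search_volume) pairs."""
    score = 0.0
    for position, volume in rankings:
        weight = POSITION_WEIGHT.get(position, 0.01)  # small weight for positions 6-100
        score += weight * volume
    return score

print(visibility([(1, 1000), (3, 5000), (12, 800)]))  # roughly 808.0
```

Tracking a score like this per competitor over time gives you the same kind of comparison chart the visibility index provides.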

SISTRIX visibility for ZMEScience

Enter your first competitor into the SISTRIX toolbar (we’ve chosen ZMEScience) and scroll to their visibility index. Next, click on the cog in the top right corner and then ‘Compare Data in Chart’. This allows a comparison of up to 4 websites’ visibilities. Enter the rest of your first batch of competitors and hit ‘Compare’.

One competitor’s visibility may be so large that it doesn’t let you see the others clearly. If that’s the case, set it aside for now and replace it with another competitor until you have a graph that’s readable. These will be our realistic competitors; the ones you’ve set aside will be your ambitious, longer-term competition.

SISTRIX competitor visibility comparison
Clicking the cog again and selecting ‘Show More Pins’ allows SISTRIX to show the dates of known Google algorithm updates. It’s interesting to note if any competitors have surges or drops in visibility that coincide with these dates.

From the above graph we can see that RevisionWorld (in blue) surged after the second Medic Update (pin M). Conversely, ZMEScience (red) has dropped dramatically after the June 2019 Core Update (pin O).

You can use this to inform your strategy; is there anything surging competitors are doing that you aren’t? Or is there something you’re getting away with, but another competitor has been hit for?

You can also use other tools to measure your visibility. Searchmetrics has a nice feature that allows you to see how many keywords you share with your competitors. As before, we’ve chosen ZMEScience to be our representative example.

Competitors for ZMEScience on Searchmetrics

From this we can see that ZMEScience shares many of its keywords with high-authority sites such as the Encyclopedia Britannica and National Geographic. These would obviously be considered long-term competitors that we wouldn’t be able to target immediately.

Finally, SEMrush also shows something similar. Its Competitive Positioning Map shows competitors by organic traffic and the number of keywords they are ranking in the top 20 Google results for. The size of the bubble represents a website’s visibility in SERPs.

ZMEScience competitors SEMrush

SERP Analysis

If there is one particular keyword that you are targeting, it can be worth analysing the SERP for this keyword. For example, what is the type of content ranking for this query?

The results that Google shows can give you insight into what it thinks the intent behind that search is. If the results are all guides, blog posts and listicles, then it is fair to assume that the intent is informational. People are looking for information in this instance, so to rank for this query you’ll have to provide that information.

Looking at the SERP for [chemistry facts], this is exactly what we get. All ten organic results are information pages, which shouldn’t really be surprising. People aren’t generally looking to buy facts. (But if you know someone who is, send them my way. I’ve got some good ones).

Moz’s SERP Analysis section contains useful metrics such as overall Keyword Difficulty, as well as individual Domain Authority and Page Authority scores for each result. Keyword Difficulty estimates how easy it is to rank above the current competitors for that query; the lower the score, the better.

The SERP analysis results can also be used to get an idea of what you might need to achieve to compete.

Using minimum and average metrics for the top ten results can be one way to do this. In this Google Sheet, I’ve created a SERP analysis template (in the tab imaginatively named SERP Analysis). You will need to make a copy before you can do anything.

Fill this in with the Domain and Page Authority for each result, as well as the number of Referring Domains to both the page and overall domain. You should see something like the following:

competitor analysis template

This gives the minimum and average for each of the metrics mentioned above. As we should anticipate, both the number and quality of referring domains is important in order to rank well.

These numbers should be taken with a heavy pinch of salt; we will not need 62,000 referring domains just to compete. In this case the very high number of referring domains pointing to ThoughtCo skews the averages upwards.

Nevertheless, it remains useful to see all the numbers in one place to give an overview of where your competitors are.
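If you’d rather compute the summary row in a script than in the Sheet, it’s just a minimum and a mean per metric. The figures below are invented placeholders, not real SERP data; given the outlier problem just mentioned, the median is worth printing too.

```python
# Summary row of a SERP analysis: min, mean and median per metric across the
# top results. All figures are made-up placeholders.
from statistics import mean, median

results = [
    {"da": 93, "pa": 74, "page_rds": 62000, "domain_rds": 81000},  # ThoughtCo-style outlier
    {"da": 61, "pa": 45, "page_rds": 120,   "domain_rds": 3400},
    {"da": 38, "pa": 30, "page_rds": 15,    "domain_rds": 900},
]

for metric in ("da", "pa", "page_rds", "domain_rds"):
    values = [row[metric] for row in results]
    print(f"{metric}: min={min(values)} mean={mean(values):.0f} median={median(values)}")
```

Note how the median of `page_rds` sits far below the mean, which is exactly the skew the pinch of salt above is about.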

Backlink Profile Analysis

Now you can take a deeper dive into the backlink profile of your immediate competition. Using a mixture of metrics from three well-known SEO tools (Moz, Ahrefs and Majestic) allows a more detailed comparison than using any one alone.

When doing comparative work like this, it’s important to make it as efficient as possible. All three SEO tools have a comparison part where you can submit multiple URLs rather than doing it one-by-one.

Moz has its Compare Link Profile section under Link Research, Ahrefs has a Batch Analysis tool, and Majestic has a Comparator section. With Ahrefs, make sure you use the ‘Live’ index to make sure the data is as up to date as possible.

Note that you can’t directly compare numbers from different sources as they are calculated differently. It’s also worth noting that you should judge numbers relative to your site rather than in absolute terms.

Preparing a table like the below allows an overall look at each competitor’s backlink profile. It also makes it easy to note any outliers that you need to investigate further.

Backlink comparison table

Of these metrics, Referring Domains and Domain Authority (DA) are particularly important. We often see a strong correlation between these and SEO performance.

Referring Domains is the number of separate websites linking to a given site, while DA (a score out of 100) is an estimation of the quality of these links.

From this, we can see that ZMEScience has far and away the highest quality link profile. It has the highest number of Referring Domains and the highest DA. It’s therefore interesting to see from the visibility graph that it’s not as visible as RevisionWorld.

It is also worth noting that RevisionScience and RevisionWorld have nearly the same number of backlinks. These come from 200 and 1,500 referring domains respectively.

This implies that a large proportion of the backlinks to RevisionScience may be low-quality links, potentially from mass submission or scraper sites. This is a competitor our site should look to challenge, but not one whose linking we should replicate.

Link Quality

You can also compare competitors by link quality. In theory, links from domains with a higher DA (Moz) or Domain Rating (Ahrefs) score should pass more authority to the linked site.

Moz shows this by default, segmenting DA into batches of 10: 0-10, 11-20 and so on. You can see this in Moz’s Link Explorer. Simply input your competitor’s domain and hit enter. Scroll down and the bottom-right chart should look like the below (for ZMEScience).

Segmenting referring domains by DA

If you want to use Ahrefs to segment referring domains, it gets a little more involved. However, this has the advantage of being able to compare competitors side-by-side.

We segment the Domain Rating scores as follows:

  • 100 to 70 – Most Valuable
  • 69 to 50 – Valuable
  • 49 to 30 – Average
  • 29 to 0 – Low Value

To do this go back to your Ahrefs Batch Analysis and click on the Total number of Referring Domains for the top competitor. This will bring up the Referring Domains report for this website.

ahrefs batch analysis

Export this to a CSV file, then filter Column C by the Domain Rating segments shown above. (Filter dropdown > Number Filters > Between…)

Referring domains CSV export
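If you’d rather not filter the CSV by hand, a short script can do the same bucketing. This is a sketch, not part of Ahrefs’ tooling, and the “Domain rating” column name is an assumption; check the header row of your own export.

```python
# Count referring domains per Domain Rating segment from an Ahrefs CSV export.
# Assumes a column literally named "Domain rating" -- adjust to your export.
import csv

SEGMENTS = [("Most Valuable", 70, 100), ("Valuable", 50, 69),
            ("Average", 30, 49), ("Low Value", 0, 29)]

def segment_counts(csv_path):
    counts = {label: 0 for label, _, _ in SEGMENTS}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rating = float(row["Domain rating"])
            for label, low, high in SEGMENTS:
                if low <= rating <= high:
                    counts[label] += 1
                    break
    return counts
```

The resulting counts can then be pasted into the template’s columns as before.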

Make a copy of the Google Sheet template found here. Paste the number of overall Referring Domains for each segment into the template (Columns H onwards). Repeat this for all segments and all competitors until the chart is fully populated. For our science competitors, we see the following.

Link analysis graph from template
This visualisation allows a quick comparison of how many of each site’s referring domains fall under the segments described above. The proportions are also represented as a table in the template.

Link analysis proportions from template

In this case, the table is clearer due to the comparatively high number of referring domains to ZMEScience. We can see the vast majority of the other sites’ referring domains are of the lowest quality. This suggests that by targeting high-quality sites with our content, our new website would have an advantage.

Top Pages

Looking at a site’s most linked-to pages is a good way of understanding what link building work it’s been up to. If you can find out what works for your competitors, you can try something similar yourself.

You can use the Best by Links report from Ahrefs to investigate this. (Enter domain or blog subfolder/subdomain > Pages > Best by links).

When looking at RevisionWorld, we see that one of its top pages is a revision calendar creator. This has 27 referring domains and over 1,700 dofollow backlinks.

Ahrefs best by links report for RevisionWorld

Therefore, our new site could look at creating something similar, but even better. We could then target those sites that link to the now inferior content and ask them to link back to our new piece.

To find these sites to target, simply click on the number of referring domains in the Top Pages report.

Ahrefs referring domains report for revision calendar creator

Link Growth

Finally, you can study competitors’ backlink growth. The rate at which they’re acquiring referring domains gives you a rough target to aim for with your site’s link building efforts.

Ahrefs’ Domain Comparison shows this in a visual way. Enter the URLs of your competitors into the boxes and hit ‘Compare’.

Competitor link growth chart Link growth chart legend

This shows at what rate competitors’ backlink profiles have been growing or declining.

Consistent growth, as seen for ZMEScience, can be natural or the result of long-term link building work.

RevisionWorld has experienced more inconsistent growth over the last five years. (I have removed ZMEScience for clarity – click its name in the legend to achieve this).

Link growth chart excluding ZMEScience Link growth chart legend excluding ZMEScience

From this we can see rapid growth between October 2015 and March 2016.

By looking at Ahrefs’ New Referring Domains report (Enter domain > Referring domains > New), you can work out what might have caused this.

If most of the links point to the same page, then it’s likely down to a piece of content going viral. But if most of the added domains look spammy, it’s more likely to be poor quality link building. It could also be due to low quality syndication sites, which are usually present (to an extent) in most sites’ backlink profiles.
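To pin down when a spike happened, you can bucket a referring-domains export by month rather than eyeballing the chart. This is a hedged sketch: the “First seen” column name and the date format are assumptions about the export, so adapt them to what you actually download.

```python
# Count new referring domains per month from a CSV export, to locate spikes.
# Column name and date format are assumptions -- check your own export.
import csv
from collections import Counter
from datetime import datetime

def domains_per_month(csv_path, date_column="First seen", fmt="%Y-%m-%d"):
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            month = datetime.strptime(row[date_column], fmt).strftime("%Y-%m")
            counts[month] += 1
    return counts
```

Sorting the resulting Counter by month makes periods of rapid growth, like the October 2015 to March 2016 one above, easy to spot.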

In RevisionWorld’s case, a lot of the new links with high Domain Ratings come from .edu and other educational domains. These link back to revision guides on the RevisionWorld site.

This suggests they’ve had success outreaching their revision guides as an educational resource to schools and universities. This could be something our new site could look to replicate.

As seen earlier, this could be one reason why RevisionWorld is currently more visible in SERPs than ZMEScience. This is despite RevisionWorld having only a tenth of the number of referring domains.


The analysis above will help you find your SEO competitors, replicate their successes, and learn from their mistakes.

As your site changes and grows, so will your competitors. You’ll need to keep tabs on who you’re fighting for SERP space with. Hopefully one day you’ll be challenging those ambitious competitors you identified way back at the start.


The brightonSEO Crawling Clinic
Mon, 09 Sep 2019 12:52:00 +0000

The post The brightonSEO Crawling Clinic appeared first on Screaming Frog.

For the first time last year we ran a crawling clinic at the legendary brightonSEO. The team had a lot of fun meeting everyone and chatting about crawling and technical SEO while just a little hungover from the pre-party, so we decided to do it again this year.

The idea of the crawling clinic is that you’re able to meet the Screaming Frog team and chat about any crawling issues you’re experiencing, how best to tackle them, and any feature requests you’d like to see for the software – or just pilfer some swag.

We’re also running our SEO Spider training course at brightonSEO on the Thursday (12th September). This is the same SEO Spider training course that we offer regularly, aimed at those who are familiar with the basic uses of the SEO Spider but want to learn how to make more of the tool.

We’re looking forward to meeting everyone attending the course, and if you’d like to join the workshop there’s still a few places left.

Version 12.0 Sneak Preview

The team will also be running the new beta 12 version of the Screaming Frog SEO Spider at the Crawling Clinic. So if you’d like a sneak peek of some very cool new features that are coming soon before everyone else, then come on over to the clinic. We’ll be on the main exhibition floor (B7) on Friday (13th September) throughout the day.

If you’re attending the conference, then also make sure you pick up the latest edition of the Screaming Frog brightonSEO beer mats!

Come & Chat About Crawling

You don’t need to book anything, you can just come over and chat to us at our crawling clinic stand. We’ll be on hand to help with any crawling issues and will have a few machines to run through anything. So if you’d like to meet our team and chat about crawling, log files, or SEO in general, then please do come over and see us.

Alternatively, you can say hello in the bar at the pre- or after-parties! See you all on Thursday and Friday.


How to Scrape Google Search Features Using XPath
Tue, 03 Sep 2019 08:36:39 +0000

The post How to Scrape Google Search Features Using XPath appeared first on Screaming Frog.

Google’s search engine results pages (SERPs) have changed a great deal over the last 10 years, with more and more data and information being pulled directly into the results pages themselves. Google search features are a regular occurrence on most SERPs nowadays, some of the most common being featured snippets (aka ‘position zero’), knowledge panels and related questions (aka ‘people also ask’). Data suggests that some features such as related questions may feature on nearly 90% of SERPs today – a huge increase over the last few years.

Understanding these features can be powerful for SEO. Reverse engineering why certain features appear for particular query types, and analysing the data or text included in said features, can help inform optimisation decisions. With organic CTR seemingly on the decline, optimising for Google search features is more important than ever, to ensure content is as visible as it possibly can be to search users.

This guide runs through the process of gathering search feature data from the SERPs, to help scale your analysis and optimisation efforts. I’ll demonstrate how to scrape data from the SERPs using the Screaming Frog SEO Spider using XPath, and show just how easy it is to grab a load of relevant and useful data very quickly. This guide focuses on featured snippets and related questions specifically, but the principles remain the same for scraping other features too.


If you’re already an XPath and scraping expert and are just here for the syntax and data types to set up your extraction (perhaps you saw me eloquently explain the process at SEOCamp Paris or Pubcon Las Vegas this year!), here you go (spoiler alert for everyone else!) –

Featured snippet XPath syntax

  • Featured snippet page title (Text) – (//div[@class='ellip'])[1]/text()
  • Featured snippet text paragraph (Text) – (//span[@class="e24Kjd"])[1]
  • Featured snippet bullet point text (Text) – //ul[@class="i8Z77e"]/li
  • Featured snippet numbered list (Text) – //ol[@class="X5LH0c"]/li
  • Featured snippet table (Text) – //table//tr
  • Featured snippet URL (Inner HTML) – (//div[@class="xpdopen"]//a/@href)[2]
  • Featured snippet image source (Text) – //div[@class="rg_ilbg"]
Related questions XPath syntax

  • Related question 1 text (Text) – (//div[1]/g-accordion-expander/div/div)[1]
  • Related question 2 text (Text) – (//div[2]/g-accordion-expander/div/div)[1]
  • Related question 3 text (Text) – (//div[3]/g-accordion-expander/div/div)[1]
  • Related question 4 text (Text) – (//div[4]/g-accordion-expander/div/div)[1]
  • Related question snippet text for all 4 questions (Text) – //g-accordion-expander//span[@class="e24Kjd"]
  • Related question page titles for all 4 questions (Text) – //g-accordion-expander//div[@class="ellip"]
  • Related question page URLs for all 4 questions (Inner HTML) – //div[@class="feCgPc y yf"]//div[@class="rc"]//a/@href
You can also get this list in our accompanying Google doc. Back to our regularly scheduled programming for the rest of you… follow these steps to start scraping featured snippets and related questions!

    1) Preparation

    To get started, you’ll need to download and install the SEO Spider software and have a licence to access the custom extraction feature necessary for scraping. I’d also recommend our web scraping and data extraction guide as a useful bit of light reading, just to cover the basics of what we’re getting up to here.

    2) Gather keyword data

    Next you’ll need to find relevant keywords where featured snippets and / or related questions are showing in the SERPs. Most well-known SEO intelligence tools have functionality to filter keywords you rank for (or want to rank for) and where these features show, or you might have your own rank monitoring systems to help. Failing that, simply run a few searches of important and relevant keywords to look for yourself, or grab query data from Google Search Console. Wherever you get your keyword data from, if you have a lot of data and are looking to prune and prioritise your keywords, I’d advise the following –

  • Prioritise keywords where you have a decent ranking position already. Not only is this relevant to winning a featured snippet (almost all featured snippets are taken from pages ranking organically in the top 10 positions, usually top 5), but more generally if Google thinks your page is already relevant to the query, you’ll have a better chance of targeting all types of search features.
  • Certainly consider search volume (the higher the better, right?), but also try and determine the likelihood of a search feature driving clicks too. As with keyword intent in the main organic results, not all search features will drive a significant amount of additional traffic, even if you achieve ‘position zero’. Try to consider objectively the intent behind a particular query, and prioritise keywords which are more likely to drive additional clicks.
    3) Create a Google search query URL

    We’re going to be crawling Google search query URLs, so we need to feed the SEO Spider a URL to crawl using the keyword data gathered. This can either be done in Excel, using find and replace and the ‘CONCATENATE’ formula to change the list of keywords into a single URL string (replace word spaces with a + symbol, select your Google of choice, then CONCATENATE the cells to create an unbroken string), or you can simply paste your original list of keywords into this handy Google doc with the formula included (please make a copy of the doc first).

    google search query string URL

    At the end of the process you should have a list of Google search query URLs which look something like this – etc.
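The same query strings can also be generated outside Excel; here’s a minimal Python equivalent of the find-and-replace plus CONCATENATE approach. The google.co.uk domain is just an example – use your Google of choice.

```python
# Build Google search query URLs from a keyword list (spaces become "+").
from urllib.parse import quote_plus

def to_query_urls(keywords, domain="www.google.co.uk"):
    return [f"https://{domain}/search?q={quote_plus(kw)}" for kw in keywords]

for url in to_query_urls(["chemistry facts", "learn chemistry"]):
    print(url)
# https://www.google.co.uk/search?q=chemistry+facts
# https://www.google.co.uk/search?q=learn+chemistry
```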

    4) Configure the SEO Spider

    Experienced SEO Spider users will know that our tool has a multitude of configuration options to help you gather the important data you need. Crawling Google search query URLs requires a few configurations to work. Within the menu you need to configure as follows –

  • Configuration > Spider > Rendering > JavaScript
  • Configuration > robots.txt > Settings > Ignore robots.txt
  • Configuration > User-Agent > Present User Agents > Chrome
  • Configuration > Speed > Max Threads = 1 > Max URI/s = 0.5
    These config options ensure that the SEO Spider can access the features without triggering a captcha by crawling too fast. Once you’ve set up this config, I’d recommend saving it as a custom configuration which you can load up again in future.
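The speed setting above throttles crawling to roughly one URL every two seconds. If you ever fetch SERPs with your own scripts instead of the SEO Spider, the same pacing idea can be sketched like this (Google may still challenge automated requests, so treat it as illustrative):

```python
# Yield items no faster than `per_second`, mimicking the Max URI/s = 0.5 setting.
import time

def paced(urls, per_second=0.5):
    interval = 1.0 / per_second
    for url in urls:
        yield url
        time.sleep(interval)  # wait before handing out the next URL

# for url in paced(query_urls):   # query_urls: your list of search URLs
#     html = download(url)        # your own fetch function
```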

    5) Setup your extraction

    Next you need to tell the SEO spider what to extract. For this, go into the ‘Configuration’ menu and select ‘Custom’ and ‘Extraction’ –

    screaming frog seo spider custom extraction

    You should then see a screen like this –

    screaming frog seo spider xpath

    From the ‘Inactive’ drop down menu you need to select ‘XPath’. From the new dropdown which appears on the right hand side, you need to select the type of data you’re looking to extract. This will depend on what data you’re looking to extract from the search results (full list of XPath syntax and data types listed below), so let’s use the example of related questions –

    scraping google related questions

    The above screenshot shows the related questions showing for the search query ‘seo’ in the UK. Let’s say we want to know what related questions are showing for the query, to ensure we have content and a page which targets and answers them. If Google thinks they are relevant to the original query, at the very least we should consider them for analysis and potentially for optimisation. In this example we simply want the text of the questions themselves, to help inform us from a content perspective.

    Typically 4 related questions show for a particular query, and each of these 4 questions has its own XPath syntax –

  • Question 1 – (//div[1]/g-accordion-expander/div/div)[1]
  • Question 2 – (//div[2]/g-accordion-expander/div/div)[1]
  • Question 3 – (//div[3]/g-accordion-expander/div/div)[1]
  • Question 4 – (//div[4]/g-accordion-expander/div/div)[1]
    To find the correct XPath syntax for your desired element, our web scraping guide can help, but we have a full list of the important ones at the end of this article!
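The same selectors can also be applied outside the SEO Spider, for example over saved SERP HTML, using lxml (a third-party package: `pip install lxml`). The HTML below is a heavily simplified mock of Google’s 2019 markup; the real class names change frequently, so treat the selectors as snapshots rather than a stable API.

```python
# Apply a related-questions XPath to (mock) SERP HTML with lxml.
from lxml import html

SERP_HTML = """
<html><body>
  <g-accordion-expander>
    <div><div>What is SEO?</div></div>
    <span class="e24Kjd">Search engine optimisation improves organic rankings.</span>
  </g-accordion-expander>
  <g-accordion-expander>
    <div><div>How do I learn SEO?</div></div>
    <span class="e24Kjd">Start with crawling and indexing basics.</span>
  </g-accordion-expander>
</body></html>
"""

tree = html.fromstring(SERP_HTML)
snippets = tree.xpath('//g-accordion-expander//span[@class="e24Kjd"]/text()')
print(snippets)
```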

    Once you’ve input your syntax, you can also rename the extraction fields to correspond to each extraction (Question 1, Question 2 etc.). For this particular extraction we want the text of the questions themselves, so need to select ‘Extract Text’ in the data type dropdown menu. You should have a screen something like this –

    screaming frog custom extraction

    If you do, you’re almost there!

    6) Crawl in list mode

    For this task you need to use the SEO Spider in List Mode. In the menu go Mode > List. Next, return to your list of created Google search query URL strings and copy all URLs. Return to the SEO Spider, hit the ‘Upload’ button and then ‘Paste’. Your list of search query URLs should appear in the window –

    screaming frog list mode

    Hit ‘OK’ and your crawl will begin.

    7) Analyse your results

    To see your extraction you need to navigate to the ‘Custom’ tab in the SEO Spider, and select the ‘Extraction’ filter. Here you should start to see your extraction rolling in. When complete, you should have a nifty looking screen like this –

    screaming frog seo spider custom extraction

    You can see your search query and the four related questions appearing in the SERPs being pulled in alongside it. When complete you can export the data and match up your keywords to your pages, and start to analyse the data and optimise to target the relevant questions.

    8) Full list of XPath syntax

    As promised, we’ve done a lot of the heavy lifting and have a list of XPath syntax to extract various featured snippet and related question elements from the SERPs –

    Featured snippet XPath syntax

  • Featured snippet page title (Text) – (//div[@class='ellip'])[1]/text()
  • Featured snippet text paragraph (Text) – (//span[@class="e24Kjd"])[1]
  • Featured snippet bullet point text (Text) – //ul[@class="i8Z77e"]/li
  • Featured snippet numbered list (Text) – //ol[@class="X5LH0c"]/li
  • Featured snippet table (Text) – //table//tr
  • Featured snippet URL (Inner HTML) – (//div[@class="xpdopen"]//a/@href)[2]
  • Featured snippet image source (Text) – //div[@class="rg_ilbg"]
    Related questions XPath syntax

  • Related question 1 text (Text) – (//div[1]/g-accordion-expander/div/div)[1]
  • Related question 2 text (Text) – (//div[2]/g-accordion-expander/div/div)[1]
  • Related question 3 text (Text) – (//div[3]/g-accordion-expander/div/div)[1]
  • Related question 4 text (Text) – (//div[4]/g-accordion-expander/div/div)[1]
  • Related question snippet text for all 4 questions (Text) – //g-accordion-expander//span[@class="e24Kjd"]
  • Related question page titles for all 4 questions (Text) – //g-accordion-expander//div[@class="ellip"]
  • Related question page URLs for all 4 questions (Inner HTML) – //div[@class="feCgPc y yf"]//div[@class="rc"]//a/@href
    We’ve also included them in our accompanying Google doc for ease.


    Hopefully our guide has been useful and can set you on your way to extract all sorts of useful and relevant data from the search results. Let me know how you get on, and if you have any other nifty XPath tips and tricks, please comment below!


The Do’s and Don’ts of Chasing for a Link
Thu, 15 Aug 2019 11:30:38 +0000

    The post The Do’s and Don’ts of Chasing for a Link appeared first on Screaming Frog.

    It’s happened to all of us. You bag another piece of coverage for your client’s content piece on a top-tier publication, which you’re ecstatic about. However, after a quick scroll through, your elation is suddenly offset by a small twang of disappointment: there isn’t a link to your client.

    Getting links these days is tough. Some publications have enforced sitewide no-link policies, and journalists can be apprehensive about adding them. I’m not going to delve into the reasons why this is, or why they shouldn’t be so apprehensive or strict with linking, but I will drop in this Tweet from Danny Sullivan, Google’s Public Search Liaison:

    I’m going to talk today about how best to approach journalists who have covered your content or story, but haven’t linked to your client. There are some obvious do’s and don’ts that you’d hope everyone was aware of, but unfortunately Tweets from journalists like this are a regular occurrence:

    Which leads nicely onto the first point.

    How Long After an Article Goes Live Is It Appropriate to Chase Up?

    When it comes to approaching someone to add a link to a piece of coverage, the quicker you do so after publishing, the more likely they are to be receptive to your request. In my opinion, it’s acceptable to approach someone for a link within a week of an article going live, provided the topic is still somewhat relevant.

    The chances of them doing so do start to tail off quite quickly, and once the one-week period has passed it really is best to move on and let that one go. Otherwise you risk damaging your relationship with a journalist and/or making your client look bad.

    I ran a little poll over on Twitter to hear other people’s thoughts, and the majority voted that within a few days is an acceptable time-frame to chase up a link.

    Ensure That a Link Adds Value

    Before chasing someone to add a link, ask yourself if doing so actually adds value to the piece. Is there more data to be found on your client’s site? Is it a nice looking interactive that makes it easier to view, sort and filter data? If the answer to these questions is no, you’re hindering your chances of people naturally linking to your content, and the chances of them adding a link as a result of you approaching them.

    When chasing up, make sure you include your reasons for why they should consider adding a link to the piece, as this is likely to increase your chances.

    This is why it’s super important to make sure the content you create is a linkable asset, where it makes sense to point users and readers to the page (via a link). This needs to be baked into the process during the early stages of ideation.

    To give you an example, we put together this index for a client that ranks the world’s best tourist attractions. It presents the data in a visual way, allowing users to click on each tourist attraction to view a picture of it and its location, and they can also sort the data as they see fit. With these features in mind, people would struggle to think of a reason not to add a link through to the content piece.

    Ensure You’re Emailing the Right Person

    A small point: make sure you are getting in touch with the right person. Most of the time this will be the individual you emailed initially or who authored the article, though occasionally there may not be a name associated with the post. The author of the article also may not always be the one who decides whether a link can be added, as this can sometimes be the responsibility of the digital editor or a similar role.

    Use your best judgement and ensure you’re getting in touch with the right person, to avoid confusion and mild embarrassment.

    Agree Up Front to Add a Link

    If you have the opportunity to speak with a journalist before they cover the piece, for example if they respond positively to your original pitch email and ask for more information, it can sometimes make sense to propose a link at this point.

    Be polite and keep it simple, again highlighting the reasons why a link adds value to their article. To give a quick example:

    “If you do cover the piece, it would be great if you could add a link to it. All the data can be found on the aforementioned page, and users can filter and sort the data as they see fit. The methodology is also explained in-depth, as well as links to all the sources we used.”

    Sites That Don’t Link Out

    There are some sites that never link out, and in this instance it may make sense to save yourself some time and effort by not chasing up for links. You could try your luck, but use your judgement and previous experience here; for example, don’t chase a publication for a link when it’s already told you it doesn’t link out, as you could potentially harm your relationship with the site.

    Where Should the Link Point To?

    Generally speaking, if a journalist has covered your client’s content piece, the best place to link to is the corresponding page on the client’s domain. However, this may not always be the case.

    If your client has provided a comment or is quoted within an article, you may find you have more success asking them to link to a bio page of the spokesperson on your client’s site. If you don’t have a bio page for your client’s spokesperson, it’s definitely worth creating one if they are regularly quoted or contribute to articles within the industry.

    On the subject of asking people to link to a specific page, it’s common for people to link to the client’s homepage instead of the page where the content sits. Proceed with caution if you’re thinking of asking someone to change where a link points to. Generally speaking, it’s best to be happy with another link in the bag; it all helps add to a natural and diverse link profile.

    To Summarise

    To summarise the above, when chasing people to add a link to your content:

    • It should be within a few days of the article going live. If it’s outside that window, it’s best to move on, otherwise you risk damaging your relationship with journalists.
    • You should ensure that including a link actually adds value to the article and its readers (more data, methodology, sources etc.)
    • Ensure you’re emailing the right person!
    • Consider proposing to a journalist that they include a link ahead of the article going live, if the opportunity for discussion arises.
    • If you know a site doesn’t link out, it may make sense to take it on the chin and move on.
    • If your client is regularly quoted or contributes to industry news and articles, consider creating a bio page on their site. We’ve found that people are more receptive to adding links to these.

    Over to You

    I’d love to hear if you have any additional experiences or tips in regards to chasing links, so please do get involved in the comments. 👇

    The post The Do’s and Don’ts of Chasing for a Link appeared first on Screaming Frog.

    Reviving Retired Search Console Reports Mon, 08 Apr 2019 13:18:41 +0000

    The post Reviving Retired Search Console Reports appeared first on Screaming Frog.

    Since I started my journey in the world of SEO, the old Google Search Console (GSC) has been a mainstay of every campaign I’ve worked on. Together, we’ve dealt with some horrific JavaScript issues, tackled woeful hreflang implementation, and watched site performance reach its highest highs and lowest lows.

    Sadly, all good things must come to an end, and in Jan ’19 Google announced most of the old Search Console features would be shut down for good at the end of March.

    But it’s not all doom and gloom. As a successor, we now have an updated Google Search Console v2.0 to guide us into the modern web. This new console has a fresh coat of paint, is packed with new reports, gives us 16 months of data, and provides a live link straight into Google’s index — it’s all rather lovely stuff!

    Despite all this… I still can’t help longing for a few of the old reports that sat neatly tiered on the left-hand side of the browser.

    While we can’t quite turn back time, using the trusty SEO Spider we can replicate a few of these reports to fill the void left by tabs now deleted or yet to be transferred over. Before jumping in, I should note this post mostly covers reports that have been deleted or not fully transferred across. If you can’t find something here, chances are it’s already available in GSC 2.0.

    Structured Data

    The new GSC does indeed have some structured data auditing in the new ‘Enhancements’ tab. However, it only monitors a few select forms of structured data (such as Products and Events markup). While I’m sure Google intends to expand this to cover all supported features, it doesn’t quite match the comprehensiveness of the old report.

    Well, hot on the heels of the v11.0 release for the SEO Spider, we now have bulk structured data auditing and validation built in. To activate, just head over to Configuration > Spider > Advanced > Enable the various structured data settings shown here:

    Once your crawl is complete, there are two areas to view structured data. The first is the main Structured Data tab and its various sub-filters, here:

    Or, if you just want to examine one lone URL, click on it and open the Structured Data Details tab at the bottom of the tool:

    There are also two exportable reports found in the main ‘Reports’ menu: the ‘Validation Errors & Warnings Summary’ and the ‘Validation Errors & Warnings’.

    For the full details, have a look at:

    HTML Improvements

    HTML Improvements was a neat little tab Google used to show off errors with page titles, meta descriptions, and non-indexable content. Mainly it highlighted when they were missing, duplicated, too short, too long, or non-informative.

    Unlike many other reports, rather than transferring over to the new GSC it’s been completely removed. Despite this, what it covered is still an incredibly important aspect of on-page optimisation, and in Google’s own words: “there are some really good tools that help you to crawl your website to extract titles & descriptions too.” Well — taking their hint, we can use the Spider and its various tabs and filters for exactly that.

    Want page title improvements? Look no further than the filters on the Page Title tab:

    Or if you’re curious about your Meta Descriptions:

    Want to see if any pages reference non-indexable content? Just sort by the Indexability column on any tab/filter combo:

    International Targeting

    Ahh, hreflang… the stuff of nightmares for even the most skilled of SEO veterans. Despite this, correctly configuring a multi-region/language domain is crucial. It not only ensures each user is served the relevant version, but also helps avoid any larger site or content issues. Thankfully, we’ve had this handy Search Console tab to help report any issues or errors with implementation:

    Google hasn’t announced the removal of this report, and no doubt it will soon be viewable within the new GSC. However, if for any reason they don’t include it, or if it takes a while longer to migrate across, then look no further than the hreflang tab of the SEO Spider (once enabled in Configuration > Spider > hreflang).

    With detailed filters to explore every nook and cranny of hreflang implementation, no matter what issues your site faces, you’ll be able to make actionable recommendations to bridge the language gap.
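    One of the most common hreflang issues is a missing return (reciprocal) link: page A references page B as an alternate, but B doesn’t reference A back. As a small illustration of that check, here’s a Python sketch; the URL-to-hreflang mapping is made-up example data standing in for crawled annotations, not the Spider’s own output format.

    ```python
    # Check that hreflang annotations are reciprocal: every alternate URL a
    # page references should reference that page back. The mapping below is
    # made-up example data (page URL -> {language code: alternate URL}).
    hreflang = {
        "https://example.com/en/": {
            "en": "https://example.com/en/",
            "de": "https://example.com/de/",
        },
        "https://example.com/de/": {
            "de": "https://example.com/de/",
        },
    }

    for url, alternates in hreflang.items():
        for lang, target in alternates.items():
            if url not in hreflang.get(target, {}).values():
                print(f"Missing return link: {target} does not reference {url}")
    ```

    Here the German page never points back at the English one, so the sketch flags it — exactly the kind of issue the hreflang tab filters surface at scale.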

    There’s also a handful of exportable hreflang reports from the top ‘Reports’ dropdown. While I won’t go through each tab here, I’d recommend you check out the following link which explains everything involving hreflang and the spider in much more detail:

    Blocked Resources

    Another report that’s been axed. It was introduced as a way to keep track of any CSS or JavaScript files blocked from search bots, helping flag anything that might break rendering, make the domain uncrawlable, or just straight up slow it down.

    While these issues have drastically decreased over the years, they’re still important to keep track of. Fortunately, after running a crawl as Googlebot (Configuration > User-Agent > Googlebot) we can find all blocked resources within the Response Codes tab of the Spider — or if you’re just looking for issues relating to rendering, have a look at the bottom Rendered Page details tab:

    Fetch as Google

    “But wait — you can just use the new URL inspect tool…”. Well, yes — you can indeed use the new URL inspect to get a live render straight from Googlebot. But I still have a few quibbles with this.

    For a start, you can only view your render from Googlebot mobile, while poor desktop is completely neglected. Secondly, the render is just a static above-the-fold screenshot, rather than the full-page scrollable view we used to get in Fetch As.

    While it’s not quite the same as a direct request from Google, we can still emulate this within the Spider’s JavaScript rendering feature. To enable JavaScript rendering head over to Configuration > Spider > Rendering and switch the drop down to JavaScript.

    Once your crawl is complete, highlight a URL and head over to the Rendered Page tab towards the bottom. Here you can view (or export) a screenshot of your rendered page, alongside a list showing all the resources needed:

    If you want to mimic Google as much as possible, try switching the User-Agent to Googlebot or Googlebot mobile (Configuration > User-Agent). This will make the Spider spoof a request as if it were Google making it.

    It’s also worth mentioning that Googlebot renders JavaScript based on v41 of Chrome, whereas the Spider uses the updated v64 of Chromium. While there aren’t many massive differences between the two, there may be some discrepancies.

    As a bonus, if you still want a desktop render direct from Google (or don’t have access to Search Console for a domain), the PageSpeed Insights tool still produces a static desktop image as a representation of how Googlebot is rendering a page. It’s not the most high-res or detailed image, but it will get the job done!

    Robots.txt tester

    Another tab I’m hopeful Google will eventually migrate over. Testing your robots.txt before submitting it is crucial to avoid disallowing or blocking half your site from search engines.

    If for any reason they don’t happen to transfer this across to the new GSC, you can easily test any robots.txt configuration directly within the SEO Spider (Configuration > Robots.txt > Custom).

    This window will allow you to either import a live robots.txt file or make your own custom one. You can test if an individual URL is blocked by entering it into the search at the bottom. Alternatively, run a crawl of your site and the spider will obey the custom crawl behaviour.
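    The same kind of offline check can be scripted with Python’s standard library. This is a hedged sketch, not the Spider’s own logic; the rules and domain below are placeholders. Note that Python’s parser applies the first matching rule, unlike Google’s longest-match behaviour, so ordering matters in this example.

    ```python
    # Test whether individual URL paths are blocked by a robots.txt,
    # entirely offline, using only the standard library.
    from urllib.robotparser import RobotFileParser

    rules = """
    User-agent: *
    Allow: /private/public-page.html
    Disallow: /private/
    """

    parser = RobotFileParser()
    parser.parse(rules.splitlines())

    for path in ["/", "/private/report.pdf", "/private/public-page.html"]:
        allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
        print(path, "->", "allowed" if allowed else "blocked")
    ```

    Swapping in a live robots.txt (via `RobotFileParser.set_url` and `read`) lets you sanity-check a proposed file against a list of key URLs before it goes anywhere near production.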

    For a much more in-depth guide on all the robots.txt capabilities of the SEO Spider, look here:

    URL Parameters

    An extremely useful tab — the URL Parameters report highlights all of the various parameter queries Google found on its journey through your site. This is particularly useful when examining crawl efficiency or dealing with faceted navigation.

    Currently, there’s no way of replicating this report within the Spider, but we are able to get a similar sample from a crawl and some Excel tinkering.

    Just follow these steps or download the macro (linked below) –

    1. Run a crawl of the domain, export the internal HTML tab
    2. Cut & Paste the URL list into Column A of a fresh Excel sheet
    3. Highlight Column A > Data > Text-to-Columns > Delimited > Other: ? > Finish
    4. Highlight Column B > Data > Text-to-Columns > Delimited > Other: & > Finish
    5. Highlight Column A > Right-click > Delete
    6. Home > Editing > Find & Select > Go to Special > Blanks > OK
    7. With these highlighted > Home > Cells > Delete
    8. CTRL+A to highlight everything > Find & Replace > Replace: =* with nothing
    9. Stack all columns into one & add a heading of ‘Parameter’
    10. Highlight this master column > Insert > Pivot Table > Recommended > Count of Parameter
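    If you’d rather skip the spreadsheet work entirely, the same split-and-count logic can be sketched in a few lines of Python; the URL list below is a made-up stand-in for the Spider’s internal HTML export.

    ```python
    # Count how often each query parameter appears across a list of crawled
    # URLs -- a rough script equivalent of the Excel steps above.
    from collections import Counter
    from urllib.parse import urlsplit, parse_qsl

    urls = [
        "https://example.com/shoes?colour=red&size=9",
        "https://example.com/shoes?colour=blue",
        "https://example.com/search?q=trainers&page=2",
        "https://example.com/about",
    ]

    counts = Counter(
        key
        for url in urls
        for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)
    )

    for parameter, total in counts.most_common():
        print(f"{parameter}: {total}")
    ```

    This prints each parameter with its frequency (here, `colour` appears twice and the rest once), mirroring the pivot table count the macro produces.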

    To save some time, I’ve made an Excel macro to do this all for you, which you can download here. Just download the spreadsheet > click Enable Content & Enable Editing then follow the instructions.

    If everything’s done correctly, you should end up with a new table similar to this:

    It’s worth noting there will be some discrepancies between this and Google’s own URL Parameters report. This boils down to the fundamental differences between the Spider and Googlebot, most of which are explained in much greater detail here:

    The King Is Dead, Long Live the King!

    Well, that’s all for now — hopefully you find some of these reports useful. If you want a full list of our other how-to guides, take a look through our user guide & FAQ pages. Alternatively, if you have any other suggestions and alternatives to the retired Google system, I’d love to hear about them in the comments below.

    As a side note: for many of these reports, you can also combine them with the Scheduling feature to keep them running on a regular basis. Or, if you’d like some automatic reporting, take a quick look at setting this up in the Crawl Reporting in Google Data Studio section of my previous post.


    Screaming Frog SEO Spider Update – Version 11.0 Tue, 05 Mar 2019 09:52:50 +0000

    The post Screaming Frog SEO Spider Update – Version 11.0 appeared first on Screaming Frog.

    We are delighted to announce the release of Screaming Frog SEO Spider version 11.0, codenamed internally as ‘triples’, which is a big hint for those in the know.

    In version 10 we introduced many new features all at once, so we wanted to make this update smaller, which also means we can release it quicker. This version includes one significant exciting new feature and a number of smaller updates and improvements. Let’s get to them.

    1) Structured Data & Validation

    Structured data is becoming increasingly important to provide search engines with explicit clues about the meaning of pages, and enabling special search result features and enhancements in Google.

    The SEO Spider now allows you to crawl and extract structured data from the three supported formats (JSON-LD, Microdata and RDFa) and validate it against specifications and Google’s 25+ search features at scale.

    Structured Data

    To extract and validate structured data you just need to select the options under ‘Config > Spider > Advanced’.

    Structured Data Advanced Configuration

    Structured data itemtypes will then be pulled into the ‘Structured Data’ tab with columns for totals, errors and warnings discovered. You can filter URLs to those containing structured data, missing structured data, the specific format, and by validation errors or warnings.

    Structured Data tab

    The structured data details lower window pane provides specifics on the items encountered. The left-hand side of the lower window pane shows property values and icons against them when there are errors or warnings, and the right-hand window provides information on the specific issues discovered.

    The right-hand side of the lower window pane will detail the validation type (a specification or a Google Feature), the severity (an error, a warning or just info) and a message for the specific issue to fix. It will also provide a link to the specific property.

    In the random example below, from a quick analysis of the ‘car insurance’ SERPs, we can see a page with Google Product feature validation errors and warnings. The right-hand window pane lists the properties that are required (with an error) and recommended (with a warning).

    Structured Data Details tab

    As ‘product’ is used on these pages, it will be validated against Google product feature guidelines, where an image is required, and there are half a dozen other recommended properties that are missing.

    Another example from the same SERP, is Hastings Direct who have a Google Local Business feature validation error against the use of ‘UK’ in the ‘addressCountry‘ schema property.

    Structured Data Details Tab Error!

    The right-hand window pane explains that this is because the format needs to be two-letter ISO 3166-1 alpha-2 country codes (and the United Kingdom is ‘GB’). If you check the page in Google’s structured data testing tool, this error isn’t picked up. Screaming Frog FTW.
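    To make that specific check concrete, here’s a minimal Python sketch that flags an `addressCountry` value which isn’t a recognised two-letter ISO 3166-1 alpha-2 code. The JSON-LD snippet and the tiny code list are illustrative assumptions, not the Spider’s internal validator.

    ```python
    import json

    # Illustrative JSON-LD with the same mistake described above:
    # 'UK' rather than the ISO 3166-1 alpha-2 code 'GB'.
    jsonld = json.loads("""
    {
      "@context": "https://schema.org",
      "@type": "LocalBusiness",
      "address": {
        "@type": "PostalAddress",
        "addressCountry": "UK"
      }
    }
    """)

    # Tiny illustrative subset -- a real validator would use the full ISO list.
    valid_codes = {"GB", "US", "DE", "FR", "IE"}

    country = jsonld["address"]["addressCountry"]
    if country not in valid_codes:
        print(f"Error: addressCountry '{country}' is not a valid "
              "ISO 3166-1 alpha-2 code (the United Kingdom is 'GB').")
    ```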

    The SEO Spider will validate against 26 of Google’s 28 search features currently and you can see the full list in our structured data section of the user guide.

    As many of you will be aware, frustratingly Google don’t currently provide an API for their own Structured Data Testing Tool (at least a public one we can legitimately use) and they are slowly rolling out new structured data reporting in Search Console. As useful as the existing SDTT is, our testing found inconsistency in what it validates, and the results sometimes just don’t match Google’s own documented guidelines for search features (it often mixes up required or recommended properties for example).

    We researched alternatives, like using the Yandex structured data validator (which does have an API), but again, found plenty of inconsistencies and fundamental differences to Google’s feature requirements – which we wanted to focus upon, due to our core user base.

    Hence, we went ahead and built our own structured data validator, which considers both specifications and Google feature requirements. This is another first to be seen in the SEO Spider, after previously introducing innovative new features such as JavaScript Rendering to the market.

    There are plenty of nuances in structured data and this feature will not be perfect initially, so please do let us know if you spot any issues and we’ll fix them up quickly. We obviously recommend using this new feature in combination with Google’s Structured Data Testing Tool as well.

    2) Structured Data Bulk Exporting

    As you would expect, you can bulk export all errors and warnings via the ‘reports’ top-level menu.

    Structured Data Validation Error & Warning Reports

    The ‘Validation Errors & Warnings Summary’ report is a particular favourite, as it aggregates the data into unique issues discovered (rather than reporting every instance) and shows the number of URLs affected by each issue, alongside a sample URL exhibiting it. An example report can be seen below.

    Structured Data Validation Summary Report

    This means the report is highly condensed and ideal for a developer who wants to know the unique validation issues that need to be fixed across the site.

    3) Multi-Select Details & Bulk Exporting

    You can now select multiple URLs in the top window pane, view specific lower window details for all the selected URLs together, and export them. For example, if you click on three URLs in the top window, then click on the lower window ‘inlinks’ tab, it will display the ‘inlinks’ for those three URLs.

    You can also export them via the right click or the new export button available for the lower window pane.

    Multi-Select Bulk Exporting

    Obviously this scales, so you can do it for thousands, too.

    This should provide a nice balance between exporting everything in bulk via the ‘Bulk Export’ menu and then filtering in spreadsheets, and the previous singular option via the right click.

    4) Tree-View Export

    If you didn’t already know, you can switch from the usual ‘list view’ of a crawl to a more traditional directory ‘tree view’ format by clicking the tree icon on the UI.

    directory tree view

    However, while you were able to view this format within the tool, it hasn’t been possible to export it into a spreadsheet. So, we went back to the drawing board and worked on an export that makes sense in a spreadsheet.

    When you export from tree view, you’ll now see the results in tree view form, with columns split by path, but all URL level data still available. Screenshots of spreadsheets generally look terrible, but here’s an export of our own website for example.

    tree-view export spread sheet

    This allows you to quickly see the breakdown of a website’s structure.

    5) Visualisations Improvements

    We have introduced a number of small improvements to our visualisations. First of all, you can now search for URLs, to find specific nodes within the visualisations.

    Search visualisations

    By default, the visualisations have used the last URL component for naming of nodes, which can be unhelpful if this isn’t descriptive. Therefore, you’re now able to adjust this to page title, h1 or h2.

    Node Labelling In Visualisations

    Finally, you can now also save visualisations as HTML, as well as SVGs.

    6) Smart Drag & Drop

    You can drag and drop any file types supported by the SEO Spider directly into the GUI, and it will intelligently work out what to do. For example, you can drag and drop a saved crawl and it will open it.

    You can drag and drop a .txt file with URLs, and it will auto switch to list mode and crawl them.

    Smart Drag & Drop

    You can even drop in an XML Sitemap and it will switch to list mode, upload the file and crawl that for you as well.

    Nice little time savers for hardcore users.

    7) Queued URLs Export

    You’re now able to view URLs remaining to be crawled via the ‘Queued URLs’ export available under ‘Bulk Export’ in the top level menu.

    queued URLs export

    This provides an export of URLs discovered and queued to be crawled (in the order they would be crawled, based upon a breadth-first crawl).

    8) Configure Internal CDNs

    You can now supply a list of CDNs to be treated as ‘Internal’ URLs by the SEO Spider.

    CDN Configuration

    This feature is available under ‘Configuration > CDNs’ and both domains and subfolder combinations can be supplied. URLs will then be treated as internal, meaning they appear under the ‘Internal’ tab, will be used for discovery of new URLs, and will have data extracted like other internal URLs.

    9) GA Extended URL Matching

    Finally, if you have accounts that use extended URL rewrite filters in Google Analytics to view the full page URL in the interface (converting a path such as /example/ into the full URL), these filters break what is returned from the API, which returns the rewritten URL rather than the path.

    This means URLs won’t match when you perform a crawl obviously. We’ve now introduced an algorithm which will take this into account automatically and match the data for you, as it was really quite annoying.

    Other Updates

    Version 11.0 also includes a number of smaller updates and bug fixes, outlined below.

    • The ‘URL Info’ and ‘Image Info’ lower window tabs have been renamed from ‘Info’ to ‘Details’ respectively.
    • ‘Auto Discover XML Sitemaps via robots.txt’ has been unticked by default for list mode (it was annoyingly ticked by default in version 10.4!).
    • There’s now a ‘Max Links per URL to Crawl’ configurable limit under ‘Config > Spider > Limits’ set at 10k max.
    • There’s now a ‘Max Page Size (KB) to Crawl’ configurable limit under ‘Config > Spider > Limits’ set at 50k.
    • There are new tool tips across the GUI to provide more helpful information on configuration options.
    • The HTML parser has been updated to fix an error with unquoted canonical URLs.
    • A bug has been fixed where GA Goal Completions were not showing.

    That’s everything. If you experience any problems with the new version, then please do just let us know via support and we can help. Thank you to everyone for all their feature requests, bug reports and general support, Screaming Frog would not be what it is, without you all.

    Now, go and download version 11.0 of the Screaming Frog SEO Spider.

    Small Update – Version 11.1 Released 13th March 2019

    We have just released a small update to version 11.1 of the SEO Spider. This release is mainly bug fixes and small improvements –

    • Add 1:1 hreflang URL report, available under ‘Reports > Hreflang > All hreflang URLs’.
    • Cleaned up the preset user-agent list.
    • Fix issue reading XML sitemaps with leading blank lines.
    • Fix issue with parsing and validating structured data.
    • Fix issue with list mode crawling more than the list.
    • Fix issue with list mode crawling of XML sitemaps.
    • Fix issue with scheduling UI unable to delete/edit tasks created by 10.x.
    • Fix issue with visualisations, where the directory tree diagrams were showing the incorrect URL on hover.
    • Fix issue with GA/GSC case insensitivity and trailing slash options.
    • Fix crash when JavaScript crawling with cookies enabled.

    Small Update – Version 11.2 Released 9th April 2019

    We have just released a small update to version 11.2 of the SEO Spider. This release is mainly bug fixes and small improvements –

    • Update to 3.5 which was released on the 1st of April.
    • Update splash screen, so it’s not always on top and can be dragged.
    • Ignore HTML inside amp-list tags.
    • Fix crash in visualisations when focusing on a node and using search.
    • Fix issue with ‘Bulk Export > Queued URLs’ failing for crawls loaded from disk.
    • Fix issue loading scheduling UI with task scheduled by version 10.x.
    • Fix discrepancy between master and detail view Structured Data warnings when loading in a saved crawl.
    • Fix crash parsing RDF.
    • Fix ID stripping issue with Microdata parsing.
    • Fix crashing in Google Structured Data validation.
    • Fix issue with JSON-LD parse errors not being shown for pages with multiple JSON-LD sections.
    • Fix displaying of Structured Data values to not include escape characters.
    • Fix issue with not being able to read Sitemaps containing a BOM (Byte Order Mark).
    • Fix Forms based Authentication so forms can be submitted by pressing enter.
    • Fix issue with URLs ending ?foo.xml throwing off list mode.
    • Fix GA to use URL with highest number of sessions when configuration options lead to multiple GA URLs matching.
    • Fix issue opening crawls via .seospider files with ++ in their file name.

    Small Update – Version 11.3 Released 30th May 2019

    We have just released a small update to version 11.3 of the SEO Spider. This release is mainly bug fixes and small improvements –

    • Added relative URL support for robots.txt redirects.
    • Fix crash importing crawl file as a configuration file.
    • Fix crash when clearing config in SERP mode.
    • Fix crash loading in configuration to perform JavaScript crawling on a platform that doesn’t support it.
    • Fix crash creating images sitemap.
    • Fix crash in right click remove in database mode.
    • Fix crash in scheduling when editing tasks on Windows.
    • Fix issue with Sitemap Hreflang data not being attached when uploading a sitemap in List mode.
    • Fix configuration window too tall for small screens.
    • Fix broken FDD HTML export.
    • Fix unable to read sitemap with BOM when in Spider mode.


    Learn To Crawl: SEO Spider Training Days Tue, 12 Feb 2019 14:59:44 +0000

    The post Learn To Crawl: SEO Spider Training Days appeared first on Screaming Frog.

    On the 24th of January SEOs gathered in London for Screaming Frog’s inaugural SEO Spider Training Event. Attendees flew in from far-flung places such as France, Germany, and even… Cornwall. (If you’re British you’ll appreciate just how far away that is!)

    Their destination was Marble Arch, London. More specifically, room ‘Adjust’ within the exquisite function centre we’d hired for the event. Other rooms on the same floor were named ‘Accept’, ‘Action’, ‘Affirm’, ‘Assume’, and ‘Agree’, so positive vibes (and jokes about adjusting crawl speed) were felt throughout the day.

    Veteran SEO Frog and all-round nice guy Charlie Williams was our expert for the day. Charlie’s day was targeted towards intermediate users who knew how to crawl sites, but wanted to get the most out of the plethora of extra features the SEO Spider ships with after nine years of continuous development and improvement.

    His excellent sermon, which was frequently expanded on through enticing audience questions, was divided into the following topics:

    • Setup, configuration, & crawling
    • Advanced crawling scenarios
    • External data & API integration
    • Analysis & reporting
    • Debugging & diagnosis

    (Spoiler Alert!) For more specific details on what was covered, Ian from Venture Stream, who attended, has put together this great roundup.

    We also had an exclusive live link (Twitter DMs) back to Frog HQ, so we could pass any questions straight back to the development team in real time. While they were super helpful on numerous queries, they remained tight-lipped when pressed on what might be included in the upcoming SF Version 11…

    Hot actionable advice wasn’t the only thing on the menu, though. There was food on the menu too! Frequent coffee breaks gave everyone time to refresh and network, and our venue provided a premium cooked lunch which was delicious, and crucially came with ample pudding. (No, frogs’ legs were not an option.)

    Another added bonus was a helping of branded swag to take home: bottles, pens, notebooks, and those elusive SF stickers that everyone wants for their laptops.

    We had some great feedback from attendees:

    • Of those surveyed, 88% rated the day as either ‘very good’ or ‘excellent’.
    • 88% of those surveyed felt that the event was at the right skill level for the audience – something we were very keen to get right!
    • 100% said they would recommend the training to a friend

    Some other feedback included:

    “Charlie was a top bloke and it was a great place to learn more about the features of Screaming Frog I seldom use. It was a great place to voice very technical queries and issues. The live link to HQ via the SF helpers was also a great addition.”

    “The event was very well structured and thought out. Individual sessions were just long enough to keep the audience’s attention without splitting the day into too many separate parts. I especially liked our host, as he was able to explain complex subjects in a very easily understandable way.”

    “I learned things that I didn’t know existed – for example what you can do with APIs from GA/SC… I also liked being surrounded by high level SEO people – it’s not often that you get to meet such experts.”

    If you consider yourself a budding technical SEO, and you want to gain total confidence using Screaming Frog’s SEO spider then you’re in luck. Our next training event will be on the 18th of March in London. You can get an early bird ticket here, though act fast, as our first event sold out quickly and we have very limited spaces!

    We’re also open to running more bespoke in-house training events; if you have an internal SEO team, or you’re an agency, you’re welcome to pop us an email via support.

    SEO Spider Companion Tools, Aka ‘The Magnificent Seven’ Fri, 08 Feb 2019 12:43:02 +0000 From crawl completion notifications to automated reporting: this post may not have Billy the Kid or Butch Cassidy, instead, here are a few of my most useful tools to combine with the SEO Spider, (just as exciting). We SEOs are extremely lucky—not just because we’re working in such an engaging...

    The post SEO Spider Companion Tools, Aka ‘The Magnificent Seven’ appeared first on Screaming Frog.

    From crawl completion notifications to automated reporting: this post may not have Billy the Kid or Butch Cassidy, but instead here are a few of my most useful tools to combine with the SEO Spider (just as exciting).

    We SEOs are extremely lucky—not just because we’re working in such an engaging and collaborative industry, but because we have access to a plethora of online resources, conferences and SEO-based tools to lend a hand with almost any task you could think up.

    My favourite of which is, of course, the SEO Spider—after all, following Minesweeper (sorry, Outlook), it’s likely the most used program on my work PC. However, a great program can only be made more useful when combined with a gang of other fantastic tools to enhance, complement or adapt its already vast and growing feature set.

    While it isn’t quite the ragtag group from John Sturges’ 1960 cult classic, I’ve compiled the Magnificent Seven(ish) SEO tools I find useful to use in conjunction with the SEO Spider:

    Debugging in Chrome Developer Tools

    Chrome is the definitive king of browsers, and arguably one of the most installed programs on the planet. What’s more, it’s got a full suite of free developer tools built straight in—to load it up, just right-click on any page and hit Inspect. Among many other uses, this is particularly handy for confirming or debunking what might be happening in your crawl versus what you see in a browser.

    For instance, while the Spider does check response headers during a crawl, maybe you just want to dig a bit deeper and view it as a whole? Well, just go to the Network tab, select a request and open the Headers sub-tab for all the juicy details:

    Perhaps you’ve loaded a crawl that’s only returning one or two results and you think JavaScript might be the issue? Well, just hit the three dots (highlighted above) in the top right corner, then click Settings > Debugger > Disable JavaScript and refresh your page to see how it looks:

    Or maybe you just want to compare your nice browser-rendered HTML to that served back to the Spider? Just open the Spider and enable ‘JavaScript Rendering’ & ‘Store Rendered HTML’ in the configuration options (Configuration > Spider > Rendering/Advanced), then run your crawl. Once complete, you can view the rendered HTML in the bottom ‘View Source’ tab and compare it with the ‘Elements’ tab of Chrome.

    There are honestly far too many options in the Chrome developer toolset to list here, but it’s certainly worth getting your head around.

    Page Validation with a Right-Click

    Okay, I’m cheating a bit here as this isn’t one tool, rather a collection of several, but have you ever tried right-clicking a URL within the Spider? Well, if not, I’d recommend giving it a go—on top of some handy exports like the crawl path report and visualisations, there’s a ton of options to open that URL into several individual analysis & validation apps:

    • Google Cache – See how Google is caching and storing your pages’ HTML.
    • Wayback Machine – Compare URL changes over time.
    • Other Domains on IP – See all domains registered to that IP Address.
    • Open Robots.txt – Look at a site’s robots.txt file.
    • HTML Validation with W3C – Double-check all HTML is valid.
    • PageSpeed Insights – Any areas to improve site speed?
    • Structured Data Tester – Check all on-page structured data.
    • Mobile-Friendly Tester – Are your pages mobile-friendly?
    • Rich Results Tester – Is the page eligible for rich results?
    • AMP Validator – Official AMP project validation test.

    User Data and Link Metrics via API Access

    We SEOs can’t get enough data, it’s genuinely all we crave – whether that’s from user testing, keyword tracking or session information, we want it all and we want it now! After all, creating the perfect website for bots is one thing, but ultimately the aim of almost every site is to get more users to view and convert on the domain, so we need to view it from as many angles as possible.

    Starting with users, there’s practically no better insight into user behaviour than the raw data provided by both Google Search Console (GSC) and Google Analytics (GA), both of which help us make informed, data-driven decisions and recommendations.

    What’s great about this is you can easily integrate any GA or GSC data straight into your crawl via the API Access menu so it’s front and centre when reviewing any changes to your pages. Just head on over to Configuration > API Access > [your service of choice], connect to your account, configure your settings and you’re good to go.

    Another crucial area in SERP rankings is the perceived authority of each page in the eyes of search engines – a major aspect of which is (of course) links, links and more links. Any SEO will know you can’t spend more than 5 minutes at BrightonSEO before someone brings up the subject of links; it’s like the lifeblood of our industry. Whether their importance is dying out or not, there’s no denying that they currently still hold much value within our perceptions of Google’s algorithm.

    Well, alongside the previous user data you can also use the API Access menu to connect with some of the biggest tools in the industry such as Moz, Ahrefs or Majestic, to analyse your backlink profile for every URL pulled in a crawl.

    For all the gory details on API Access check out the following page (scroll down for other connections):

    Understanding Bot Behaviour with the Log File Analyzer

    An often-overlooked exercise, yet nothing gives quite the insight into how bots are interacting with a site as the server logs themselves. The trouble is, these files can be messy and hard to analyse on their own, which is where our very own Log File Analyzer (LFA) comes into play (they didn’t force me to add this one in, promise!).

    I’ll leave @ScreamingFrog to go into all the gritty details on why this tool is so useful, but my personal favourite aspect is the ‘Import URL data’ tab on the far right. This little gem will effectively match any spreadsheet containing URL information with the bot data on those URLs.

    So, you can run a crawl in the Spider while connected to GA, GSC and a backlink app of your choice, pulling the respective data from each URL alongside the original crawl information. Then, export this into a spreadsheet before importing into the LFA to get a report combining metadata, session data, backlink data and bot data all in one comprehensive summary, aka the holy quadrilogy of technical SEO statistics.
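    The “holy quadrilogy” join above is essentially a URL-keyed merge. Here’s a minimal Python sketch of that matching logic; the column names and rows are hypothetical examples, not the LFA’s actual import format.

```python
# Sketch of the URL-keyed join the 'Import URL data' tab performs.
# Column names and rows here are hypothetical examples.
def merge_on_url(crawl_rows, log_rows):
    """Combine crawl-export rows with bot-hit rows, matching on URL."""
    hits_by_url = {row["url"]: row for row in log_rows}
    merged = []
    for row in crawl_rows:
        combined = dict(row)  # metadata, session & backlink columns...
        combined.update(hits_by_url.get(row["url"], {"bot_hits": 0}))
        merged.append(combined)
    return merged

crawl = [{"url": "/blog/", "title": "Blog", "sessions": 120}]
logs = [{"url": "/blog/", "bot_hits": 45}]
print(merge_on_url(crawl, logs))
# → [{'url': '/blog/', 'title': 'Blog', 'sessions': 120, 'bot_hits': 45}]
```

    Any URL with no matching log rows simply gets zero bot hits, mirroring an orphan-style left join.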

    While the LFA is a paid tool, there’s a free version if you want to give it a go.

    Crawl Reporting in Google Data Studio

    One of my favourite reports from the Spider is the simple but useful ‘Crawl Overview’ export (Reports > Crawl Overview), and if you mix this with the scheduling feature, you’re able to create a simple crawl report every day, week, month or year. This allows you to monitor the domain for any drastic changes, alerting you to anything which might be cause for concern between crawls.

    However, in its native form it’s not the easiest to compare between dates, which is where Google Sheets & Data Studio can come in to lend a hand. After a bit of setup, you can easily copy over the crawl overview into your master G-Sheet each time your scheduled crawl completes, then Data Studio will automatically update, letting you spend more time analysing changes and less time searching for them.

    This will require some fiddling to set up; however, at the end of this section I’ve included links to an example G-Sheet and Data Studio report that you’re welcome to copy. Essentially, you need a G-Sheet with date entries in one column and unique headings from the crawl overview report (or another) in the remaining columns:

    Once that’s sorted, take your crawl overview report and copy out all the data in the ‘Number of URI’ column (column B), being sure to copy from the ‘Total URI Encountered’ until the end of the column.

    Open your master G-Sheet and create a new date entry in column A (add this in a format of YYYYMMDD). Then in the adjacent cell, Right-click > ‘Paste special’ > ‘Paste transposed’ (Data Studio prefers this to long-form data):

    If done correctly with several entries of data, you should have something like this:
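    If you’d rather script the transpose step than do it by hand, here’s a quick sketch; it assumes the ‘Number of URI’ column has already been read into a list of name/value pairs, and the metric names and counts are hypothetical examples.

```python
import datetime

def transposed_row(overview_pairs, date=None):
    """Turn the 'Number of URI' column into one dated row (YYYYMMDD first)."""
    stamp = (date or datetime.date.today()).strftime("%Y%m%d")
    return [stamp] + [value for _name, value in overview_pairs]

# Hypothetical excerpt of a Crawl Overview export:
overview = [("Total URI Encountered", 1500), ("Total URI Crawled", 1400)]
print(transposed_row(overview, datetime.date(2019, 2, 8)))
# → ['20190208', 1500, 1400]
```

    Each scheduled crawl then appends one such row to the master sheet, which is exactly the long-by-date shape Data Studio wants.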

    Once the data is in a G-Sheet, uploading this to Data Studio is simple, just create a new report > add data source > connect to G-Sheets > [your master sheet] > [sheet page] and make sure all the heading entries are set as a metric (blue) while the date is set as a dimension (green), like this:

    You can then build out a report to display your crawl data in whatever format you like. This can include scorecards and tables for individual time periods, or trend graphs to compare crawl stats over the date range provided (your very own Search Console Coverage report).

    Here’s an overview report I quickly put together as an example. You can obviously do something much more comprehensive than this should you wish, or perhaps take this concept and combine it with even more reports and exports from the Spider.

    If you’d like a copy of both my G-Sheet and Data Studio report, feel free to take them from here:
    Master Crawl Overview G-Sheet:
    Crawl Overview Data Studio Report:

    Note: if you take a copy, some of the dimension formats may change within Data Studio (breaking the graphs), so it’s worth checking the date dimension is still set to ‘Date (YYYYMMDD)’.

    Building Functions & Strings with XPath Helper & Regex Search

    The Spider is capable of doing some very cool stuff with the extraction feature, a lot of which is listed in our guide to web scraping and extraction. The trouble with much of this is it will require you to build your own XPath or regex string to lift your intended information.

    While simply right-clicking > Copy XPath within the inspect window will usually do enough for scraping, it’s not always going to cut it for some types of data. This is where two Chrome extensions, XPath Helper & Regex Search, come in useful.

    Unfortunately, these won’t automatically build any strings or functions for you, but if you combine them with a cheat sheet and some trial and error, you can easily build one out in Chrome before copying it into the Spider to run in bulk across all your pages.

    For example, say I wanted to get all the dates and author information of every article in our blog subfolder.

    If you simply right-clicked on one of the highlighted elements in the inspect window and hit Copy > Copy XPath, you would be given something like:

    While this does the trick, it will only pull the single instance copied (‘16 January, 2019 by Ben Fuller’). Instead, we want all the dates and authors from the /blog subfolder.

    By looking at what elements the reference is sitting in we can slowly build out an XPath function directly in XPath Helper and see what it highlights in Chrome. For instance, we can see it sits in a class of ‘main-blog–posts_single-inner–text–inner clearfix’, so pop that as a function into XPath Helper:
    //div[@class="main-blog--posts_single-inner--text--inner clearfix"]

    XPath Helper will then highlight the matching results in Chrome:

    Close, but this is also pulling the post titles, so not quite what we’re after. It looks like the date and author names are sitting in a sub <p> tag so let’s add that into our function:
    (//div[@class="main-blog--posts_single-inner--text--inner clearfix"])/p

    Bingo! Stick that in the custom extraction feature of the Spider (Configuration > Custom > Extraction), upload your list of pages, and watch the results pour in!
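    To make the idea concrete, here’s a small Python sketch applying that XPath with the standard library’s ElementTree (which only handles well-formed markup; for real, messier HTML, lxml is more forgiving). The HTML snippet below is a hypothetical stand-in for the blog listing, not the live page.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the blog listing markup.
html = """
<html><body>
  <div class="main-blog--posts_single-inner--text--inner clearfix">
    <h2>Some Post Title</h2>
    <p>16 January, 2019 by Ben Fuller</p>
  </div>
</body></html>
"""

root = ET.fromstring(html)
# Same XPath as built in XPath Helper: the div by class, then its <p> children.
dates_authors = [
    p.text for p in root.findall(
        ".//div[@class='main-blog--posts_single-inner--text--inner clearfix']/p"
    )
]
print(dates_authors)  # → ['16 January, 2019 by Ben Fuller']
```

    The same expression pasted into Configuration > Custom > Extraction returns one match per crawled page.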

    Regex Search works in much the same way: simply start writing your string, hit next, and you can visually see what it’s matching as you go. Once you’ve got it, whack it in the Spider, upload your URLs, then sit back and relax.
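    For instance, a pattern for the date-and-author strings above might look like the following Python sketch; the pattern and sample text are hypothetical examples you’d refine in Regex Search before pasting into the Spider.

```python
import re

# Hypothetical string as scraped from a post listing.
text = "16 January, 2019 by Ben Fuller"

# Date and author captured as separate groups.
pattern = re.compile(r"(\d{1,2} [A-Za-z]+, \d{4}) by ([A-Za-z ]+)")
match = pattern.search(text)
print(match.groups())  # → ('16 January, 2019', 'Ben Fuller')
```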

    Notifications & Auto Mailing Exports with Zapier

    Zapier brings together all kinds of web apps, letting them communicate and work with one another when they might not otherwise be able to. It works by having an action in one app set as a trigger and another app set to perform an action as a result.

    To make things even better, it works natively with a ton of applications such as G-Suite, Dropbox, Slack, and Trello. Unfortunately, as the Spider is a desktop app, we can’t directly connect it with Zapier. However, with a bit of tinkering, we can still make use of its functionality to provide email notifications or auto mailing reports/exports to yourself and a list of predetermined contacts whenever a scheduled crawl completes.

    All you need is to have your machine or server set up with an auto cloud sync directory such as those on ‘Dropbox’, ‘OneDrive’ or ‘Google Backup & Sync’. Inside this directory, create a folder to save all your crawl exports & reports. In this instance, I’m using G-drive, but others should work just as well.

    You’ll need to set a scheduled crawl in the Spider (File > Schedule) to export any tabs, bulk exports or reports into a timestamped folder within this auto-synced directory:

    Log into or create an account for Zapier and make a new ‘zap’ to email yourself or a list of contacts whenever a new folder is generated within the synced directory you selected in the previous step. You’ll have to provide Zapier access to both your G-Drive & Gmail for this to work (do so at your own risk).

    My zap looks something like this:

    The above Zap will trigger when a new folder is added to /Scheduled Crawls/ in my G-Drive account. It will then send out an email from my Gmail to myself and any other contacts, notifying them and attaching a direct link to the newly added folder and Spider exports.

    I’d like to note here that if you’re running a large crawl or directly saving the crawl file to G-Drive, you’ll need enough storage for the upload (so I’d stick to exports). You’ll also have to wait until the sync from your desktop to the cloud has completed before the zap will trigger, and it checks for this on a 15-minute cycle, so it might not be instantaneous.

    Alternatively, do the same thing on IFTTT (If This Then That) but set it so a new G-drive file will ping your phone, turn your smart light a hue of lime green or just play this sound at full volume on your smart speaker. We really are living in the future now!
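    If you’d rather roll your own trigger instead of Zapier or IFTTT, the detection half can be sketched in a few lines of Python. The directory path and `notify` helper in the usage comment are hypothetical, and you’d still need smtplib or a webhook to send the actual notification.

```python
import os

def new_folders(watch_dir, seen):
    """Return subfolders of watch_dir not yet in `seen` (one polling pass)."""
    current = {
        name for name in os.listdir(watch_dir)
        if os.path.isdir(os.path.join(watch_dir, name))
    }
    return current - seen

# Usage sketch: poll the synced export directory on a timer.
# seen = set()
# fresh = new_folders("/path/to/Scheduled Crawls", seen)  # hypothetical path
# for folder in fresh:
#     notify(folder)  # hypothetical email/webhook helper
# seen |= fresh
```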


    There you have it, the Magnificent Seven(ish) tools to try using with the SEO Spider, combined to form the deadliest gang in the west web. Hopefully, you find some of these useful, but I’d love to hear if you have any other suggestions to add to the list.

    Google Grants Account Suspended – The Next Steps Wed, 16 Jan 2019 15:05:37 +0000 If your Google Ad Grants account has been unexpectedly suspended, this article will help you through the process of getting it back up and running in as little time as possible, as well as providing preventative measures that will keep your account suspension free. Initial Suspension Ad Grants account guidelines...

    The post Google Grants Account Suspended – The Next Steps appeared first on Screaming Frog.

    If your Google Ad Grants account has been unexpectedly suspended, this article will help you through the process of getting it back up and running in as little time as possible, as well as providing preventative measures that will keep your account suspension free.

    Initial Suspension

    Ad Grants account guidelines are renowned for being vague with regards to their disapproval reasons, and it can often be a case of automated checks flagging problems that have been misinterpreted by their systems. If you believe this to be the case, then jump along to the section about how to get in contact with Google and dive straight into getting your account re-reviewed.

    A more likely reason, however, will be that one of the program terms or policies has been violated somewhere in your campaigns. In this case, you will need to take a closer look at your account in order to resolve the issue.

    Diagnosing the Problem

    In most cases, if your account has violated any policies during the month, you will receive a gentle nudge via email to check your account’s compliance with programme policies. This email will link through to a policy compliance report explaining which area of the account is causing problems.

    In addition to this, Google provides an Ad Grants Policy Compliance Guide that will help you understand what’s needed to comply with the Grants programme policies.

    When reading this, go through each section to make sure that you are following each policy correctly, check and check again!

    Common reasons for suspensions

    The most common reasons why a Grants account may have been suspended include:

    Not enough ad groups per campaign

    • Every campaign requires at least 2 ad groups, with at least 2 ads in each, and at least 2 sitelinks across the account.

    Low Click Through Rate (CTR)

    • Not reaching the specified 5% CTR for 2 consecutive months can result in a temporary account deactivation.

    No active keywords in an ad group

    • Having an active ad group with no active keywords in it can often cause an account to be flagged. This ad group will have to be paused, or have some new relevant keywords added, to comply. This is something to be especially wary of if you have set up any rules to pause low quality score keywords; this is explained further in the ‘Ways of preventing future suspensions’ section below.

    Low Quality Score

    • Every single keyword that is not removed or paused within your account needs to have a quality score of 3 or higher. Reminder: if the keyword is yet to be given a visible quality score and is shown with a dash (-), then it is still compliant; it’s only those with a QS of 2/10 or 1/10 that will cause problems.

    Not responding to Programme Survey

    • All Ad Grants accounts must complete an annual programme survey sent to the login email address. These email accounts are often lost or neglected; however, failure to complete the survey will result in temporary suspension. Always make sure that your contact details are up to date so that this reminder isn’t missed.

    The Next Steps

    If you have been through the checklist and can’t identify why you may have been suspended, or believe Google have made an error, then your best option for understanding the suspension reason will be to get in contact with Google and talk to their Grants team.

    If, however, having looked through the compliance checklist, you have identified what caused your account to be suspended, you will be in a position to request reactivation, which is detailed below.

    Getting Reactivated

    Once you’ve discovered the reason why your account has been suspended, then the next step is fairly simple.

    All you need to do now (after you’ve made the necessary amendments) is request reactivation of your Ad Grants account, filling in your Ad Grants account ID, sign-in email address and contact email, plus a brief explanation in the notes section covering the steps you took to make the account compliant. Then press ‘Submit’.

    Once submitted, Google state you should hear back from a Googler within three business days.

    It’s worth keeping in mind that Google won’t be able to reactivate your account if there are still errors within it, so it is your responsibility to make sure the account is fully compliant before resubmitting. If you do experience any problems, then go through the checklist again or get in contact with Google, as detailed below.

    Contacting Google

    If you are amongst those that do not know why your account has been suspended, then you will need to contact Google directly.

    Often the easiest way to do this is by remaining on the Google Ads interface and clicking on the ‘CONTACT US’ button in the red deactivation banner found on your overview screen, similar to the one below.


    You can get in contact with Google in one of three ways:

    • Live chat – Often the best way to get in contact with Google when diagnosing the cause of an account suspension is through live support; you normally get connected quickly, and it’s an easy way of getting hold of someone who can get the account re-reviewed then and there.
    • Call – You can also contact your local Google office by phone (for example, here in the UK their number is 0800 169 0409). One problem with this method is that the lines are only answered during office hours, which isn’t helpful if your suspension occurs at 6pm on a Sunday night!
    • Email support – Alternatively, if you don’t have time to chat at the point of suspension, you can email the Google Grants help support at any time and should expect a reply within 24 hours.

    If you’re still unsure what you need to do to get back up and running, you can also use the Official Google Ads Community to view or join discussions about other instances of Ad Grants account suspensions and possible remedies for these.

    Ways of preventing future suspensions

    In order to avoid having your Ad Grants account suspended again, we’ve listed below some handy tips and tricks that will help ensure the account remains compliant and suspension free.

    Automated Rules

    • If you don’t have time to check your account every day, setting up automated rules is a really good way of staying on top of the basic requirements. These rules can be used not only to pause keywords that fall below a quality score of 3, but also to turn back on any that regain a quality score of 3 or above (or are shown with no score at all).
    • Note that the basic rule can sometimes leave ad groups vacant, so you should also add a script to check for and pause any empty ad groups.
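    The pause/enable logic such a rule needs is simple enough to sketch. The keywords and scores below are hypothetical, and a real implementation would live in Google Ads’ own automated rules or scripts rather than Python; this just illustrates the QS-of-3 threshold and the dash-means-compliant case.

```python
# Sketch of the automated-rule logic: pause keywords with QS below 3,
# enable those at 3+ or with no visible score yet (shown as '-').
def keyword_action(quality_score):
    if quality_score in (None, "-"):  # no visible QS: still compliant
        return "enable"
    return "enable" if quality_score >= 3 else "pause"

# Hypothetical keyword rows:
keywords = [("chemistry tutor", 7), ("buy beakers", 2), ("new keyword", "-")]
print([(kw, keyword_action(qs)) for kw, qs in keywords])
# → [('chemistry tutor', 'enable'), ('buy beakers', 'pause'), ('new keyword', 'enable')]
```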

    Using Filters

    • If you think your account/campaigns could include any non-brand single keywords, you can easily check by filtering all keywords with a filter set to ‘Does not contain’ and then entering a space ( ). All non-brand keywords returned should be paused or removed. If the only keywords returned are your brand, then you may have to contact Google and register/whitelist them so that they don’t flag the keyword policy.
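    The same “contains no space” check is easy to sketch outside the interface, for example against a keyword export; the brand list and keywords here are hypothetical examples.

```python
# Flag single-word keywords that aren't registered brand terms.
BRAND_TERMS = {"screamingfrog"}  # hypothetical whitelisted brand terms

def non_compliant_singles(keywords):
    return [
        kw for kw in keywords
        if " " not in kw and kw.lower() not in BRAND_TERMS
    ]

keywords = ["screamingfrog", "chemistry", "chemistry revision"]
print(non_compliant_singles(keywords))  # → ['chemistry']
```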

    Use match types to increase ad groups

    • If you are struggling to create two distinct ad groups within a campaign, then a simple (and quick) way is to build out your ad groups based on match types. E.g. exact match & broad match ad groups can be run and will abide by the ‘2 ad groups per campaign’ quota.

    Always review your compliance reports

    • If Google do send over a non-compliance report stating a policy infringement, make sure you always look into it, even if you believe the account to be fully compliant. There is always the possibility that you missed something, or that something has changed recently which is causing Google to flag your account, so always treat these reports with the utmost importance in order to avoid suspension.

    In review

    Hopefully, you won’t find yourself in the position of having your Ad Grants account suspended, but if you do our final (and probably most important) piece of advice is not to panic. Follow the guidance provided and work through each of the policies, checking off each requirement as you go. Once you’re sure you’ve met all the requirements, get that review request in and you’ll soon be back up and running and driving relevant traffic to your site.

    If you’re still not sure how best to make sure your campaigns stay online, then drop us a line and we can look into how we could help!

    11 Little-Known Features In The SEO Spider Tue, 08 Jan 2019 09:54:38 +0000 The Screaming Frog SEO Spider has evolved a great deal over the past 8 years since launch, with many advancements, new features and a huge variety of different ways to configure a crawl. This post covers some of the lesser-known and hidden-away features, that even more experienced users might not...

    The post 11 Little-Known Features In The SEO Spider appeared first on Screaming Frog.

    The Screaming Frog SEO Spider has evolved a great deal over the past 8 years since launch, with many advancements, new features and a huge variety of different ways to configure a crawl.

    This post covers some of the lesser-known and hidden-away features that even more experienced users might not be aware exist, or at least may not be using to their full potential to help improve auditing. Let’s get straight into it.

    1) Export A List In The Same Order Uploaded

    If you’ve uploaded a list of URLs into the SEO Spider, performed a crawl and want to export them in the same order they were uploaded, then use the ‘Export’ button which appears next to the ‘upload’ and ‘start’ buttons at the top of the user interface.

    Export List In Same Order Uploaded

    The standard export buttons on the dashboard will otherwise export URLs in order based upon what’s been crawled first, and how they have been normalised internally (which can appear quite random in a multi-threaded crawler that isn’t in usual breadth-first spider mode).

    The data in the export will be in the exact same order and include all of the exact URLs in the original upload, including duplicates, normalisation or any fix-ups performed.

    2) Crawl New URLs Discovered In Google Analytics & Search Console

    If you connect to Google Analytics or Search Console via the API, by default any new URLs discovered are not automatically added to the queue and crawled. URLs are loaded, data is matched against URLs in the crawl, and any orphan URLs (URLs discovered only in GA or GSC) are available via the ‘Orphan Pages‘ report export.

    If you wish to add any URLs discovered automatically to the queue, crawl them and see them in the interface, simply enable the ‘Crawl New URLs Discovered in Google Analytics/Search Console’ configuration.

    Crawl new URLs discovered in Search Console

    This is available under ‘Configuration > API Access’ and then either ‘Google Analytics’ or ‘Google Search Console’ and their respective ‘General’ tabs.

    This will mean new URLs discovered will appear in the interface, and orphan pages will appear under the respective filter in the Analytics and Search Console tabs (after performing crawl analysis).

    orphan urls search console

    3) Switching to Database Storage Mode

    The SEO Spider has traditionally used RAM to store data, which has enabled it to crawl lightning-fast and flexibly for virtually all machine specifications. However, it’s not very scalable for crawling large websites. That’s why early last year we introduced the first configurable hybrid storage engine, which enables the SEO Spider to crawl at truly unprecedented scale for any desktop application while retaining the same, familiar real-time reporting and usability.

    So if you need to crawl millions of URLs using a desktop crawler, you really can. You don’t need to keep increasing RAM to do it either; just switch to database storage instead. Users can select to save to disk by choosing ‘database storage mode’ within the interface (via ‘Configuration > System > Storage’).

    Database storage rocks

    This means the SEO Spider will hold as much data as possible within RAM (up to the user allocation), and store the rest to disk. We actually recommend this as the default setting for any users with an SSD (or faster drives), as it’s just as fast and uses much less RAM.

    Please see our guide on how to crawl very large websites for more detail.

    4) Request Google Analytics, Search Console & Link Data After A Crawl

    If you’ve already performed a crawl and forgot to connect to Google Analytics, Search Console or an external link metrics provider, then fear not. You can connect to any of them post crawl, then click the beautifully hidden ‘Request API Data’ button at the bottom of the ‘API’ tab.

    Request API Data

    Alternatively, ‘Request API Data’ is also available in the ‘Configuration > API Access’ main menu.

    request API Data button in the menu

    This will mean data is pulled from the respective APIs and matched against the URLs that have already been crawled.

    5) Disable HSTS To See ‘Real’ Redirect Status Codes

    HTTP Strict Transport Security (HSTS) is a standard by which a web server can declare to a client that it should only be accessed via HTTPS. By default the SEO Spider will respect HSTS and if declared by a server and an internal HTTP link is discovered during a crawl, a 307 status code will be reported with a status of “HSTS Policy” and redirect type of “HSTS Policy”. Reporting HSTS set-up is useful when auditing security, and the 307 response code provides an easy way to discover insecure links.

    Unlike usual redirects, this redirect isn’t actually sent by the web server; it’s turned around internally (by a browser and the SEO Spider), which simply requests the HTTPS version instead of the HTTP URL (as all requests must be HTTPS). A 307 status code is reported, however, because you must set an expiry for HSTS, which is why it’s treated as a temporary redirect.

    While HSTS declares that all requests should be made over HTTPS, a site wide HTTP -> HTTPS redirect is still needed. This is because the Strict-Transport-Security header is ignored unless it’s sent over HTTPS. So if the first visit to your site is not via HTTPS, you still need that initial redirect to HTTPS to deliver the Strict-Transport-Security header.

So if you’re auditing an HTTP to HTTPS migration with HSTS enabled, you’ll want to check the underlying ‘real’ site-wide redirect status code in place (and find out whether it’s a 301 redirect). To do this, you can disable HSTS policy by unticking the ‘Respect HSTS Policy’ configuration under ‘Configuration > Spider > Advanced’ in the SEO Spider.

    disable HSTS policy

This means the SEO Spider will ignore HSTS completely and report upon the underlying redirects and status codes. You can switch back to respecting HSTS when you know they are all set up correctly, and the SEO Spider will just request the secure versions of URLs again. Check out our SEO’s guide to crawling HSTS.

    6) Compare & Run Crawls Simultaneously

    At the moment you can’t compare crawls directly in the SEO Spider. However, you are able to open up multiple instances of the software, and either run multiple crawls, or compare crawls at the same time.

    Compare Crawls Multiple Instances

    On Windows, this is as simple as just opening the software again by the shortcut. For macOS, to open additional instances of the SEO Spider open a Terminal and type the following:

    open -n /Applications/Screaming\ Frog\ SEO\

    You can now perform multiple crawls, or compare multiple crawls at the same time.

    7) Crawl Any Web Forms, Logged In Areas & By-Pass Bot Protection

The SEO Spider has supported basic and digest standards-based authentication for a long time, which are often used for secure access to development servers and staging sites. However, the SEO Spider also has the ability to log in to any web form that requires cookies, using its in-built Chromium browser.

This nifty feature can be found under ‘Configuration > Authentication > Forms Based’, where you can load virtually any password-protected website, intranet or web application, log in and crawl it. For example, you can log in and crawl your precious fantasy football team if you really want to ruin (or perhaps improve) it.

    web forms authentication

    This feature is super powerful because it provides a way to set cookies in the SEO Spider, so it can also be used for scenarios such as bypassing geo IP redirection, or if a site is using bot protection with reCAPTCHA or the like.

    bot protection

    You can just load the page in the in-built browser, confirm you’re not a robot – and crawl away. If you load the page initially pre-crawling, you probably won’t even see a CAPTCHA, and will be issued the required cookies. Obviously you should have permission from the website as well.

However, with great power comes great responsibility, so please be careful with this feature.

    During testing we let the SEO Spider loose on our test site while signed in as an ‘Administrator’ for fun. We let it crawl for half an hour; in that time it installed and set a new theme for the site, installed 108 plugins and activated 8 of them, deleted some posts, and generally made a mess of things.

    With this in mind, please read our guide on crawling password protected websites responsibly.

    8) Crawl (& Remove) URL Fragments Using JavaScript Rendering Mode

Occasionally it can be useful to crawl URLs with fragments (/page-name/#this-is-a-fragment) when auditing a website, and by default the SEO Spider will crawl them in JavaScript rendering mode.

    You can see our FAQs which use them below.

While this can be helpful, the search engines will obviously ignore anything from the fragment and crawl and index the URL without it. Therefore, you may generally wish to change this behaviour using the ‘Regex Replace’ feature in URL Rewriting. Simply include #.* within the ‘regex’ field and leave the ‘replace’ field blank.

    Remove hash fragment in JavaScript Rendering

This means URLs will be crawled without fragments, in the same way as the default HTML text-only mode.
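The rewrite itself is simple to reason about: the regex `#.*` matches the fragment delimiter and everything after it, and the empty replacement strips it. A minimal sketch of the same transformation (the `strip_fragment` helper is illustrative, not part of the SEO Spider):

```python
import re

def strip_fragment(url: str) -> str:
    # "#.*" matches from the first "#" to the end of the URL;
    # replacing it with an empty string removes the fragment.
    return re.sub(r"#.*", "", url)

print(strip_fragment("https://example.com/page-name/#this-is-a-fragment"))
# -> https://example.com/page-name/
```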

    9) Utilise ‘Crawl Analysis’ For Link Score, More Data (& Insight)

While some of the features discussed above have been available for some time, the ‘crawl analysis‘ feature was released more recently in version 10, at the end of September 2018.

The SEO Spider analyses and reports data at run-time, where metrics, tabs and filters are populated during a crawl. However, ‘link score’, which is an internal PageRank calculation, and a small number of filters, require calculation at the end of a crawl (or at least when a crawl has been paused).

The full list of 13 items that require ‘crawl analysis’ can be seen under ‘Crawl Analysis > Configure’ in the top level menu of the SEO Spider, and is shown below.

    crawl analysis

    All of the above are filters under their respective tabs, apart from ‘Link Score’, which is a metric and shown as a column in the ‘Internal’ tab.

In the right hand ‘overview’ window pane, filters which require post ‘crawl analysis’ are marked with ‘Crawl Analysis Required’ for further clarity. The ‘Sitemaps’ filters in particular mostly require post-crawl analysis.

    Right hand overview crawl analysis required

    They are also marked as ‘You need to perform crawl analysis for this tab to populate this filter’ within the main window pane.

    Crawl Analysis tabs message

    This analysis can be automatically performed at the end of a crawl by ticking the respective ‘Auto Analyse At End of Crawl’ tickbox under ‘Configure’, or it can be run manually by the user.

    To run the crawl analysis, simply click ‘Crawl Analysis > Start’.

    Start Crawl Analysis

    When the crawl analysis is running you’ll see the ‘analysis’ progress bar with a percentage complete. The SEO Spider can continue to be used as normal during this period.

    Crawl Analysis Running

When the crawl analysis has finished, the empty filters marked with ‘Crawl Analysis Required’ will be populated with lots of lovely insightful data.

    Filter populated after crawl analysis

    The ‘link score’ metric is displayed in the Internal tab and calculates the relative value of a page based upon its internal links.

This uses a relative 0-100 point scale from least to most value for simplicity, which allows you to determine where internal linking might be improved for key pages. It can be particularly powerful when utilised with other internal linking data, such as counts of inlinks, unique inlinks and % of links to a page (from across the website).
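The SEO Spider’s exact link score formula isn’t public, but since it’s described as an internal PageRank calculation, the general idea can be sketched as follows: iteratively pass value between pages through their internal links, then scale the result to 0-100. The `link_scores` function and the tiny example site are purely illustrative.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Toy PageRank-style score over an internal link graph, scaled 0-100.

    links: dict mapping each page URL to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a base share, then receives value from its inlinks.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: distribute its value evenly across the site.
                for t in pages:
                    new[t] += damping * rank[page] / len(pages)
        rank = new
    top = max(rank.values())
    return {p: round(100 * r / top) for p, r in rank.items()}

site = {"/": ["/about", "/blog"], "/about": ["/"], "/blog": ["/", "/about"]}
scores = link_scores(site)
# "/" receives the most internal link value, so it gets the top score of 100
```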

    10) Saving HTML & Rendered HTML To Help Debugging

    We occasionally receive support queries from users reporting a missing page title, description, canonical or on-page content that’s seemingly not being picked up by the SEO Spider, but can be seen to exist in a browser, and when viewing the HTML source.

Often this is assumed to be a bug of some kind, but most of the time it’s simply down to the site responding differently to a request made from the SEO Spider than to one from a browser, based upon the user-agent, accept-language header, whether cookies are accepted, or whether the server is under load, for example.
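To make that concrete, here’s a hypothetical illustration (not a real site or the SEO Spider’s code) of a server that varies its response on the User-Agent header, so a crawler receives a page with no title while a browser sees one:

```python
def respond(user_agent: str) -> str:
    """Toy server handler that serves different HTML by user-agent."""
    if "Chrome" in user_agent or "Firefox" in user_agent:
        # Recognised browsers get the full page.
        return "<html><head><title>My Page</title></head><body>Content</body></html>"
    # Unrecognised agents get a stripped response with no <title>.
    return "<html><head></head><body>Content</body></html>"

browser_html = respond("Mozilla/5.0 (Windows NT 10.0) Chrome/77.0")
spider_html = respond("Screaming Frog SEO Spider/10.4")
# browser_html contains a <title>; spider_html does not
```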

    Therefore an easy way to self-diagnose and investigate is to see exactly what the SEO Spider can see, by choosing to save the HTML returned by the server in the response.

    By navigating to ‘Configuration > Spider > Advanced’ you can choose to store both the original HTML and rendered HTML to inspect the DOM (when in JavaScript rendering mode).

    Store HTML

    When a URL has been crawled, the exact HTML that was returned to the SEO Spider when it crawled the page can be viewed in the lower window ‘view source’ tab.

    View HTML source

By viewing the returned HTML you can debug the issue, then adjust the crawl with a different user-agent, by accepting cookies, etc. For example, you would see the missing page title, and then be able to identify the conditions under which it’s missing.

    This feature is a really powerful way to diagnose issues quickly, and get a better understanding of what the SEO Spider is able to see and crawl.

    11) Using Saved Configuration Profiles With The CLI

In the latest update, version 10 of the SEO Spider, we introduced the command line interface. The SEO Spider can be operated via command line, including launching, saving and exporting, and you can use --help to view the full arguments available.


However, not all configuration options are available, as there would be hundreds of arguments if you consider the full breadth available. So the trick is to use saved configuration profiles for more advanced scenarios.

Open up the SEO Spider GUI, select your options, whether basic configurations or more advanced features like custom search and extraction, and then save the configuration profile.

    To save the configuration profile, click ‘File > Save As’ and adjust the file name (ideally to something descriptive!).

    Save configuration profile to use with CLI

    You can then supply the config argument to set your configuration profile for the command line crawl (and use in the future).

    --config "C:\Users\Your Name\Crawls\super-awesome.seospiderconfig"

This really opens up the possibilities for utilising the SEO Spider via the command line.
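Putting it together, a full headless crawl using that saved profile might look something like the following (the URL and paths are placeholders; check --help for the flags available in your version):

```
ScreamingFrogSEOSpiderCli.exe --crawl "https://www.example.com/" --headless --save-crawl --output-folder "C:\Users\Your Name\Crawls" --config "C:\Users\Your Name\Crawls\super-awesome.seospiderconfig"
```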

    What Have We Missed?

    We’d love to hear any other little known features and configurations that you find helpful, and are often overlooked or just hidden away.

    The post 11 Little-Known Features In The SEO Spider appeared first on Screaming Frog.
