Best ways to download your sitemap


If you need to create a sitemap, this isn’t for you. This tool lets you download your existing sitemap and analyse it.

A sitemap is an important tool in your arsenal for helping Google find new pages on your site. It shouldn’t be your only tool, but it should be used.

Keeping your sitemap error-free

It’s important that your sitemap is up to date and correct. After all, it’s a great way for Google to find new content on your site, so you need to make sure the data within the file is accurate.

There are a few ways you can get this data:

  1. Use our free tool to download your sitemap and export it to CSV
  2. Go to the sitemap file and copy and paste it into Excel (you then need to sort out the formatting and delete all the code sections)
  3. Use Google Sheets to pull in this data (this was my preferred method until it stopped working)
  4. Build your own tool to download the sitemap
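If you go down the route of building your own, the core of it is fairly small. Below is a minimal Python sketch using only the standard library. It is not the tool described in this post, just an illustration under some assumptions: the sitemap follows the sitemaps.org 0.9 schema, a sitemap index file is followed into its child sitemaps, gzipped sitemaps are detected by their magic bytes, and the function names (collect_urls, export_csv and so on) are made up for this example.

```python
import csv
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ["loc", "lastmod", "changefreq", "priority"]


def fetch(url):
    """Download a URL and return the raw response body as bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-export-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def parse_sitemap(xml_bytes):
    """Return (child sitemap URLs, url rows) for one sitemap or sitemap index."""
    root = ET.fromstring(xml_bytes)
    # A sitemap index lists other sitemap files in <sitemap><loc> elements
    children = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", NS) if el.text]
    rows = []
    for url_el in root.findall("sm:url", NS):
        row = {}
        for field in FIELDS:
            el = url_el.find(f"sm:{field}", NS)
            # Optional tags that are missing become empty strings in the CSV
            row[field] = el.text.strip() if el is not None and el.text else ""
        rows.append(row)
    return children, rows


def collect_urls(sitemap_url, fetcher=fetch):
    """Walk a sitemap, following any sitemap index, and return every <url> row."""
    pending, seen, rows = [sitemap_url], set(), []
    while pending:
        current = pending.pop(0)  # first in, first out keeps the file order
        if current in seen:
            continue
        seen.add(current)
        data = fetcher(current)
        if data[:2] == b"\x1f\x8b":  # gzip magic bytes, e.g. sitemap.xml.gz
            data = gzip.decompress(data)
        children, found = parse_sitemap(data)
        pending.extend(children)
        rows.extend(found)
    return rows


def export_csv(rows, path):
    """Write the collected rows to a CSV file, one row per URL."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling export_csv(collect_urls("https://www.example.com/sitemap.xml"), "sitemap.csv") would write one row per URL with its loc, lastmod, changefreq and priority (example.com is a placeholder). The fetcher parameter is there so you can swap in your own download function, for example one that pauses between requests.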

So I decided to get a desktop tool built, which made my life much easier: I chuck in a sitemap URL and within about 10 seconds it spits out a nice CSV file with all the details I need.

The problem was that it was desktop-based and I wanted to give you all access to it, so I got someone to rebuild it as a cloud-based tool.

In theory there are no limits to the tool, but if you want to analyse the file in Excel, bear in mind that a worksheet is limited to 1,048,576 rows. If you have more than about a million URLs, you will need to analyse the file with a different program.

What to Analyse

There are several elements you want to check:

  1. Every URL on your site that you want to appear in Google is included in the sitemap
  2. Every URL in the sitemap returns a 200 response
  3. Every URL in the sitemap is found via a crawl
  4. Google is crawling every URL at least once per month*

Let’s go into these in a bit more detail. 

Every URL on your site that you want to appear in Google is included in the sitemap

This is especially important if you add new content frequently to your site or have a site with a lot of pages.

It’s the quickest and easiest way for Google and other search engines to find this new content.
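Checking this comes down to a set difference: take a list of the URLs you want indexed (for example an export from your CMS or from a crawl) and subtract the URLs listed in the sitemap. Here’s a minimal Python sketch. The file and column names in the usage line are hypothetical, and URLs are compared exactly after trimming whitespace, so http vs https or a missing trailing slash count as different URLs.

```python
import csv


def read_column(path, column):
    """Read one column of a CSV export as a list of strings."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [row[column] for row in csv.DictReader(fh)]


def missing_from_sitemap(wanted_urls, sitemap_urls):
    """URLs you want in Google that the sitemap does not list, sorted.

    Matching is exact after stripping surrounding whitespace.
    """
    in_sitemap = {u.strip() for u in sitemap_urls}
    return sorted({u.strip() for u in wanted_urls} - in_sitemap)
```

For example, missing_from_sitemap(read_column("pages-to-index.csv", "url"), read_column("sitemap-export.csv", "loc")) would list every page you want indexed that the sitemap leaves out.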

Every URL in the sitemap returns a 200 response

As Google uses this document to find new pages on your site, you need to make sure Google trusts it: every page listed should be live and return a 200 response.

Otherwise, Google might lose trust in the file and visit it less often. That means it finds new content more slowly and, more importantly, takes longer to notice when you update existing content.

If Google doesn’t crawl a page after you have changed it, those changes can’t affect the page’s rankings.

As well as dead pages, you shouldn’t list redirecting URLs in the sitemap either, for the same reason.
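To check status codes you need to request each URL without following redirects, otherwise a 301 quietly shows up as the 200 of the page it points to. Here’s a minimal Python sketch using the standard library. Assumptions: it sends HEAD requests (some servers answer HEAD differently from GET, so switch the method to "GET" if the results look odd), it reports 0 when a URL can’t be reached at all, and the names are hypothetical.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 301/302 responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(NoRedirect)


def status_of(url, timeout=15):
    """Return the HTTP status code for url without following redirects."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with _opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 3xx (redirects are not followed), 4xx and 5xx all end up here
        return err.code
    except urllib.error.URLError:
        return 0  # could not connect at all (DNS failure, refused connection, etc.)


def non_200(statuses):
    """Keep only the URLs whose status code is anything other than 200."""
    return {url: code for url, code in statuses.items() if code != 200}
```

With urls as the list of URLs from your sitemap, non_200({u: status_of(u) for u in urls}) gives you every URL that redirects, is missing or errors, along with its status code. On a big sitemap you’d want a short pause between requests so you don’t hammer your own server.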

Every URL in the sitemap is found via a crawl

A sitemap is just one way for Google to find new content. The others are external links to your site and internal links.

Internal links add context about what a page is about: the anchor text and the content surrounding the link are signals Google uses to help determine the page’s topic.

While a page can rank without any internal links, it is much more difficult. A quick check you can do is to make sure all the URLs in the sitemap are found via a crawl of the site.
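A proper crawling tool will do this better, but to show the idea, here’s a small breadth-first crawler in Python that follows internal links from a start page and then lists the sitemap URLs it never reached. It is a sketch under stated assumptions: it only reads href attributes on a tags in the raw HTML (no JavaScript rendering), ignores robots.txt and nofollow, stays on the starting hostname, stops after a hypothetical max_pages cap, compares URLs exactly, and all names are made up for the example.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    with urllib.request.urlopen(url, timeout=15) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetcher=fetch_html, max_pages=500):
    """Breadth-first crawl of same-host links; returns every URL discovered."""
    host = urlparse(start_url).netloc
    found, queue = {start_url}, deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        try:
            html = fetcher(page)
        except OSError:
            continue  # page failed to load; it still counts as linked-to
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before comparing
            link, _ = urldefrag(urljoin(page, href))
            if urlparse(link).netloc == host and link not in found:
                found.add(link)
                queue.append(link)
    return found


def orphans(sitemap_urls, crawled_urls):
    """Sitemap URLs that the crawl never reached, sorted."""
    return sorted(set(sitemap_urls) - set(crawled_urls))
```

Something like orphans(sitemap_urls, crawl("https://www.example.com/")) would return the sitemap URLs that no internal link leads to (example.com is a placeholder).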

Google is crawling every URL at least once per month*

The reason for the asterisk is that on a very large site it might not be feasible for Google to crawl every page at least once a month. This is especially true for news sites, where it might not make sense for Google to recrawl an article that is out of date.

However, for the majority of sites you want to make sure Google is crawling every page at least once a month.

A quick check is to download your server log data, filter on Googlebot and then count the requests by URL. I now do this for both the desktop and mobile bots, but depending on your site that might not be necessary.
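If your server writes logs in the common Apache/Nginx "combined" format (an assumption, so check yours), the count can be sketched in Python like this. It treats any user agent containing "Googlebot" as Googlebot, splits desktop from mobile by looking for "Mobile" in the user agent (Googlebot Smartphone’s user agent includes it, the desktop one doesn’t), ignores query strings, and lists sitemap URLs with no Googlebot hit in the log. Anyone can fake that user agent, so for a strict check Google recommends verifying the requesting IP with a reverse DNS lookup, which this sketch doesn’t do. Function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Apache/Nginx "combined" log format (an assumption: adjust to your server)
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count Googlebot requests per path, split into desktop and mobile."""
    counts = {"desktop": Counter(), "mobile": Counter()}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue  # unparseable line or not Googlebot
        bot = "mobile" if "Mobile" in m.group("agent") else "desktop"
        path = m.group("path").split("?", 1)[0]  # drop the query string
        counts[bot][path] += 1
    return counts


def not_crawled(sitemap_urls, counts):
    """Sitemap URLs whose path got no hit from either Googlebot, sorted."""
    crawled = set(counts["desktop"]) | set(counts["mobile"])
    return sorted(u for u in sitemap_urls if (urlparse(u).path or "/") not in crawled)
```

Run it over a month of logs, for example counts = googlebot_hits(open("access.log", encoding="utf-8")), then not_crawled(sitemap_urls, counts) gives the URLs Google hasn’t visited in that period (the file name is a placeholder).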

Summary

Whichever method you use to download your sitemap, it’s crucial that you do it and start analysing the URLs to make sure they are correct.
