An automated way to scrape data within Google Sheets seems too good to be true? In this article, I will show you how and go through some examples of how I’ve used this in my role as a Paid Media Marketer.
On Google’s support page they say this about IMPORTXML:
“IMPORTXML can import data from various structured data types including XML, CSV, HTML TSV, and RSS and ATOM XML feed.”
Effectively IMPORTXML is a way of scraping structured data from a web page without needing any HTML, CSS or coding knowledge.
IMPORTXML only requires two variables to extract data from a web page.
The XPath is used to navigate through elements and attributes in a web document and is structured like this: //title.
For example, if we wanted to extract the title of this article: https://benjamindavis.uk/articles/voicesearch.html
We would use IMPORTXML like this:=IMPORTXML(“https://benjamindavis.uk/articles/voicesearch.html”, “//title”)
Bear in mind that if you copy the formula above to test it out you may need to re-write the speech marks.
If you want to find out how to extract specific elements that aren’t the page title if you go into Inspect Element on a web page and right-click the element you want to extract you can click copy > XPath which can then be used in the Google Sheets formula.
If you had a list of URLs of articles on your website and you wanted all their meta descriptions you could input all the URLs in a column in Google sheets. Then in the next column, you can use =IMPORTXML(URLCell, “//meta[@name=’description’]/@content”) which will then result in a list of your meta descriptions.
Example:
IMPORTHTML looks for a specific table or list in a webpage and copies the data out of it. In HTML this is anything inside <table
, <ul>
or <ol>
tags.
IMPORTHTML requires 3 variables:
=IMPORTHTML(URL, query_type, index)
A webpage will most likely contain one or more tables and/or lists. If you want to find out the indexes of them on a page, follow these steps:
var index = 1; [].forEach.call(document.getElementsByTagName("table"), function(elements) { console.log("Index: " + index++, elements); });
var index = 1; [].forEach.call(document.querySelectorAll("ul,ol"), function(elements) { console.log("Index: " + index++, elements); });
=IMPORTHTML("https://en.wikipedia.org/wiki/Wikipedia:About", "List",12)
In this example I will be pulling in this crypto pricing data from https://coinstats.app/coins/.
First I will use the developer console to get the index of this table. In this example it is just index 1.
I can then use the URL and Index to pull this table into Google Sheets using this formula:
=IMPORTHTML("https://coinstats.app/coins/", "Table",1)
In Google Sheets this looks like:
In the beginning Google limited you to only being able to have 50 IMPORTHTML functions per sheet but in 2015 this limit was removed.
However, with a lot of IMPORTHTML formulas you may notice a drop in speed as each request has to individually crawl an external URL.
If your formula suddenly stops working it is worth checking if the URL you are fetching from has changed failing that check if the index of the Table or List you are fetching from has changed.
If things are still not fixed perhaps the website owner has blocked website crawlers in their robots.txt file.
I have used both these functions to pull in a list of articles on a clients website and then use that data to then pull in their meta titles, descriptions and the category of the article.
This gave me a list of all article URLS and Category which I then used to create a Dynamic Search Ad feed with category labels which I used in Google Search Ads.
The Meta titles and descriptions just gave me context as to what the articles were about.
Another potential use of these formulas is to fetch data from an external source. Similar to my example above using Coinstats cryptocurrency chart if there was some data that you wanted to pull in to then operate on these formulas allow you to do that.
If the above methods don’t work for the data you are trying to pull in Google Sheets also has the following formula that allow you to access different data formats.
IMPORTXML
IMPORTRANGE
IMPORTFEED
IMPORTDATA
I have used both these functions to pull in a list of articles on a clients website and then use that data to then pull in their meta titles, descriptions and the category of the article.
This gave me a list of all article URLS and Category which I then used to create a Dynamic Search Ad feed with category labels which I used in Google Search Ads.
The Meta titles and descriptions just gave me context as to what the articles were about.
Another potential use of these formulas is to fetch data from an external source. Similar to my example above using Coinstats cryptocurrency chart if there was some data that you wanted to pull in to then operate on these formulas allow you to do that.
A marketing strategy is a single comprehensive plan which contains all of a company’s marketing goals and objectives.
Read MoreKPIs ensure that you execute a more successful marketing strategy as well as helping you to identify which campaigns and tactics are having the biggest impact for your business.
Read MoreStrategies that you can use in this new era of cookie-less, privacy focused browsers.
Read More