The Ultimate Tool for Big Data Access Using Google Sheets

February 03, 2020

Google Sheets, Query Function

Imagine learning a function, which can work as various other functions; therefore, replacing them with only one and making work with big data easier. With the query function of Google Sheets that scenario is not that absurd because it provides exactly what described. Accessing, summarizing, filtering data from various spreadsheets with only one function - thanks to Google Sheets, the opportunity to do exactly that has been around for quite some time already.


The query function is built on a structured query language (SQL), which is known to be the first step of learning how to access and manipulate big data. Exactly this trait provides the function of being so powerful and allowing its users to work with big data over multiple spreadsheets with only minimal effort.

With that said, let us begin by understanding the syntax of the function.

The Syntax

To begin mastering the query function, we first need to understand the components of the syntax, where to write what to get the required outcome. So this is how the syntax of the Query function looks like:

Web Scraper Query Function Syntax Components

Web Scraper Query Function Syntax Components

Data - will be the reference to the data set that will be used for the Query;

What is important about this component is that the data within the selected column has to be the same type, meaning that if some cells will consist of only numbers, others of only letters, then Google will determine the type that is used for the majority of data in the specific column and return nothing or the null value for the data that does not match the majority data type.

Query - the SQL part of the function, where the use of clauses is applied for explaining what is needed to be retrieved form applying the function;

For this syntax component the most important thing to remember, aside from the clauses, which will be explained later in the blog, is that the value of the query syntax component has to be enclosed within quotation marks, or else Google will not be able to recognize the criteria.

Header - to specify the number of rows that are headers in your chosen data;

This component is optional; however, if left blank or set to “-1” Google will make a guess at the headers within your data.

Having acquainted with the syntax and what is required for the function to work, the part where the most learning is necessary begins - the clauses. Clauses are what creates the unique proposition value of the query function being the most powerful function there is.

The Clauses

For a better understanding, we have prepared an example Google Sheet of ESPN World Fame 2019 list which you can access and follow the tutorial blog clicking the link below: https://docs.google.com/spreadsheets/d/1oMcWRLmbTmu2Mjmofqo7hgigxfsq7e_vFbLZtFvGOTc/edit?usp=sharing

Web Scraper Query Function Blog Introduction Data Table

Web Scraper Query Function Blog Introduction Data Table

SELECT clause

(selects which columns to return and in what order)

SELECT Clause Query Function Blog Web Scraper

SELECT Clause Query Function Blog Web Scraper

DATA!A:I - indicate that we are taking the data from another spreadsheet, more specifically - the “Data” spreadsheet, from the columns of A to I. (And will be applied for the rest of the examples)

"SELECT C,B,D" - indicates that we need the query function to return the first name (SELECT C), the last name (B), and the sport (D) corresponding to the specific person, writing them in such order (C, B, D) will return the chosen columns in such order.

1 - explains that the first row values will be skipped, but used as the headers. (And will be applied for the rest of the examples)

The retrieved data from the written query function will look like this:

Web Scraper Query Function Blog Select Clause Retrieved Data

Web Scraper Query Function Blog Select Clause Retrieved Data

WHERE clause

(returns only the rows that match a specific condition)

+

ORDER BY clause 

(sorts the order of the retrieved data)

Web Scraper Query Function Blog Where and Order By Clause

Web Scraper Query Function Blog Where and Order By Clause

Indicates to return solely persons last name (SELECT B) and their endorsement amount (F) if their endorsement number is higher than the average number (WHERE F > 12.6), and order the data in descending order by the endorsement amount (ORDER BY F DESC).

The retrieved data from the written query function will look like this:

Web Scraper Query Function Blog Where  and Order By Clause Retrieved Data

Web Scraper Query Function Blog Where and Order By Clause Retrieved Data

GROUP BY clause 

(aggregates values across rows to summarize data into groups, similarly as a pivot table does)

Web Scraper Query Function Group By Clause

Web Scraper Query Function Group By Clause

Retrieves a list of countries (SELECT E) but grouped by only unique values (GROUP BY E), counted how many times the country has been mentioned in the “Country” column (COUNT(E)), ignoring the rows where the “Country” column is empty (WHERE E IS NOT NULL), and ordered in descending order by the count (ORDER BY COUNT(E) DESC)

The retrieved data from the written query function will look like this:

Web Scraper Query Function Group By Clause Retrieved Data

Web Scraper Query Function Group By Clause Retrieved Data

LIMIT clause

(limits the number of retrieved rows)

Web Scraper Query Function Blog Limit Clause

Web Scraper Query Function Blog Limit Clause

Retrieves only 10 rows (LIMIT 10) of the first names (SELECT C), the last names (B), and the number of the social following (H) in descending order by the amount of social following (ORDER BY H DESC).

The retrieved data from the written query function will look like this:

Web Scraper Query Function Limit Clause Retrieved Data

Web Scraper Query Function Limit Clause Retrieved Data

PIVOT clause

(summarizes unique values of a column responding to unique values of another column, creating an XY-type table)

Web Scraper Query Function Pivot Clause

Web Scraper Query Function Pivot Clause

Creates an XY-type table showing the number of representatives of a sport (D), ignoring cells that are empty (WHERE D S NOT NULL), and a country (E).

The retrieved data from the written query function will look like this:

Web Scraper Query Function Blog Pivot Clause Retrieved Data

Web Scraper Query Function Blog Pivot Clause Retrieved Data

CONTAINS clause

(retrieves only values that match the specification of the clause)

Web Scraper Query Function Contains Clause

Web Scraper Query Function Contains Clause

Retrieves first name (SELECT C), last name (B) of only golf players (WHERE D CONTAINS ‘Golf’ ”).

The retrieved data from the written query function will look like this:

Web Scraper Query Function Blog Contains Clause Retrieved Data

Web Scraper Query Function Blog Contains Clause Retrieved Data

The clauses shown and explained are the most popular and one of the most basic ways of using the query function. For more complex work, other SQL clauses can be applied and combined for different outcomes.

All said and done, the query function is truly like a portal to accessing big data. The easier way to not only access and search through data over multiple spreadsheets but also creates the possibilities of doing things that are not possible or easy to do using filtering, pivoting, sum, sumif, and other functions.






Go back to blog page