Introduction
This is for you if you have zero to beginner-level knowledge in the world of Data, specifically SQL.
Perhaps you have come across that term(SQL) multiple times, and might have been a bit curious, however, like with many things on this earth, it came with a lot of baggage: maybe you watched a video or came across an article, and started to hear and see strange things.
Things like PostgresSQL, TSQL, RDBMS etc. And you thought to yourself, this is another one of those programming languages reserved for those with glasses and energy drinks. Well, the purpose of this is to “defang” SQL, to bridge the gap from zero knowledge to a basic understanding and hopefully pique your interest to study it a bit further.
SQL is actually not too difficult to pick up, if you have done any sort of data manipulation or used formulas in Microsoft Excel, then you have enough expertise to use SQL. In fact, you have already been using some SQL functions without realizing it.
For this lesson, we’ll be focusing on the basic SQL words (actually called SQL clauses), used for retrieval of data i.e. SELECT, FROM, WHERE, ORDER BY, and some basic functions e.g. count.
We won’t be covering joins in this lesson.
Resources
The following links and files would be helpful if you would like to follow along.
- Why Learn SQL https://www.dataquest.io/blog/why-sql-is-the-most-important-language-to-learn/ | https://www.wealest.com/articles/lindy-effect
- Recording: Set up Google Big Query https://www.loom.com/share/5d3f1186133445389b6efa182a597aa5
About Me
- BSc. Electrical Engineering
- ~ 5 years of Experience in various Data Related roles.
- Interests include Reading, Running, Music (Bass Guitar), Teaching.
What is SQL
SQL stands for Structured Query Language. As the name implies, it is a language, and languages are used for communication to exchange ideas and information from one mind to the other.
SQL is the language for large datasets, it can be used to talk to large amounts of data. To retrieve, change, give or withhold access to the data.
Why are there sometimes different prefixes to SQL ?
(e.g. TSQL, PostgreSQL, OracleSQL, MYSQL etc.)
Well, those different prefixes are used to identify different “dialects” of SQL. Let us stick with the language analogy a bit further.
There exists a language called the English language, there are general rules of grammar, punctuation, and sentence construction that govern the language across the world. However, in different parts of the world, there are some differences in how the language is spoken and used e.g. British English, American English, Australian English etc. The general rules are the same and if you learn English in any one of those countries, you can generally easily understand or pick up the English spoken in another country.
Similarly, SQL is like the English language. Various organizations (like countries) adopt the language in a way that is comfortable to them. Hence Microsoft’s dialect of the SQL language is TSQL, Oracle’s is OracleSQL etc. They are very similar, but just like you would say “chips” and “booth” in England but “fries” and “trunk” in USA. Similarly in TSQL you would say “SELECT TOP …..”, and in PostgresSQL you would say “LIMIT” – however, in both cases you are communicating the same thing, get me only these number of rows of data (say the first 1000 rows).
SQL Platforms or RDBMS
Have you ever seen a job posting and seen different tools and technologies?
For example, some jobs might require you to be an “expert in Google Big Query” e.g. https://www.linkedin.com/jobs/search/?currentJobId=3537938588&f_E=2%2C3&geoId=101174742&keywords=Data Analyst&location=Canada&refresh=true&start=50
Some might require you to have experience with ‘Microsoft SQL Server’ e.g. https://www.linkedin.com/jobs/search/?currentJobId=3544447417&f_E=2%2C3&geoId=101174742&keywords=Data Analyst&location=Canada&refresh=true&start=25
These are various platforms, do not be overwhelmed by the various ones out there and think you need to have used them all when applying for jobs in the data field. A good way to think about this is to see SQL as email technology, and see the different platforms as the different email platforms out there. E.g. Gmail, Outlook, YahooMail etc. There are some differences in their interfaces, but the basic concept is the same. Once you use one, you can typically easily use another without too much of a learning curve.
Similarly, once you use one SQL platform, the interface of another might be slightly different, but have the confidence that you can pick it up quickly and feel free to radiate this confidence in your interviews.
Your First SQL Query(Bridging the Gap from Excel to SQL)
It’s easier than you think. Keep in mind that we’ve spoken about SQL as the language used to talk to large datasets, and if you think about it, data is everywhere. Actually, most of what you will be doing with SQL( if you choose to further your learning) are things you have been doing already.
There are different aspects of the SQL language, however, here, we would focus on the aspects of SQL that deal with retrieving data from a table.
For these examples, we’ll journey from the ubiquitous to the common. From the known to the unknown.
Example 1:
Feel free to download this very basic Excel file and follow along

Here is a simple Excel Table (Let us assume this table is called product_Table).
SQL is just like Excel on steroids, you can have multiple tables with millions of rows, you can link and relate those tables to each other.
If you have this table in a database and want to see the full table using a SQL Statement, it is as simple as what you see in the screenshot below.

Those two lines of text are essentially saying: get me all the columns from that table(product_Table).
Hiding and Showing Columns is essentially a SQL SELECT statement.
If you have ever had an excel file with some columns you don’t need and you hide that column. That is essentially a SQL SELECT Statement.
For example: if you hide the “Online” column in our Excel Table, that is the equivalent of choosing the specific columns you want to return from your table.
Have you ever sorted your data in Excel? That is an ORDER BY clause
What if we wanted to see the products in our tables from the one with the highest sales to the lowest one. You can sort with a few clicks in EXCEL.

That Sorting is done with one line in SQL.
For more advanced Excel users: Something like a VLOOKUP is pretty similar to a JOIN in SQL.

Have you ever filtered an Excel file? That is a WHERE clause in SQL
Imagine you wanted to see just your Online orders and you filtered that column

That is a simple WHERE clause in SQL

From this, I hope you can see that Excel and SQL are cousins. And hopefully, now you’re more familiar with SQL (the more intimidating cousin who it turns out was not as scary as it seemed. )
Example 2:
Feel free to download this very basic Excel file and follow along:
The AI facial recognition software that was newly installed at the local grocery store just identified the last few people that walked in. For some reason, you were given this data and asked to help the manager answer some questions for the chance to win a full tank of gas.
Specifically, the manager would like to:
- Print out the names of the customers, she wants to know if those in the store at the moment are her regulars or if there are any new folks.
- Print out the name of the oldest person in the store at that moment, she is in a good mood and would like to offer said person a “lifetime” 10 percent discount.
- Determine how many male shoppers are in the store, they might need more help 😀
Name | Gender | Age |
---|---|---|
Abigail | F | 20 |
Bond | M | 45 |
Carissa | F | 40 |
David | M | 19 |
Eleanor | F | 67 |
Femi | M | 36 |
Gisele | F | 29 |
Harold | M | 12 |
Imani | F | 15 |
Solution:
Let’s assume the table has a name (Table name = Customers), for easy identification.
Excel
These sound like simple enough tasks, a fair percentage of people are familiar with excel, if you were to do these tasks in excel, here is what they would look like:
- Hide the gender and age columns in the table. Print out the only visible column(i.e. the name column)

- There are many ways to go about this in excel, but one way would be to Activate the filters at the top of each column, sort Age from highest to lowest, now you can filter the age column by checking the highest number in that column, these returns the rows with the customers that have that age(in this case, that would be 67).

- For the number of male shoppers, you might filter on the Gender column by M, and count the number of rows. For a larger dataset, you might use the COUNT formula in excel. NB: In Excel ensure to copy out the values for ‘M’ to a different set of cells after filtering, and do the count on the numerical column. NB Yet Again: You can also use a countif formula to do this without needing the longer process outlined above. But the aim is to make this as simple as possible.

SQL
In SQL you can do the exact same things displayed above, however, you do them using typed commands as opposed to graphical interface point
In SQL, the results could look like the following:
- The line of code below, would do exactly the same thing as the excel example. It looks at the table and returns only the columns you need.
SELECT Name
FROM Customers
- The SQL query below would return the same results for the second query. It gets the maximum age, filters the table by the rows that contain that age, and then retrieves the specified columns in that Select statement
SELECT Name ,Gender ,Age FROM Customers WHERE Age = MAX(Age)
- To return the number of male customers, you could probably use a query like so:
SELECT COUNT(*) FROM Customers WHERE Gender = 'M'
Exercise
You have been hired by a billionaire who loves tea, he aims to find some of the best tea shops in California and buy them, he gives you the dataset below to work with.
GDAsf_boba_tea_shop_data – San Francisco Boba Tea Shop Location Info (1).csv
Answer the following questions using Excel and SQL.
- How many tea shops are in the dataset?
- How many tea shops have a rating of 4.5 or greater?
- How many tea shops have monthly sales greater than $10, 000, sort from the highest selling to lowest selling shop
- How many tea shops with a rating lower than 3 have avg sales greater than $8000
Data Analysis Perspective
Now, that you have answered those questions. It is good practice in industry not just to learn how to use tools, but to know how the utilization of those tools connect to business value. Essentially, how does what I am doing with these tools help the company save money, make money, save time, improve efficiencies etc.
The data analysis process from Google (https://www.coursera.org/learn/foundations-data/lecture/JjA1f/six-phases-of-data-analysis) gives me a good framework to aspire to when handling data requests in my day to day job.
Ask
Ask questions about what the person is trying to achieve, what decisions would be made based on the information? Ask questions about the data, where was it gotten from? Is it complete? Do I have enough questions to answer the question of the requester?
Prepare/Process
e.g.
- Create Data Backup
- Find Duplicates
- Missing Data- Look for nulls
- Ensure Data types are consistent. No strings in number columns
- Split or Concatenate Columns as needed
Analyze
Do the sorting, calculations, manipulations needed to answer the questions. What story is the data telling? Does it agree or contradict with your initial intuition?
Share
Present the data in a good looking clean visualization.
Act
Beyond pretty visuals, provide recommendations based on the results of your analysis.