Muchang Bahng | Duke Math

Structured Query Language (SQL)

Contents

Basic Topics

Data is either stored in a relational or non-relational database.

A relational database can be naively thought of as data organized into tables. In here, the relationship between tables and rows is easy to structure, and this structure is called a schema. We can see that the schema links the names/tags of the dogs from one table to another.
A non-relational database is everything else: data that is not really meant to be stored in tables. Their storage model is optimized for the type of data it is storing. They can store documents, key-value data, or graphs. e.g. MongoDB

SQL is the programming language that is used to work with data from relational databases. The editor that we use varies, but we will use MySQL. As a recommended practice, we will capitalize all keywords.

Basic Topics

[Hide]

When we first connect to either our localhost or a remote server (must be authenticated by a password), we first select a database to work with, and within that database, there are tables.

CREATE DATABASE record_company;   -- creates database 

USE record_company;               -- selects database

DROP DATABASE record_company;     -- deletes database 

STATUS;       -- outputs current status, including current database

Similarly, we can create tables, alter tables, and drop tables.

CREATE TABLE test (     -- creates table
  test_column INT       -- one column
); 

ALTER TABLE test      
ADD another_column VARCHAR(255); 

DROP TABLE test;

Now let's construct relations between our tables. We create a bands table and an album table. The NOT NULL indicates that the column elements cannot be a null value. The AUTO_INCREMENT automatically increments the id values as we add more rows. Finally, we would like to use the id column as the primary key (which is the primary unique identifier of each row) for each band, since there may be different bands with the same names.

CREATE TABLE bands (                -- Creates table bands
  id INT NOT NULL AUTO_INCREMENT,   -- column 'id' of type int
  name VARCHAR(255) NOT NULL,       -- column 'name' of type string w/ max length 255
  PRIMARY KEY (id)
);                                  -- semicolon to end

CREATE TABLE albums (
  id INT NOT NULL AUTO_INCREMENT, 
  name VARCHAR(255) NOT NULL, 
  release_year INT, 
  band_id INT NOT NULL, 
  PRIMARY KEY (id), 
  FOREIGN KEY (band_id) REFERENCES bands(id)
)

The foreign key reference line says that the band_id column in the albums table references the id column in the bands table. Having this can be good for error catching, since SQL will now not let us create an album that references a nonexistent band. Furthermore, if we try to delete a band that has albums linking to that band, then SQL will not let us, since the references must exist. This keeps a good control flow.

Adding, Modifying, and Deleting Data

We can now insert data into our tables using the INSERT INTO keyword. Note that for the bands table, we only need to input the name since the id is auto-generated from the AUTO_INCREMENT keyword.

INSERT INTO bands (name) 
VALUES ('Deuce'), ('Avenged Sevenfold'), ('Ankor'), ('Iron Maiden'); 

INSERT INTO albums (name, release_year, band_id) 
VALUES ('The Number of the Beasts', 1985, 1), 
       ('Power Slave', 1984, 1), 
       ('Nightmare', 2018, 2), 
       ('Nightmare', 2010, 3), 
       ('Test Album', NULL, 3);

To delete the test album row, we can take advantage of the WHERE keyword, explained below.

DELETE FROM albums 
WHERE id = 5; 

SELECT * FROM albums; 
+----+--------------------------+--------------+---------+
| id | name                     | release_year | band_id |
+----+--------------------------+--------------+---------+
|  1 | The Number of the Beasts |         1982 |       1 |
|  2 | Power Slave              |         1984 |       1 |
|  3 | Nightmare                |         2018 |       2 |
|  4 | Nightmare                |         2010 |       3 |
+----+--------------------------+--------------+---------+

Querying

We can use the SELECT keyword to query data from a table. It is of the syntax SELECT 'columns' FROM 'table'. If we put a *, then we are asking for all the columns.

SELECT * FROM bands;          # retrieve the rows with all column data
+----+-------------------+
| id | name              |
+----+-------------------+
|  1 | Iron Maiden       |
|  2 | Deuce             |
|  3 | Avenged Sevenfold |
|  4 | Ankor             |
+----+-------------------+
4 rows in set (0.03 sec)

SELECT * FROM bands LIMIT 2;  # limits the retrieval to 1st two rows 
+----+-------------+
| id | name        |
+----+-------------+
|  1 | Iron Maiden |
|  2 | Deuce       |
+----+-------------+
2 rows in set (0.00 sec)

We can also retrieve specific columns and alias them with a different name.

SELECT name FROM bands;       # retrieve rows with the 'name' column data
+-------------------+
| name              |
+-------------------+
| Iron Maiden       |
| Deuce             |
| Avenged Sevenfold |
| Ankor             |
+-------------------+ 

SELECT id AS 'ID', name as 'Band Name' FROM bands; 
+----+-------------------+
| ID | Band Name         |
+----+-------------------+
|  1 | Iron Maiden       |
|  2 | Deuce             |
|  3 | Avenged Sevenfold |
|  4 | Ankor             |
+----+-------------------+
4 rows in set (0.00 sec)

We can order them by different columns using the ORDER BY keyword, along with an additional parameter determining whether it is ascending (ASC, by default) or descending (DESC).

SELECT * FROM bands ORDER BY name; 
+----+-------------------+
| id | name              |
+----+-------------------+
|  4 | Ankor             |
|  3 | Avenged Sevenfold |
|  2 | Deuce             |
|  1 | Iron Maiden       |
+----+-------------------+
4 rows in set (0.01 sec)

SELECT * FROM bands ORDER BY name DESC;
+----+-------------------+
| id | name              |
+----+-------------------+
|  1 | Iron Maiden       |
|  2 | Deuce             |
|  3 | Avenged Sevenfold |
|  4 | Ankor             |
+----+-------------------+
4 rows in set (0.00 sec)

Notice that when we query all the data from the albums table, we get two 'Nightmare's. If we would like to get unique names, then we simply use the DISTINCT keyword.

SELECT name FROM albums; 
+--------------------------+
| name                     |
+--------------------------+
| The Number of the Beasts |
| Power Slave              |
| Nightmare                |
| Nightmare                |
+--------------------------+
4 rows in set (0.00 sec)

SELECT DISTINCT name FROM albums; 
+--------------------------+
| name                     |
+--------------------------+
| The Number of the Beasts |
| Power Slave              |
| Nightmare                |
+--------------------------+
3 rows in set (0.00 sec)

Where Statement

We can modify the rows of the table with the UPDATE keyword. This first queries the relevant data and then updates the columns to whatever we set it as. Below, since we are selecting the entire albums table, it will update every release_year to 1982, which is not what we want.

UPDATE albums            -- updates ALL release_year
SET release_year = 1982; -- to 1982

Therefore, we would like to have a more sophisticated filter that queries data that meet certain requirements. We have the WHERE keyword for that, which can be placed at the end of any query statement to filter the actions into a specified subset of the table. If we want to do it for a single row, then we can take advantage of the uniqueness of the primary key.

UPDATE albums 
SET release_year = 1982 
WHERE id = 1;

Here are some additional ways you can use the WHERE keyword.

Select all columns of all rows where the release year is less than 2000.
```
SELECT * FROM albums
WHERE release_year < 2000; 
```
Select all columns of all rows where the name has an 'er' in it somewhere, indicated by wildcard symbols % (can be a string of any length) or _ (string of length 1), or the band id is 2.
```
SELECT * FROM albums 
WHERE name LIKE '%er%' OR band_id = 2; 
```
Select all columns of all rows where the release year is 1984 and the band id is 1.
```
SELECT * FROM albums 
WHERE release_year = 1984 AND band_id = 1; 
```
Select all columns of all rows where the release year is between 2000 and 2018.
```
SELECT * FROM albums 
WHERE release_year BETWEEN 2000 and 2018; 
```
Select all columns of all rows where release year is null.
```
SELECT * FROM albums 
WHERE release_year IS NULL; 
```
Select the band id column of all rows where the release year in (1982, 1984).
```
  SELECT band_id FROM albums 
  WHERE release_year IN (1982, 1984); 
```

Select column names of all rows where the column name element is in a query.

SELECT column_name(s) FROM table_name 
WHERE column_name IN (SELECT STATEMENT);

Min, Max, Count, Avg, Sum Functions

We can find the minimum of a column with the syntax.

SELECT MIN(column_name) FROM table_name
WHERE condition;

Similarly with maximum We can find the minimum of a column with the syntax.

SELECT MAX(column_name) FROM table_name
WHERE condition;

The count function returns the number of rows that matches a specified criterion.

SELECT COUNT(column_name) FROM table_name
WHERE condition;

The average returns the average.

SELECT AVG(column_name) FROM table_name
WHERE condition;

The sum returns the sum.

SELECT SUM(column_name) FROM table_name
WHERE condition;

Joining Tables

When we join tables, we can take advantage of the foreign referneces to attach relevant data from another table to the original table. Below, we select the rows with all columns from the bands table, and then join the albums table to the bands table in the following way: if the band id in bands is equal to the band id in the albums, then just attach that album row to the band row. Note that the table on the left is the table you write first in your command, and the one on the right is the second table you write.

The INNER JOIN keyword selects records that have matching values in both tables. Its syntax is

SELECT column_name(s) FROM table1 
INNER JOIN table2 
ON table1.column_name = table2.column_name;

Notice that Ankor is not shown below in the inner join, since it must be included in both tables.

SELECT * FROM bands 
INNER JOIN albums ON bands.id = albums.band_id; 
+----+-------------------+----+--------------------------+--------------+---------+
| id | name              | id | name                     | release_year | band_id |
+----+-------------------+----+--------------------------+--------------+---------+
|  1 | Iron Maiden       |  1 | The Number of the Beasts |         1982 |       1 |
|  1 | Iron Maiden       |  2 | Power Slave              |         1984 |       1 |
|  2 | Deuce             |  3 | Nightmare                |         2018 |       2 |
|  3 | Avenged Sevenfold |  4 | Nightmare                |         2010 |       3 |
+----+-------------------+----+--------------------------+--------------+---------+
4 rows in set (0.01 sec)

The LEFT JOIN keyword returns all records from the left table (table1) and the matching records from the right table (table2). If there are no matches in table2 for a row in table1, it attaches a NULL row.

SELECT column_name(s) FROM table1 
LEFT JOIN table2 
ON table1.column_name = table2.column_name;

Notice that Ankor is now shown, since every row of the first table is shown.

SELECT * FROM bands 
LEFT JOIN albums ON bands.id = albums.band_id; 
+----+-------------------+------+--------------------------+--------------+---------+
| id | name              | id   | name                     | release_year | band_id |
+----+-------------------+------+--------------------------+--------------+---------+
|  1 | Iron Maiden       |    2 | Power Slave              |         1984 |       1 |
|  1 | Iron Maiden       |    1 | The Number of the Beasts |         1982 |       1 |
|  2 | Deuce             |    3 | Nightmare                |         2018 |       2 |
|  3 | Avenged Sevenfold |    4 | Nightmare                |         2010 |       3 |
|  4 | Ankor             | NULL | NULL                     |         NULL |    NULL |
+----+-------------------+------+--------------------------+--------------+---------+
5 rows in set (0.00 sec)

The RIGHT JOIN returns all records from the right table (table2), and the matching records from the left table (table1). If there are no matches in table1 for a row in table2, it attaches a NULL row.
```
SELECT column_name(s) FROM table1 
LEFT JOIN table2 
ON table1.column_name = table2.column_name;
```

FULL JOIN returns all records when there is a match in left (table1) or right (table2) table records.

SELECT column_name(s) FROM table1 
FULL OUTER JOIN table2 
ON table1.column_name = table2.column_name 
WHERE condition;

Group By

The GROUP BY keyword groups rows that have the same values into summary rows. They are often used with the functions COUNT(), MAX(), MIN(), SUM(), and AVG(). Its syntax is

SELECT column_name(s) FROM table_name 
WHERE condition 
GROUP BY column_name(s) 
ORDER BY column_name(s);

Let's say that we have the following Customer table:

SELECT * FROM Customer 
+-------------+------------+-----------+------+---------+
| customer_id | first_name | last_name | age  | country |
+-------------+------------+-----------+------+---------+
|           1 | John       | Doe       |   31 | USA     |
|           2 | Robert     | Luna      |   22 | USA     |
|           3 | David      | Robinson  |   22 | UK      |
|           4 | John       | Reinhardt |   25 | UK      |
|           5 | Betty      | Doe       |   28 | UAE     |
+-------------+------------+-----------+------+---------+
5 rows in set (0.00 sec)

Our goal is to find the number of people from each country in the table. We would like our output to say that there are 2 people from the USA, 2 from the UK, and 1 from UAE.

SELECT country, COUNT(*) from Customer 
GROUP BY country 
+---------+----------+
| country | COUNT(*) |
+---------+----------+
| USA     |        2 |
| UK      |        2 |
| UAE     |        1 |
+---------+----------+

It first takes the original table, groups the rows by country. The COUNT(*) function counts all the rows in each group, and we finally output the two-column table with columns country and the COUNT(*).

Given the following orders table

SELECT * FROM orders
+----------+----------+--------+-------------+
| order_id | item     | amount | customer_id |
+----------+----------+--------+-------------+
|        1 | Keyboard |    400 |           4 |
|        2 | Mouse    |    300 |           4 |
|        3 | Monitor  |  12000 |           3 |
|        4 | Keybord  |    400 |           1 |
|        5 | Mousepad |    250 |           2 |
+----------+----------+--------+-------------+
5 rows in set (0.00 sec)

We can find the total sum of all expenses spent per customer with the following command. It groups the rows by customer_id, sums up the amount for each customer_id, and finally outputs a table with each customer_id and

amount
SELECT customer_id, SUM(amount) AS total 
FROM orders 
GROUP BY customer_id; 
+-------------+-------+
| customer_id | total |
+-------------+-------+
|           4 |   700 |
|           3 | 12000 |
|           1 |   400 |
|           2 |   250 |
+-------------+-------+
4 rows in set (0.00 sec)




Intermediate Topics
[Hide]