Tables, Relationships, Diagrams…
If you have an exam tomorrow, or an important interview, or maybe you want to gain a general understanding of SQL before learning it formally, welcome to my (quick) practical summary!
Structurally, there are Clusters → Catalogues → Schemas → Tables and Views → Columns and Rows, in that order of hierarchy. Schemas and Tables are the most important to understand — in schemas, you can create a database where you will store your data in different tables (equivalent to a pythonic data frame). You can create the database, or use an existing one:
CREATE NewDatabase;USE ExistingDatabase;
In SQL, there are entities that are the subjects of interest of the tables, and they have attributes (columns) that describe them. Entities hold relationships between them and can form what is called a relational model, which is when we specify and use those relationships in a certain format. All attributes have domains (some examples are INT, DECIMAL, VARCHAR, CHAR, DATETIME…).
Tables are made of a primary key (also called candidate key) which has to be unique and non-null and will uniquely identify each row (also knows as entity integrity) and other non-key attributes. A primary key is nothing more than an ID column.
You can use foreign keys to merge two tables, and a foreign key is simply a column that is the primary key of another table. There are two main ways to create a table: creating the columns (with column types, and specifying the primary key)…
CREATE TABLE my_table_name(ID INT NOT NULL,
Description VARCHAR(255),
PRIMARY KEY(ID)
);# another way of specifying the primary key:CREATE TABLE my_table_name(ID INT NOT NULL PRIMARY KEY,
Description VARCHAR(255)
);
…or selecting columns from other tables using some conditions:
# simple select with a condition
CREATE TABLE my_table_name AS SELECT *
FROM another_table
WHERE column1 > 10 AND column2 < 20;# more complex select aggregating by column1 and taking the avg. of # column2, joining two tables using column1CREATE TABLE my_table_name AS SELECT column1, AVG(column2)
FROM another_table
INNER JOIN yet_another_table USING(column1)
GROUP BY column1;
Adding NOT NULL to the column means that the column cannot contain null values, therefore it is always mandatory to insert a value to it with new rows.
Altering an existing table, such as adding new columns, renaming them, or changing column values using the following methods:
# updating column values according to conditionUPDATE Table
SET column1 = 'Tourist'
WHERE column2 NOT('Home');# renaming a columnALTER Table RENAME COLUMN Column1 TO Status;# adding a column that is the multiplication of two other columnsALTER Table ADD COLUMN Column3 AS Price*Quantity;# remove a columnALTER Table DROP COLUMN Column1;# insert specific values into the tableINSERT table (column1, column2)
VALUES (value1, value2)
Every table should have referential integrity which means that every non-null foreign key needs to match a non-null primary key. For a column to be functionally dependent on the primary key, there needs to be a unique value of the primary key to each value of the column.
Querying Conditions
Knowing most of the querying conditions types can help a lot in your efficiency when using SQL. It is also important the know the differences (and similarities) between them:
WHERE is used to filter the table, HAVING is the same as WHERE but can be used in group by's. WHERE can also be used to join columns
--> WHERE c.column1 = c1.column1AND/OR : AND means both conditions need to be met, OR means only one (or the other) needs to be met.NOT (condition) : negates the condition, != can also be usedORDER BY : sort by the column specified. You can specify DESC and ASC. ASC (ascending order) is the default.GROUP BY : group by the column(s) specified LIKE: when you want to select rows that contain a certain value. LIKE 'German%' will select all values that start with German. LIKE 'German_' will select all values that start with German and have max one other character after it. DISTINCT(Column1) : selects all the unique valuesCOUNT(column1) : returns the number of rowsSUM(column1): will sum the colum values, if it's of type intAVG(column1) : will average column values, if it's of type intINNER JOIN, JOIN, NATURAL JOIN : join using all the columns, or you can specify the columns --> JOIN table USING(), or JOIN ON()LEFT JOIN/RIGHT JOIN : Join and display rows from the left/right even if no matches are made, filling in the nulls
Nested Queries
A very common exam question is to ask you to write a query that finds all results related to other query results. For example, what if we wanted to know all actors that worked in movies in which Angelina Jolie also acted in?
We would use the following Nested Query:
SELECT name FROM Actor WHERE actor.movie =
(SELECT movie_id FROM Movie WHERE Movie.actor_name = 'Angelina Jolie') AND name != 'Angelina Jolie';
This query would return all the actors that participated in moves that Angelina Jolie did, but were not Angelina Jolie.
Three types of relationships between entities, or tables: (1) many-to-many, (2) one-to-money, (3) one-to-one. One-to-one relationships are very rare, and many-to-many relationships often need to be transformed (in relational models) into a one-to-many by creating a new entity, also known as a table, to connect the two.
one-to-one: an airport only exists in one city, and a city only has one airport.
one-to-many: there are many books in a library, but all books come from that one library.
many-to-many: beaches can have many seagulls, but seagulls can travel to many beaches.
When building a diagram to represent all these relationships, it is common practice to use the IE (crow’s feet model). Here is a very intuitive explanation for it.
Normalization Forms
There are 6 normalization forms, but the first three are the most important:
1st form means the table needs to be a two-dimensional table and have no multivalued rows (rows with more than one value).
2nd form means it needs to fulfill the 1st requirements and all columns need to be functionally dependent on the primary key (or concatenated key) column.
To be in the 3rd form, it needs to be in 2nd form and have no transitive dependencies, meaning that the determinant has to be a primary key (you shouldn’t be able to determine other columns from a column that is not a primary key).
There are three types of anomalies that occur in the 1st and 2nd forms, that do not in the 3rd form (therefore anything in 3rd form or above is generally regarded as good database design): insertion anomaly, deletion anomaly, and modification anomalies. These anomalies generally mean that you are deleting or altering columns that you do not intend to while inserting new data. More about this topic here.
Data warehouses are a type of relational model that stores data long-term and has a slightly different design than operational databases. In data warehouses, we use a central table (called a Fact Table) with only foreign keys and data to be summarized and connect all the other tables to the central table (called dimension tables). It’s important to understand the use of data warehouses if you’re planning on doing any data analysis!
For example, if a store has data on its sales transactions, customers and employees, it might be very costly and inefficient to keep all its historical sales data (which may even reach the hundreds of thousands) in the same system the employees log new data every day. This data needs to be transferred (or dumped) somewhere else, where longer-term analysis and business intelligence can draw value from it, in a process called data mining. The data needs to go through data cleaning, where null values can be managed and relevant data can be filtered, also called the ETL process. Then, the end-user can query this data warehouse using a BI Tool (such as tableau).
In the earlier example, the store would have “facts” such as sales quantity and price in the fact table, as well as sales transaction id, and have other sales details (such as time of the transaction, or website used) in the Sales dimension table. Similarly, customer ids should be in the fact table, but customer-level data should be in the Customer dimension table.
Conclusion
Making your own summary is always the most ideal solution, but if you are short of time and need to refresh some concepts, this is guaranteed to bring out some SQL concepts from your memory. I hope you do well in your exam, interview, or just back to using SQL after some time away!
If you are completely new to SQL, don’t be scared! This is a very superficial summary of a large amount of content that is only meant to trigger already existing knowledge. However, I hope you remember what you read here when you are learning these concepts formally! It will certainly help to know in advance what a subject is all about.