- Most of the examples are from https://pgexercises.com/
- Most of the notes are from https://www.postgresql.org/docs/12
Aggregates are also very useful in combination with GROUP BY clauses. For example, we can get the maximum low temperature observed in each city with:
SELECT city, max(temp_lo) FROM weather GROUP BY city; city | max ---------------+----- Hayward | 37 San Francisco | 46 (2 rows)
which gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these grouped rows using HAVING:
SELECT city, max(temp_lo) FROM weather GROUP BY city HAVING max(temp_lo) < 40; city | max ---------+----- Hayward | 37 (1 row)
which gives us the same results for only the cities that have all temp_lo values below 40. Finally, if we only care about cities whose names begin with “S”, we might do:
SELECT city, max(temp_lo) FROM weather WHERE city LIKE 'S%' -- (1) GROUP BY city HAVING max(temp_lo) < 40;
It is important to understand the interaction between aggregates and SQL’s WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn’t use aggregates, but it’s seldom useful. The same condition could be used more efficiently at the WHERE stage.)
Aggregate Function List
- Aggregate functions for statistics
- Full aggregate function list
A window function performs a calculation across a set of table rows that are somehow related to the current row.
This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary; depname | empno | salary | avg -----------+-------+--------+----------------------- develop | 11 | 5200 | 5020.0000000000000000 develop | 7 | 4200 | 5020.0000000000000000 develop | 9 | 4500 | 5020.0000000000000000 develop | 8 | 6000 | 5020.0000000000000000 develop | 10 | 5200 | 5020.0000000000000000 personnel | 5 | 3500 | 3700.0000000000000000 personnel | 2 | 3900 | 3700.0000000000000000 sales | 3 | 4800 | 4866.6666666666666667 sales | 1 | 5000 | 4866.6666666666666667 sales | 4 | 4800 | 4866.6666666666666667 (10 rows)
The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)
A window function call always contains an OVER clause directly following the window function’s name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.
You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:
SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary; depname | empno | salary | rank -----------+-------+--------+------ develop | 8 | 6000 | 1 develop | 10 | 5200 | 2 develop | 11 | 5200 | 2 develop | 9 | 4500 | 4 develop | 7 | 4200 | 5 personnel | 2 | 3900 | 1 personnel | 5 | 3500 | 2 sales | 1 | 5000 | 1 sales | 4 | 4800 | 2 sales | 3 | 4800 | 2 (10 rows)
As shown here, the rank function produces a numerical rank for each distinct ORDER BY value in the current row’s partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
The rows considered by a window function are those of the “virtual table” produced by the query’s FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.
We already saw that ORDER BY can be omitted if the ordering of rows is not important. It is also possible to omit PARTITION BY, in which case there is a single partition containing all rows.
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.  Here is an example using sum:
SELECT salary, sum(salary) OVER () FROM empsalary; salary | sum --------+------- 5200 | 47100 5000 | 47100 3500 | 47100 4800 | 47100 3900 | 47100 4200 | 47100 4500 | 47100 4800 | 47100 6000 | 47100 5200 | 47100 (10 rows)
Above, since there is no ORDER BY in the OVER clause, the window frame is the same as the partition, which for lack of PARTITION BY is the whole table; in other words each sum is taken over the whole table and so we get the same result for each output row. But if we add an ORDER BY clause, we get very different results:
SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; salary | sum --------+------- 3500 | 3500 3900 | 7400 4200 | 11600 4500 | 16100 4800 | 25700 4800 | 25700 5000 | 30700 5200 | 41100 5200 | 41100 6000 | 47100 (10 rows)
Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates of the current one (notice the results for the duplicated salaries).
Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa.
If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select. For example:
SELECT depname, empno, salary, enroll_date FROM (SELECT depname, empno, salary, enroll_date, rank() OVER (PARTITION BY depname ORDER BY salary DESC, empno) AS pos FROM empsalary ) AS ss WHERE pos < 3;
The above query only shows the rows from the inner query having rank less than 3.
When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:
SELECT sum(salary) OVER w, avg(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
Window Function List
Window functions provide the ability to perform calculations across sets of rows that are related to the current query row.
In addition to these functions, any built-in or user-defined general-purpose or statistical aggregate (i.e., not ordered-set or hypothetical-set aggregates) can be used as a window function. Aggregate functions act as window functions only when an OVER clause follows the call; otherwise they act as non-window aggregates and return a single row for the entire set.
- Full window function list
You can update existing rows using the UPDATE command. Suppose you discover the temperature readings are all off by 2 degrees after November 28. You can correct the data as follows:
UPDATE weather SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2 WHERE date > '1994-11-28';
You can list the columns in a different order if you wish or even omit some columns, e.g., if the precipitation is unknown:
INSERT INTO weather (date, city, temp_hi, temp_lo) VALUES ('1994-11-29', 'Hayward', 54, 37);
You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table:
CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name; SELECT * FROM myview;
Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces.
Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon.
WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that can also be a SELECT, INSERT, UPDATE, or DELETE.
The basic value of SELECT in WITH is to break down complicated queries into simpler parts. An example is:
WITH regional_sales AS ( SELECT region, SUM(amount) AS total_sales FROM orders GROUP BY region ), top_regions AS ( SELECT region FROM regional_sales WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales) ) SELECT region, product, SUM(quantity) AS product_units, SUM(amount) AS product_sales FROM orders WHERE region IN (SELECT region FROM top_regions) GROUP BY region, product;
which displays per-product sales totals in only the top sales regions. The WITH clause defines two auxiliary statements named regional_sales and top_regions, where the output of regional_sales is used in top_regions and the output of top_regions is used in the primary SELECT query. This example could have been written without WITH, but we’d have needed two levels of nested sub-SELECTs. It’s a bit easier to follow this way.
The optional RECURSIVE modifier changes WITH from a mere syntactic convenience into a feature that accomplishes things not otherwise possible in standard SQL. Using RECURSIVE, a WITH query can refer to its own output. A very simple example is this query to sum the integers from 1 through 100:
WITH RECURSIVE t(n) AS ( VALUES (1) UNION ALL SELECT n+1 FROM t WHERE n < 100 ) SELECT sum(n) FROM t;
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query’s own output.
Consider the following problem: You want to make sure that no one can insert rows in the weather table that do not have a matching entry in the cities table. This is called maintaining the referential integrity of your data. In simplistic database systems this would be implemented (if at all) by first looking at the cities table to check if a matching record exists, and then inserting or rejecting the new weather records. This approach has a number of problems and is very inconvenient, so PostgreSQL can do this for you.
The new declaration of the tables would look like this:
CREATE TABLE cities ( city varchar(80) primary key, location point ); CREATE TABLE weather ( city varchar(80) references cities(city), temp_lo int, temp_hi int, prcp real, date date );
Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all.
In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands. So our banking transaction would actually look like:
BEGIN; UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; -- etc etc COMMIT;
If, partway through the transaction, we decide we do not want to commit (perhaps we just noticed that Alice’s balance went negative), we can issue the command ROLLBACK instead of COMMIT, and all our updates so far will be canceled.
Suppose we have a table similar to this:
CREATE TABLE test1 ( id integer, content varchar );
and the application issues many queries of the form:
SELECT content FROM test1 WHERE id = constant; With no advance preparation, the system would have to scan the entire test1 table, row by row, to find all matching entries. If there are many rows in test1 and only a few rows (perhaps zero or one) that would be returned by such a query, this is clearly an inefficient method. But if the system has been instructed to maintain an index on the id column, it can use a more efficient method for locating matching rows. For instance, it might only have to walk a few levels deep into a search tree.
A similar approach is used in most non-fiction books: terms and concepts that are frequently looked up by readers are collected in an alphabetic index at the end of the book. The interested reader can scan the index relatively quickly and flip to the appropriate page(s), rather than having to read the entire book to find the material of interest. Just as it is the task of the author to anticipate the items that readers are likely to look up, it is the task of the database programmer to foresee which indexes will be useful.
The following command can be used to create an index on the id column, as discussed:
CREATE INDEX test1_id_index ON test1 (id);
The name test1_id_index can be chosen freely, but you should pick something that enables you to remember later what the index was for.
To remove an index, use the DROP INDEX command. Indexes can be added to and removed from tables at any time.
Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks doing so would be more efficient than a sequential table scan. But you might have to run the ANALYZE command regularly to update statistics to allow the query planner to make educated decisions
Indexes can also benefit UPDATE and DELETE commands with search conditions. Indexes can moreover be used in join searches. Thus, an index defined on a column that is part of a join condition can also significantly speed up queries with joins.
Creating an index on a large table can take a long time. By default, PostgreSQL allows reads (SELECT statements) to occur on the table in parallel with index creation, but writes (INSERT, UPDATE, DELETE) are blocked until the index build is finished. In production environments this is often unacceptable. It is possible to allow writes to occur in parallel with index creation, but there are several caveats to be aware of — for more information see Building Indexes Concurrently.
After an index is created, the system has to keep it synchronized with the table. This adds overhead to data manipulation operations. Therefore indexes that are seldom or never used in queries should be removed.
An index can be defined on more than one column of a table. For example, if you have a table of this form:
CREATE TABLE test2 ( major int, minor int, name varchar );
(say, you keep your /dev directory in a database…) and you frequently issue queries like:
SELECT name FROM test2 WHERE major = constant AND minor = constant; then it might be appropriate to define an index on the columns major and minor together, e.g.:
CREATE INDEX test2_mm_idx ON test2 (major, minor);
Retrieve everything from a table
SELECT * FROM cd.facilities;
Retrieve specific columns from a table
SELECT name, membercost FROM cd.facilities;
Control which rows are retrieved
SELECT * FROM cd.facilities WHERE membercost > 0;
Control which rows are retrieved - part 2
SELECT facid, name, membercost, monthlymaintenance FROM cd.facilities WHERE membercost > 0 AND membercost < (1.0/50.0) * monthlymaintenance
Basic string searches
SELECT * from cd.facilities WHERE name LIKE '%Tennis%';
Matching against multiple possible values
SELECT * FROM cd.facilities WHERE facid in (1,5);
Classify results into buckets
SELECT name, CASE WHEN monthlymaintenance > 100 THEN 'expensive' ELSE 'cheap' END AS cost FROM cd.facilities;
Working with dates
SELECT memid, surname, firstname, joindate FROM cd.members WHERE joindate >= '2012-09-01'
Removing duplicates and ordering results
SELECT DISTINCT surname FROM cd.members ORDER BY surname LIMIT 10;
Combining results from multiple queries
SELECT surname FROM cd.members UNION SELECT NAME FROM cd.facilities
SELECT MAX(joindate) AS latest FROM cd.members
SELECT firstname, surname, joindate FROM cd.members ORDER BY joindate DESC LIMIT 1
Retrieve the start times of members’ bookings
SELECT bks.starttime FROM cd.bookings bks INNER JOIN cd.members mbs ON mbs.memid = bks.memid WHERE mbs.firstname = 'David' AND mbs.surname = 'Farrell';
Retrieve the start times of members’ bookings (alt syntax)
SELECT bks.starttime FROM cd.bookings bks, cd.members mems WHERE mems.firstname='David' AND mems.surname='Farrell' AND mems.memid = bks.memid;
Work out the start times of bookings for tennis courts
SELECT bks.starttime, fls.name FROM cd.bookings bks INNER JOIN cd.facilities fls ON bks.facid = fls.facid WHERE fls.name LIKE '%Tennis%Court%' AND bks.starttime >= '2012-09-21' AND bks.starttime < '2012-09-22' ORDER BY bks.starttime ASC
Produce a list of all members who have recommended another member
SELECT DISTINCT mbs1.firstname, mbs1.surname FROM cd.members mbs1 INNER JOIN cd.members mbs2 ON mbs2.recommendedby = mbs1.memid ORDER BY mbs1.surname, mbs1.firstname
Produce a list of all members, along with their recommender
SELECT mems.firstname AS memfname, mems.surname AS memsname, recs.firstname AS recfname, recs.surname AS recsname FROM cd.members mems LEFT OUTER JOIN cd.members recs ON recs.memid = mems.recommendedby ORDER BY memsname, memfname
3-way join: Produce a list of all members who have used a tennis court
SELECT DISTINCT mems.firstname || ' ' || mems.surname AS member, facs.name AS facility FROM cd.members mems INNER JOIN cd.bookings bks ON mems.memid = bks.memid INNER JOIN cd.facilities facs ON bks.facid = facs.facid WHERE facs.name LIKE '%Tennis Court%' ORDER BY member ASC, facility DESC
Produce a list of costly bookings
SELECT mbs.firstname || ' ' || mbs.surname as member, facs.name AS facility, CASE WHEN mbs.memid = 0 THEN facs.guestcost * bks.slots ELSE facs.membercost * bks.slots END AS cost FROM cd.members AS mbs INNER JOIN cd.bookings AS bks ON mbs.memid = bks.memid INNER JOIN cd.facilities AS facs ON bks.facid = facs.facid WHERE bks.starttime >= '2012-09-14' AND bks.starttime < '2012-09-15' AND ( (mbs.memid = 0 and bks.slots*facs.guestcost > 30) OR (mbs.memid != 0 and bks.slots*facs.membercost > 30) ) ORDER BY cost DESC
Produce a list of all members, along with their recommender, using subqueries.
SELECT DISTINCT mems.firstname || ' ' || mems.surname AS member, CASE WHEN mems.recommendedby IS NOT NULL THEN (SELECT recs.firstname || ' ' || recs.surname FROM cd.members as recs WHERE recs.memid = mems.recommendedby) ELSE NULL END AS recommender FROM cd.members AS mems ORDER BY member ASC
For every value of member, the subquery is run once to find the name of the individual who recommended them (if any). A subquery that uses information from the outer query in this way (and thus has to be run for each row in the result set) is known as a correlated subquery.
Produce a list of costly bookings, using a subquery
SELECT member, facility, cost FROM ( SELECT mems.firstname || ' ' || mems.surname AS member, facs.name AS facility, CASE WHEN mems.memid = 0 THEN bks.slots * facs.guestcost ELSE bks.slots * facs.membercost END as cost FROM cd.members mems INNER JOIN cd.bookings bks ON mems.memid = bks.memid INNER JOIN cd.facilities facs ON facs.facid = bks.facid WHERE bks.starttime >= '2012-09-14' AND bks.starttime < '2012-09-15' ) AS bookings WHERE cost > 30 ORDER BY cost DESC
Subqueries in the FROM clause are referred to as inline views.
Insert some data into a table
INSERT INTO cd.facilities VALUES (9, 'Spa', 20, 30, 100000, 800)
Insert multiple rows of data into a table using VALUES
INSERT INTO cd.facilities (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance) VALUES (9, 'Spa', 20, 30, 100000, 800), (10, 'Squash Court 2', 3.5, 17.5, 5000, 80)
Postgres allows you to use VALUES wherever you might use a SELECT.Similarly, it’s possible to use SELECT wherever you see a VALUES. See next example
Insert multiple rows of data into a table using SELECT
INSERT INTO cd.facilities (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance) SELECT 9, 'Spa', 20, 30, 100000, 800 UNION ALL SELECT 10, 'Squash Court 2', 3.5, 17.5, 5000, 80;
Insert calculated data into a table
INSERT INTO cd.facilities (facid, name, membercost, guestcost, initialoutlay, monthlymaintenance) VALUES ( (SELECT MAX(facid) FROM cd.facilities) + 1, 'Spa', 20, 30, 100000, 800 )
Update some existing data
UPDATE cd.facilities SET initialoutlay = 10000 WHERE facid = 1
Update multiple rows and columns at the same time
UPDATE cd.facilities SET membercost = 6, guestcost = 30 WHERE facid IN (0,1)
Update a row based on the contents of another row
UPDATE cd.facilities SET membercost = (SELECT membercost from cd.facilities WHERE facid = 0) * 1.1, guestcost = (SELECT guestcost from cd.facilities WHERE facid = 0) * 1.1 WHERE facid = 1
Update a row based on the contents of another row - method 2
UPDATE cd.facilities AS facs SET membercost = facs2.membercost * 1.1, guestcost = facs2.guestcost * 1.1 FROM (SELECT * FROM cd.facilities WHERE facid = 0) AS facs2 WHERE facs.facid = 1;
Delete all bookings
DELETE FROM cd.bookings
Delete a member from the cd.members table
DELETE FROM cd.members WHERE memid = 37
Delete based on a subquery
DELETE FROM cd.members AS mems WHERE mems.memid NOT IN ( SELECT memid FROM cd.bookings )
Count the number of facilities
SELECT COUNT(*) FROM cd.facilities;
Count the number of expensive facilities
SELECT count(*) FROM cd.facilities WHERE guestcost >=10
Count the number of recommendations each member makes.
SELECT recommendedby, count(*) FROM cd.members mbs WHERE recommendedby IS NOT NULL GROUP BY recommendedby ORDER BY recommendedby
For each distinct value of recommendedby, get me the number of times that value appears
List the total slots booked per facility
SELECT facid, sum(slots) AS "Total Slots" FROM cd.bookings group BY facid ORDER BY facid;
List the total slots booked per facility in a given month
SELECT facid, SUM(slots) AS "Total Slots" FROM cd.bookings WHERE starttime >= '2012-09-01' AND starttime < '2012-10-01' GROUP BY facid ORDER BY "Total Slots"
List the total slots booked per facility per month
SELECT facid, EXTRACT(month from starttime) AS month, sum(slots) AS "Total Slots" FROM cd.bookings WHERE starttime >= '01-01-2012' AND starttime < '01-01-2013' GROUP BY facid, month ORDER BY facid, month;
Find the count of members who have made at least one booking
SELECT COUNT(DISTINCT memid) FROM cd.bookings
List facilities with more than 1000 slots booked
SELECT facid, SUM(slots) AS "Total Slots" FROM cd.bookings bks GROUP BY facid HAVING SUM(slots) > 1000 ORDER BY facid
Find the total revenue of each facility
SELECT facs.name, SUM((CASE WHEN bks.memid = 0 THEN bks.slots * facs.guestcost ELSE bks.slots * facs.membercost END)) AS revenue FROM cd.bookings bks INNER JOIN cd.facilities facs ON bks.facid = facs.facid GROUP BY facs.name ORDER BY revenue
Find facilities with a total revenue less than 1000
SELECT * FROM ( SELECT facs.name, SUM(CASE WHEN bks.memid = 0 THEN bks.slots * facs.guestcost ELSE bks.slots * facs.membercost END) AS revenue FROM cd.bookings bks INNER JOIN cd.facilities facs ON bks.facid = facs.facid GROUP BY facs.name ) AS result WHERE revenue < 1000 ORDER BY revenue ASC
Output the facility id that has the highest number of slots booked
SELECT facid, SUM(slots) AS "Total Slots" FROM cd.bookings AS bks GROUP BY facid ORDER BY "Total Slots" DESC LIMIT 1
In the event of a tie, we’ll only get one result
Output the facility id that has the highest number of slots booked using Common Table Expressions
WITH sum AS (SELECT facid, SUM(slots) AS totalslots FROM cd.bookings GROUP BY facid ) SELECT facid, totalslots FROM sum WHERE totalslots = (SELECT MAX(totalslots) FROM sum);
List the total slots booked per facility per month, part 2, using CTEs
WITH monthlysum AS ( SELECT facid, extract(MONTH from starttime) as month, SUM(slots) as slots FROM cd.bookings WHERE starttime >= '2012-01-01' AND starttime < '2013-01-01' GROUP BY facid, month ORDER BY facid ) SELECT facid, month, sum(slots) FROM monthlysum GROUP BY rollup(facid, month) ORDER BY facid, month
ROLLUP produces a hierarchy of aggregations in the order passed into it: for example, ROLLUP(facid, month) outputs aggregations on (facid, month), (facid), and (). Alternatively, if we instead want all possible permutations of the columns we pass in, we can use CUBE rather than ROLLUP. This will produce (facid, month), (month), (facid), and (). ROLLUP and CUBE are special cases of GROUPING SETS. GROUPING SETS allow you to specify the exact aggregation permutations you want.
List the total hours booked per named facility
SELECT facs.facid, facs.name, trim(to_char(SUM(bks.slots) * 0.5, '9999.99')) AS "Total Hours" FROM cd.facilities facs INNER JOIN cd.bookings bks ON facs.facid = bks.facid GROUP BY facs.facid, facs.name ORDER BY facs.facid
List each member’s first booking after September 1st 2012
SELECT mems.surname, mems.firstname, mems.memid, min(bks.starttime) AS starttime FROM cd.bookings AS bks INNER JOIN cd.members AS mems ON mems.memid = bks.memid WHERE starttime >= '2012-09-01' GROUP BY mems.surname, mems.firstname, mems.memid ORDER BY mems.memid;
Produce a list of member names, with each row containing the total member count
SELECT COUNT(*) over(), firstname, surname FROM cd.members ORDER BY joindate
Produce a numbered list of members
SELECT rank() over(ORDER BY joindate ASC) AS row_number, mems.firstname, mems.surname FROM cd.members mems
Output the facility id that has the highest number of slots booked, again
SELECT facid, total FROM ( SELECT facid, SUM(slots) AS total, rank() over (ORDER BY SUM(slots) desc) AS rank FROM cd.bookings GROUP BY facid ) AS ranked WHERE rank = 1
Find the top three revenue generating facilities
SELECT name, rank FROM (SELECT facs.name, rank() over (ORDER BY SUM( CASE WHEN bks.memid = 0 THEN bks.slots * facs.guestcost ELSE bks.slots * facs.membercost END ) DESC) FROM cd.facilities AS facs INNER JOIN cd.bookings AS bks ON bks.facid = facs.facid GROUP BY name ) AS rankings WHERE rank <= 3 ORDER BY rank ASC
Format the names of members
SELECT trim(mbs.surname || ', ' || mbs.firstname) as name FROM cd.members mbs
Find facilities by a name prefix
SELECT * FROM cd.facilities facs WHERE facs.name LIKE 'Tennis%'
Perform a case-insensitive search
SELECT * FROM cd.facilities facs WHERE facs.name ILIKE 'tennis%'
Find telephone numbers with parentheses
SELECT memid, telephone FROM cd.members WHERE telephone ~ '[()]';
Pad zip codes with leading zeroes
SELECT LPAD(zipcode::text, 5, '0') as zip FROM cd.members
Count the number of members whose surname starts with each letter of the alphabet
SELECT substr(surname, 1, 1) as letter, COUNT(*) FROM cd.members GROUP BY letter ORDER BY letter
Window function - example 1
SELECT last_name, salary, department, rank() OVER ( PARTITION BY department ORDER BY salary DESC) FROM employees;
last_name salary department rank
Jones 45000 Accounting 1
Williams 37000 Accounting 2
Smith 55000 Sales 1
Adams 50000 Sales 2
Johnson 40000 Marketing 1
Window function - example 2
SELECT * FROM ( SELECT last_name, salary, department, rank() OVER ( PARTITION BY department ORDER BY salary DESC ) FROM employees) sub_query WHERE rank = 1;
last_name salary department rank
Jones 45000 Accounting 1
Smith 55000 Sales 1
Johnson 40000 Marketing 1
Calculate a rolling average of total revenue over the previous 15 days
select date, avgrev from ( -- AVG over this row and the 14 rows before it. select dategen.date as date, avg(revdata.rev) over(order by dategen.date rows 14 preceding) as avgrev from -- generate a list of days. This ensures that a row gets generated -- even if the day has 0 revenue. Note that we generate days before -- the start of october - this is because our window function needs -- to know the revenue for those days for its calculations. ( SELECT generate_series('2012-07-10'::timestamp, '2012-08-31'::timestamp, '1 day')::date AS date ) AS dategen left outer join -- left join to a table of per-day revenue (select bks.starttime::date as date, sum(case when memid = 0 then slots * facs.guestcost else slots * membercost end) as rev from cd.bookings bks inner join cd.facilities facs on bks.facid = facs.facid group by cast(bks.starttime as date) ) as revdata on dategen.date = revdata.date ) as subq where date >= '2012-08-01' order by date;