Get Ready to Learn SQL Server 22: Using Subqueries in the FROM Clause - CodeProject
This is the fourth in a series of articles about subqueries. In this post, we discuss subqueries in the
FROM clause. Other articles discuss their uses in other clauses.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the
Using Subqueries in the FROM Clause
When subqueries are used in the
FROM clause, they act as a table that you can use to select columns and join to other tables. Because of this, some people argue they really aren’t subqueries, but derived tables. I like to think of derived tables as a special case of subqueries… subqueries used in the
Regardless of what you call them, there are some unique features derived tables bring to the SQL world that are worth mentioning.
Before we jump into those though, let’s start with the basics, then get familiar with them.
Like all subqueries, those used in the
FROM clause to create a derived table are enclosed by parenthesis. Unlike other subqueries though, a derived table must be aliased so that you can reference its results.
In this example, we’re going to select territories and their average bonuses.
SELECT TerritoryID, AverageBonus FROM (SELECT TerritoryID, Avg(Bonus) AS AverageBonus FROM Sales.SalesPerson GROUP BY TerritoryID) AS TerritorySummary ORDER BY AverageBonus
The subquery alias name is
TerritorySummary and is highlighted in bold.
When this query run, the subquery is first run and the results created. The results are then used in the
FROM clause as if it were a table. When used by themselves, these types of queries aren’t very fascinating; however, when used in combination with other tables, they are.
Let’s add a
join to our example above.
SELECT SP.TerritoryID, SP.BusinessEntityID, SP.Bonus, TerritorySummary.AverageBonus FROM (SELECT TerritoryID, AVG(Bonus) AS AverageBonus FROM Sales.SalesPerson GROUP BY TerritoryID) AS TerritorySummary INNER JOIN Sales.SalesPerson AS SP ON SP.TerritoryID = TerritorySummary.TerritoryID
From a data modeling point of view, this query looks like the following:
There is a relationship between the
TerritorySummary and the joined table
SalesPerson. Of course,
TerritorySummary only exists as part of this SQL statement since it is derived.
By using derived tables, we are able to summarize using one set of fields and report on another. In this case, we’re summarizing by
TerritoryID but report by each sales person (
You may think you could simply replicate this query using an
INNER JOIN, but you can’t as the final
GROUP BY for the query would have to include
BusinessEntityID, which would then throw-off the Average calculation.
Things You Can Do With Derived Table Subqueries
In many cases, table subqueries can be “flattened.” That is replaced by equivalent joins; however, there are times where a subquery in the
FROM clause shines. Here are a couple of examples I thought of.
Derived Tables and Aggregate Functions
When working with aggregate functions, you may have wanted to first summarize some values and then get the overall average. For instance, suppose you want to know the average bonus given for all territories.
If you run:
SELECT AVG(SUM(Bonus)) FROM Sales.SalesPerson GROUP BY Bonus
You’ll get the following error: Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
You may think you can simply run:
SELECT AVG(Bonus) FROM Sales.SalesPerson
But that calculates the average for bonus for each sales person. To get the average bonus for territories, you first have to calculate the total bonus by territory, and then take the average.
Once way to do this is with derived tables. In the following example, the derived table is used to summarize sales by territory. These are then fed into the
Average function to obtain an overall average.
SELECT AVG(B.TotalBonus) FROM (SELECT TerritoryID, SUM(Bonus) AS TotalBonus FROM Sales.SalesPerson GROUP BY TerritoryID) AS B
Joining Derived Tables
You can also join two derived tables together! This can come in handy when you need to aggregate data from two separate tables and combine them together. In the following example, we’re going to do a comparison of Territory Sales to Territory Sales Quotas. We’ll do this by summing sales figures and quotas by
Here are the two
Select statements we’ll use to summarize the sales figures:
SELECT TerritoryID, SUM(SalesQuota) AS TerritoryQuota FROM Sales.SalesPerson GROUP BY TerritoryID
SELECT SOH.TerritoryID, SUM(SOH.TotalDue) AS TerritorySales FROM Sales.SalesOrderHeader AS SOH GROUP BY SOH.TerritoryID
To obtain the comparison, we’ll match these two results together by territory ID. By using
FROM clause subqueries our SQL to do the comparison is:
SELECT Quota.TerritoryID, Quota.TerritoryQuota, Sales.TerritorySales, Sales.TerritorySales - Quota.TerritoryQuota FROM (SELECT TerritoryID, SUM(SalesQuota) AS TerritoryQuota FROM Sales.SalesPerson GROUP BY TerritoryID) AS Quota INNER JOIN (SELECT SOH.TerritoryID, SUM(SOH.TotalDue) AS TerritorySales FROM Sales.SalesOrderHeader AS SOH GROUP BY SOH.TerritoryID) AS Sales ON Quota.TerritoryID = Sales.TerritoryID
You can see the statement to summarize sales quotas is in bold and the statement summarize actual sales in italics.
In many cases, what you can do with derived tables you can do with joins; however, there are special cases where this isn’t the case. To me, the best explanation is when you need to use two aggregate functions, such as taking the average of a sum.
Keep in mind when writing SQL, it is best to go with the simplest and easiest solution, which in my opinion is usually an
INNER JOIN, but not every solution is solved as such. The double aggregation problem is a good example where a derived table shines.