Roland Bouman’s blog: Year-to-Date on Synapse Analytics 3: Using a Subquery
We found a couple of ways to calculate a Year-to-Date (YTD) total, and in this series I'd like to share some sample code and discuss the merits and drawbacks of each approach.
TLDR: A Year-to-Date solution based on a SUM() window function is simple to code and maintain, as well as efficient to execute. This is in comparison to a number of alternative implementations, namely a self-JOIN (combined with a GROUP BY), a subquery, and a UNION (also combined with a GROUP BY).
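For reference, the SUM() window function approach mentioned in the TLDR can be sketched as follows. This is a minimal illustration, not the Synapse code from this series. It uses Python's built-in sqlite3 module as a stand-in engine (window functions need SQLite 3.25 or later), and the ROWS data and the ytd_window helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (SalesYear, SalesMonth, SalesAmount).
ROWS = [
    (2020, 1, 100), (2020, 2, 200), (2020, 3, 300),
    (2021, 1, 50), (2021, 2, 70),
]

# Running total per year: restart at each SalesYear, accumulate in SalesMonth order.
YTD_WINDOW = """
select   SalesYear
,        SalesMonth
,        SalesAmount
,        sum(SalesAmount) over (
             partition by SalesYear
             order by     SalesMonth
             rows between unbounded preceding and current row
         ) as SalesYtd
from     SalesYearMonth
order by SalesYear, SalesMonth
"""


def ytd_window(rows):
    """Load rows into an in-memory SQLite table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table SalesYearMonth "
            "(SalesYear integer, SalesMonth integer, SalesAmount integer)"
        )
        conn.executemany("insert into SalesYearMonth values (?, ?, ?)", rows)
        return conn.execute(YTD_WINDOW).fetchall()
    finally:
        conn.close()


for row in ytd_window(ROWS):
    print(row)
```

The explicit ROWS frame makes the "everything up to the current month" intent visible. With one row per year and month, it gives the same result as the default frame.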
Note: this is the third post in a series.
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL engines and RDBMS-es.)
Using a subquery
We can also think of the YTD calculation as a separate query that we execute for each row of the SalesYearMonth table.
Although this does imply a row-by-row approach, we can still translate it easily into pure SQL by writing an expression in the SELECT-list that uses a subquery to calculate the YTD value for the current row:
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesYtd
from        SalesYearMonth as SalesOriginal
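To try the correlated subquery without a Synapse workspace, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in engine here. The ROWS data and the ytd_subquery helper are hypothetical, and an ORDER BY was added so the output is deterministic.

```python
import sqlite3

# Hypothetical sample data: (SalesYear, SalesMonth, SalesAmount).
ROWS = [
    (2020, 1, 100), (2020, 2, 200), (2020, 3, 300),
    (2021, 1, 50), (2021, 2, 70),
]

# The correlated subquery from the text, plus an ORDER BY for stable output.
YTD_SUBQUERY = """
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesYtd
from        SalesYearMonth as SalesOriginal
order by    SalesOriginal.SalesYear, SalesOriginal.SalesMonth
"""


def ytd_subquery(rows):
    """Load rows into an in-memory SQLite table and run the subquery version."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table SalesYearMonth "
            "(SalesYear integer, SalesMonth integer, SalesAmount integer)"
        )
        conn.executemany("insert into SalesYearMonth values (?, ?, ?)", rows)
        return conn.execute(YTD_SUBQUERY).fetchall()
    finally:
        conn.close()


for row in ytd_subquery(ROWS):
    print(row)
```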
There’s a similarity with the JOIN-solution, in that we use the SalesYearMonth table twice, but in different roles.
In the JOIN-solution, the two instances appeared on either side of the JOIN keyword, and we used the aliases SalesOriginal and SalesYtd to keep them apart.
In the subquery approach, the distinction between these two instances of the SalesYearMonth table is more explicit: the main instance of the SalesYearMonth table occurs in the FROM-clause, and the one for the YTD calculation occurs in the SELECT-list.
Also similar to the JOIN solution is the condition that ties the set for the YTD calculation to the main query using the SalesYear and SalesMonth columns.
Because the subquery refers to columns of the outer query, it is called a correlated subquery.
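Logically, a correlated subquery is evaluated once for every row of the outer query, although an optimizer is free to execute it differently. The following plain-Python sketch spells out those logical semantics over a list of tuples. The ROWS data and the ytd_correlated function are hypothetical illustrations, not engine code.

```python
# Hypothetical sample data: (SalesYear, SalesMonth, SalesAmount).
ROWS = [
    (2020, 1, 100), (2020, 2, 200), (2020, 3, 300),
    (2021, 1, 50), (2021, 2, 70),
]


def ytd_correlated(rows):
    """Mimic the correlated subquery: one inner aggregation per outer row."""
    result = []
    for year, month, amount in rows:        # outer query: SalesOriginal
        ytd = sum(
            a
            for y, m, a in rows             # inner query: SalesYtd, rescanned for every outer row
            if y == year and m <= month     # correlation on SalesYear and SalesMonth
        )
        result.append((year, month, amount, ytd))
    return result


print(ytd_correlated(ROWS))
```

In this naive form, every outer row rescans all rows, so the work grows quadratically with the number of rows. That is one way to picture the "row-by-row" nature of the approach.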
As for the differences with the JOIN solution:
In the condition, the only difference is the left/right placement of SalesOriginal and SalesYtd. This follows only from their order of appearance in the query and is functionally equivalent.
The most striking difference between the JOIN solution and the subquery is the absence of the GROUP BY-list in the latter. The subquery yields exactly one value for each row of the main query, so there is nothing to group.
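To see that the two approaches agree, the sketch below runs both against the same in-memory SQLite table and compares the results. The JOIN query is a reconstruction of the shape described in the text: a self-JOIN, MAX() for the original value, SUM() for the YTD, and a GROUP BY. It is not necessarily verbatim from the previous installment. The ROWS data and the run helper are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (SalesYear, SalesMonth, SalesAmount).
ROWS = [
    (2020, 1, 100), (2020, 2, 200), (2020, 3, 300),
    (2021, 1, 50), (2021, 2, 70),
]

YTD_SUBQUERY = """
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesYtd
from        SalesYearMonth as SalesOriginal
order by    SalesOriginal.SalesYear, SalesOriginal.SalesMonth
"""

# Self-JOIN variant: needs a GROUP BY to collapse the joined rows again.
YTD_JOIN = """
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           max(SalesOriginal.SalesAmount) as SalesAmount
,           sum(SalesYtd.SalesAmount)      as SalesYtd
from        SalesYearMonth as SalesOriginal
inner join  SalesYearMonth as SalesYtd
on          SalesOriginal.SalesYear   = SalesYtd.SalesYear
and         SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
group by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
order by    SalesOriginal.SalesYear, SalesOriginal.SalesMonth
"""


def run(sql, rows):
    """Execute one query against a fresh in-memory SalesYearMonth table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table SalesYearMonth "
            "(SalesYear integer, SalesMonth integer, SalesAmount integer)"
        )
        conn.executemany("insert into SalesYearMonth values (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


subquery_result = run(YTD_SUBQUERY, ROWS)
join_result = run(YTD_JOIN, ROWS)
print(subquery_result == join_result)
```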
Drawbacks of the subquery
Since we had much to complain about regarding the GROUP BY-list in the JOIN solution, it might seem that the subquery solution is somehow “better”.
However, a solution with a correlated subquery generally tends to be slower than a JOIN solution.
Whether this is actually the case depends on many variables, and you'd really have to check it against your own SQL engine and datasets.
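A tiny timing harness shows what "checking it" can look like. This is only a sketch. It uses SQLite via Python's sqlite3 module and synthetic, made-up data, so its numbers say nothing about Synapse. The build_table and best_time helpers are hypothetical, and on a real engine you would also inspect the execution plan. Ideally, run the comparison on your own platform with realistic volumes.

```python
import sqlite3
import time

YTD_SUBQUERY = """
select      o.SalesYear, o.SalesMonth, o.SalesAmount
,           (select sum(y.SalesAmount)
             from   SalesYearMonth as y
             where  y.SalesYear = o.SalesYear and y.SalesMonth <= o.SalesMonth) as SalesYtd
from        SalesYearMonth as o
order by    o.SalesYear, o.SalesMonth
"""

YTD_JOIN = """
select      o.SalesYear, o.SalesMonth, max(o.SalesAmount), sum(y.SalesAmount)
from        SalesYearMonth as o
inner join  SalesYearMonth as y
on          o.SalesYear = y.SalesYear and o.SalesMonth >= y.SalesMonth
group by    o.SalesYear, o.SalesMonth
order by    o.SalesYear, o.SalesMonth
"""


def build_table(n_years):
    """Create an in-memory table with synthetic amounts for 12 months per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table SalesYearMonth "
        "(SalesYear integer, SalesMonth integer, SalesAmount integer)"
    )
    conn.executemany(
        "insert into SalesYearMonth values (?, ?, ?)",
        ((year, month, year + month)
         for year in range(2000, 2000 + n_years)
         for month in range(1, 13)),
    )
    return conn


def best_time(conn, sql, repeat=3):
    """Return the fastest of `repeat` wall-clock runs of `sql`, in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


conn = build_table(n_years=50)
timings = {name: best_time(conn, sql)
           for name, sql in [("subquery", YTD_SUBQUERY), ("join", YTD_JOIN)]}
print(timings)
```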
Another drawback of the subquery solution becomes clear when we want to calculate the YTD for multiple measures.
Our example only has one SalesAmount measure, but in this same context we can easily imagine that we also want to know about price, discount amounts, tax amounts, shipping costs, and so on.
In the JOIN solution, we would simply add any extra measures to the SELECT-list, using MAX() (or MIN() or AVG()) to obtain the original value, and SUM() to calculate its respective YTD value.
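A minimal sketch of such a multi-measure JOIN query is shown below, run on SQLite through Python's sqlite3 module. The DiscountAmount column, the ROWS data, and the ...Ytd column aliases are hypothetical, chosen to echo the "discount amounts" mentioned above. Note that the JOIN, its condition, and the GROUP BY are written only once.

```python
import sqlite3

# Hypothetical sample data: (SalesYear, SalesMonth, SalesAmount, DiscountAmount).
ROWS = [(2020, 1, 100, 10), (2020, 2, 200, 20), (2021, 1, 50, 5)]

# Each extra measure adds only a MAX()/SUM() pair to the SELECT-list.
YTD_JOIN_MULTI = """
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           max(SalesOriginal.SalesAmount)    as SalesAmount
,           sum(SalesYtd.SalesAmount)         as SalesAmountYtd
,           max(SalesOriginal.DiscountAmount) as DiscountAmount
,           sum(SalesYtd.DiscountAmount)      as DiscountAmountYtd
from        SalesYearMonth as SalesOriginal
inner join  SalesYearMonth as SalesYtd
on          SalesOriginal.SalesYear   = SalesYtd.SalesYear
and         SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
group by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
order by    SalesOriginal.SalesYear, SalesOriginal.SalesMonth
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth (SalesYear integer, SalesMonth integer, "
    "SalesAmount integer, DiscountAmount integer)"
)
conn.executemany("insert into SalesYearMonth values (?, ?, ?, ?)", ROWS)
result = conn.execute(YTD_JOIN_MULTI).fetchall()
conn.close()
print(result)
```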
As long as it’s over the same set, the JOIN, its condition, and even the GROUP BY-list would remain the same, no matter how many measures we added a YTD calculation for.
This is very different in the subquery case.
Each measure for which you need a YTD calculation would get its own subquery.
Even though the condition would be the same for each such YTD calculation, you would still need to repeat the subquery code, once for each YTD measure.
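For comparison, here is a minimal sketch of the subquery variant with two measures, again on SQLite via Python's sqlite3 module. The DiscountAmount column, the ROWS data, and the ...Ytd aliases are hypothetical. The point to notice is that the correlation condition has to be written out again inside the second subquery.

```python
import sqlite3

# Hypothetical sample data: (SalesYear, SalesMonth, SalesAmount, DiscountAmount).
ROWS = [(2020, 1, 100, 10), (2020, 2, 200, 20), (2021, 1, 50, 5)]

YTD_SUBQUERY_MULTI = """
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesAmountYtd
,           SalesOriginal.DiscountAmount
,           (
                -- same condition, repeated for the second measure
                select sum(SalesYtd.DiscountAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as DiscountAmountYtd
from        SalesYearMonth as SalesOriginal
order by    SalesOriginal.SalesYear, SalesOriginal.SalesMonth
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth (SalesYear integer, SalesMonth integer, "
    "SalesAmount integer, DiscountAmount integer)"
)
conn.executemany("insert into SalesYearMonth values (?, ?, ?, ?)", ROWS)
result = conn.execute(YTD_SUBQUERY_MULTI).fetchall()
conn.close()
print(result)
```

Every additional measure would add another copy of this subquery, which is the maintenance burden described above.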
Next installment: Solution 3 – using a UNION
In the next installment we will present and discuss a solution based on a UNION and a GROUP BY.