We came across a couple of approaches, and in this series I would like to share some sample code and discuss some of the merits and benefits of each approach.
TLDR: A Year-to-Date solution based on a SUM() window function is simple to code and maintain as well as efficient to execute.
This is in comparison with a number of alternative implementations, namely a self-JOIN (combined with a GROUP BY), a subquery, and a UNION (also combined with a GROUP BY).
Note: this is the third post in a series.
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL engines and RDBMS-es.)
We can also think of the YTD calculation as a separate query that we execute for each row of the SalesYearMonth table.
Although this does imply a row-by-row approach, we can still translate it easily to pure SQL by creating an expression in the SELECT-list, which uses a subquery to calculate the YTD value for the current row:
select SalesOriginal.SalesYear
,      SalesOriginal.SalesMonth
,      SalesOriginal.SalesAmount
,      (
         -- sums the rows of the same year, up to and including the current month
         select sum(SalesYtd.SalesAmount)
         from   SalesYearMonth as SalesYtd
         where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
         and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
       ) as SalesYtd
from   SalesYearMonth as SalesOriginal
There’s a similarity with the
JOIN-solution, in that we use the
SalesYearMonth table twice, but in different roles.
In the JOIN-solution, each instance appeared on one side of the JOIN keyword, and we used aliases (such as YtdSales) to be able to keep them apart.
In the subquery approach, the distinction between these two instances of the SalesYearMonth table is more explicit: the main instance occurs in the FROM-clause, and the one for the YTD calculation occurs in the subquery in the SELECT-list.
Also similar to the JOIN solution is the condition that ties the set for the YTD calculation to the main query, which here sits in the WHERE-clause of the subquery.
Such a subquery is referred to as a correlated subquery.
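To make the row-by-row reading concrete, here is a minimal sketch that runs the subquery approach against a handful of made-up rows. The sample figures are purely illustrative assumptions, and the common table expression named SalesYearMonth only stands in for the real table so that the statement can be run on its own.

```sql
-- Hypothetical sample data; the CTE stands in for the real SalesYearMonth table.
with SalesYearMonth as (
            select 2011 as SalesYear, 1 as SalesMonth, 100 as SalesAmount
  union all select 2011, 2, 150
  union all select 2011, 3,  50
  union all select 2012, 1, 200
  union all select 2012, 2,  25
)
select   SalesOriginal.SalesYear
,        SalesOriginal.SalesMonth
,        SalesOriginal.SalesAmount
,        (
           -- logically evaluated once per outer row:
           -- for the (2011, 3) row it sums months 1, 2 and 3 of 2011
           select sum(SalesYtd.SalesAmount)
           from   SalesYearMonth as SalesYtd
           where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
           and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
         ) as SalesYtd
from     SalesYearMonth as SalesOriginal
order by SalesOriginal.SalesYear
,        SalesOriginal.SalesMonth
;
-- SalesYear  SalesMonth  SalesAmount  SalesYtd
-- 2011       1           100          100
-- 2011       2           150          250
-- 2011       3            50          300
-- 2012       1           200          200
-- 2012       2            25          225
```

The running total restarts in 2012 because the subquery only considers rows with the same SalesYear as the current outer row.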
As for any differences with the JOIN solution: in the condition, the only difference is the left/right placement of SalesOriginal and SalesYtd, which is determined only by order of appearance in the query and is functionally completely equivalent.
The most striking difference between the
JOIN solution and the subquery is the absence of the
GROUP BY-list in the latter.
Drawbacks of the subquery
Since we had plenty to complain about in the GROUP BY-list of the JOIN solution, it might seem that the subquery solution is somehow “better”.
However, a solution with a correlated subquery in general tends to be slower than a JOIN solution.
Whether this is actually the case depends on many variables, and you’d really have to check it against your SQL engine and datasets.
Another drawback of the subquery solution becomes clear when we want to calculate the YTD for multiple measures.
Our example only has one
SalesAmount measure, but in this same context we can easily imagine that we also want to know about price, discount amounts, tax amounts, shipping costs, and so on.
In the JOIN solution, we would simply add any extra measures to the select list, using an aggregate function (such as AVG()) to obtain the original value, and SUM() to calculate its respective YTD value:
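A minimal sketch of what this could look like, assuming a hypothetical extra TaxAmount measure next to SalesAmount. The sample rows are made up, the CTE stands in for the real table, and the query is a reconstruction in the spirit of the JOIN solution rather than a copy of the code from the earlier post.

```sql
-- Hypothetical sample data with an assumed second measure, TaxAmount.
with SalesYearMonth as (
            select 2011 as SalesYear, 1 as SalesMonth, 100 as SalesAmount, 10 as TaxAmount
  union all select 2011, 2, 150, 15
  union all select 2011, 3,  50,  5
  union all select 2012, 1, 200, 20
  union all select 2012, 2,  25,  2
)
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
-- all joined rows within a group carry the same SalesOriginal values,
-- so max() (or min() or avg()) simply returns the original value
,           max(SalesOriginal.SalesAmount) as SalesAmount
,           sum(SalesYtd.SalesAmount)      as SalesYtd
,           max(SalesOriginal.TaxAmount)   as TaxAmount
,           sum(SalesYtd.TaxAmount)        as TaxYtd
from        SalesYearMonth as SalesOriginal
inner join  SalesYearMonth as SalesYtd
on          SalesOriginal.SalesYear   = SalesYtd.SalesYear
and         SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
group by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
order by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
;
-- SalesYear  SalesMonth  SalesAmount  SalesYtd  TaxAmount  TaxYtd
-- 2011       1           100          100       10         10
-- 2011       2           150          250       15         25
-- 2011       3            50          300        5         30
-- 2012       1           200          200       20         20
-- 2012       2            25          225        2         22
```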
As long as it’s over the same set, the JOIN, its condition, and even the GROUP BY-list would remain the same, no matter how many measures we add a YTD calculation for.
This is very different in the subquery case.
Each measure for which you need a YTD calculation would get its own subquery.
Even though the condition would be the same for each such YTD calculation, you would still need to repeat the subquery code, once for each YTD measure.
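The repetition is easy to see in a small sketch. As an assumption for illustration only, the query below adds a hypothetical TaxAmount measure to some made-up rows, with a CTE standing in for the real SalesYearMonth table.

```sql
-- Hypothetical sample data with an assumed second measure, TaxAmount.
with SalesYearMonth as (
            select 2011 as SalesYear, 1 as SalesMonth, 100 as SalesAmount, 10 as TaxAmount
  union all select 2011, 2, 150, 15
  union all select 2011, 3,  50,  5
  union all select 2012, 1, 200, 20
  union all select 2012, 2,  25,  2
)
select   SalesOriginal.SalesYear
,        SalesOriginal.SalesMonth
,        SalesOriginal.SalesAmount
,        (
           -- first YTD measure: its own subquery
           select sum(SalesYtd.SalesAmount)
           from   SalesYearMonth as SalesYtd
           where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
           and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
         ) as SalesYtd
,        SalesOriginal.TaxAmount
,        (
           -- second YTD measure: same table, same condition, repeated in full
           select sum(TaxYtd.TaxAmount)
           from   SalesYearMonth as TaxYtd
           where  TaxYtd.SalesYear   = SalesOriginal.SalesYear
           and    TaxYtd.SalesMonth <= SalesOriginal.SalesMonth
         ) as TaxYtd
from     SalesYearMonth as SalesOriginal
order by SalesOriginal.SalesYear
,        SalesOriginal.SalesMonth
;
-- SalesYear  SalesMonth  SalesAmount  SalesYtd  TaxAmount  TaxYtd
-- 2011       1           100          100       10         10
-- 2011       2           150          250       15         25
-- 2011       3            50          300        5         30
-- 2012       1           200          200       20         20
-- 2012       2            25          225        2         22
```

Only the summed column differs between the two subqueries; the FROM-clause and the correlation condition are duplicated verbatim, and a third measure would add a third copy.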
Next installment: Solution 3 – using a UNION
In the next installment we will present and discuss a solution based on a UNION and a GROUP BY.