Roland Bouman’s blog: Year-to-Date on Synapse Analytics 3: Using a Subquery

For one of our Just-BI customers we implemented a Year-to-Date calculation in an Azure Synapse backend.
We encountered a couple of approaches, and in this series I'd like to share some sample code and discuss some of the merits and benefits of each approach.

TLDR: A Year-to-Date solution based on a SUM() window function is simple to code and maintain as well as efficient to execute.
This is in comparison with a number of alternative implementations, namely a self-JOIN (combined with a GROUP BY), a subquery, and a UNION (also combined with a GROUP BY).

Note: this is the 3rd post in a series.

(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL Engines and RDBMS-es.)

Using a subquery

We can also think of the YTD calculation as a separate query that we execute for each row of the SalesYearMonth table.
While this does imply a row-by-row approach, we can still translate it easily to pure SQL by creating an expression in the SELECT-list, which uses a subquery to calculate the YTD value for the current row:

select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesYtd
from        SalesYearMonth as SalesOriginal

There’s a similarity with the JOIN-solution, in that we use the SalesYearMonth table twice, but in different roles.
In the JOIN-solution the two instances appeared on either side of the JOIN keyword and we used the aliases OriginalSales and YtdSales to be able to keep them apart.
In the subquery approach, the distinction between these two different instances of the SalesYearMonth table is more explicit: the main instance of the SalesYearMonth table occurs in the FROM-clause, and the one for the YTD calculation occurs in the SELECT-list.

Also similar to the JOIN solution is the condition to tie the set for the YTD calculation to the main query using the SalesYear and SalesMonth columns.
Such a subquery is referred to as a correlated subquery.
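
To try the correlated subquery without access to the real table, the sketch below replaces SalesYearMonth with a common table expression holding a few made-up rows. The sample values are assumptions for illustration only; the query itself follows the approach described above.

```sql
-- Hypothetical sample data standing in for the real SalesYearMonth table.
with SalesYearMonth (SalesYear, SalesMonth, SalesAmount) as (
    select 2011, 1, 10 union all
    select 2011, 2, 20 union all
    select 2011, 3, 30 union all
    select 2012, 1, 40 union all
    select 2012, 2, 50
)
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                -- Correlated: the conditions refer to SalesOriginal,
                -- which belongs to the outer query.
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesYtd
from        SalesYearMonth as SalesOriginal
order by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
-- Result (SalesYear, SalesMonth, SalesAmount, SalesYtd):
-- 2011, 1, 10, 10
-- 2011, 2, 20, 30
-- 2011, 3, 30, 60
-- 2012, 1, 40, 40
-- 2012, 2, 50, 90
```

Note how the running total restarts at the first month of 2012: the SalesYear condition limits each subquery to rows of the same year, and the SalesMonth condition limits it to the months up to and including the current one.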

As for any differences with the JOIN solution:
In the condition, the only difference is the left/right placement of SalesOriginal and SalesYtd, which is determined only by their order of appearance in the query and is functionally completely equivalent.
The most striking difference between the JOIN solution and the subquery is the absence of the GROUP BY-list in the latter.

Drawbacks of the subquery

As we had much to complain about the GROUP BY-list in the JOIN solution, it might seem that the subquery solution is somehow “better”.
However, a solution with a correlated subquery in general tends to be slower than a JOIN solution.
Whether this is actually the case depends on many variables and you’d really have to check it against your SQL engine and datasets.

Another drawback of the subquery solution becomes clear when we want to calculate the YTD for multiple measures.
Our example only has one SalesAmount measure, but in this same context we can easily imagine that we also want to know about price, discount amounts, tax amounts, shipping costs, and so on.

In the JOIN solution, we would simply add any extra measures to the select list, using MAX() (or MIN() or AVG()) to obtain the original value, and SUM() to calculate its respective YTD value.
As long as it’s over the same set, the JOIN, its condition, and even the GROUP BY-list would remain the same, no matter for how many different measures we would add a YTD calculation.
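
As a rough sketch of what that could look like, the query below adds a second measure to a self-JOIN. The TaxAmount column, the sample rows, and the output aliases are assumptions made up for this illustration, and the query is a reconstruction of the JOIN approach rather than the exact code from the earlier installment. It also assumes there is exactly one row per SalesYear and SalesMonth, which is what makes MAX() return the original value.

```sql
-- Hypothetical sample data; TaxAmount is an invented second measure.
with SalesYearMonth (SalesYear, SalesMonth, SalesAmount, TaxAmount) as (
    select 2011, 1, 10, 1 union all
    select 2011, 2, 20, 2 union all
    select 2011, 3, 30, 3 union all
    select 2012, 1, 40, 4 union all
    select 2012, 2, 50, 5
)
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           max(SalesOriginal.SalesAmount) as SalesAmount
,           sum(SalesYtd.SalesAmount)      as SalesAmountYtd
,           max(SalesOriginal.TaxAmount)   as TaxAmount
,           sum(SalesYtd.TaxAmount)        as TaxAmountYtd
from        SalesYearMonth as SalesOriginal
inner join  SalesYearMonth as SalesYtd
on          SalesOriginal.SalesYear   = SalesYtd.SalesYear
and         SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
group by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
order by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
-- Result (SalesYear, SalesMonth, SalesAmount, SalesAmountYtd, TaxAmount, TaxAmountYtd):
-- 2011, 1, 10, 10, 1, 1
-- 2011, 2, 20, 30, 2, 3
-- 2011, 3, 30, 60, 3, 6
-- 2012, 1, 40, 40, 4, 4
-- 2012, 2, 50, 90, 5, 9
```

Only the select list grew by two expressions; the JOIN, its condition, and the GROUP BY-list are the same as they would be for a single measure.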

This is very different in the subquery case.
Each measure for which you need a YTD calculation would get its own subquery.
Even though the condition would be the same for each such YTD calculation, you would still need to repeat the subquery code – once for each YTD measure.
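
The repetition becomes visible with just two measures. In the sketch below, the TaxAmount column, the sample rows, and the output aliases are again assumptions made up for illustration.

```sql
-- Hypothetical sample data; TaxAmount is an invented second measure.
with SalesYearMonth (SalesYear, SalesMonth, SalesAmount, TaxAmount) as (
    select 2011, 1, 10, 1 union all
    select 2011, 2, 20, 2 union all
    select 2011, 3, 30, 3 union all
    select 2012, 1, 40, 4 union all
    select 2012, 2, 50, 5
)
select      SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
,           SalesOriginal.SalesAmount
,           (
                select sum(SalesYtd.SalesAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as SalesAmountYtd
,           SalesOriginal.TaxAmount
,           (
                -- Same FROM and WHERE as above; only the summed column differs.
                select sum(SalesYtd.TaxAmount)
                from   SalesYearMonth as SalesYtd
                where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
                and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
            ) as TaxAmountYtd
from        SalesYearMonth as SalesOriginal
order by    SalesOriginal.SalesYear
,           SalesOriginal.SalesMonth
-- Result (SalesYear, SalesMonth, SalesAmount, SalesAmountYtd, TaxAmount, TaxAmountYtd):
-- 2011, 1, 10, 10, 1, 1
-- 2011, 2, 20, 30, 2, 3
-- 2011, 3, 30, 60, 3, 6
-- 2012, 1, 40, 40, 4, 4
-- 2012, 2, 50, 90, 5, 9
```

Every additional measure adds another copy of the same five-line subquery, and any change to the YTD condition has to be made in each copy.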

Next installment: Solution 3 – using a UNION

In the next installment we will present and discuss a solution based on a UNION and a GROUP BY.
