SQL Server Logical Query Processing Order
Close To Me • The Cure • 1985
• T-sql
SQL defined an physical processing order, or ‘keyed-in order’, similar to the English language spoken. However, the data is processed from the input tables through steps into the query result. It’s the Logical Query Processing Order. I often write my queries in this order. Doesn’t that make sense?
/*(8)*/ SELECT /*9*/ DISTINCT /*11*/ TOP num selectList
/*(1)*/ FROM leftTab
/*(3)*/ joinType JOIN rightTab
/*(2)*/ ON joinCondition
/*(4)*/ WHERE whereCondition
/*(5)*/ GROUP BY groupByList
/*(6)*/ WITH {CUBE | ROLLUP}
/*(7)*/ HAVING havingCondition
/*(10)*/ ORDER BY orderByLiistt
Let’s go deeper into it!
Virtual Table(VT) is used as the input to the next step. VTs are not available to the caller, only the last one is returned.
Getting Data
(1) … FROM
identifies the source tables, performed a CROSS JOIN
between the tables to assembling data, get n * m rows, and generate VT1.
(2) … ON joinCondition
is applied on VT1. True
rows are inserted in VT2.
(3) … joinType JOIN
: If the join is an INNER JOIN
, the result of the 2nd step is also the result of the join so VT2.
If it’s an OUTER JOIN
, it’s a bit more tricky for me. In a OUTER JOIN
, you mark one or both input tables as preserved by specifying the type of OUTER JOIN
(left, right, full). Preserved means that you want all of its rows returned, even when filtered out by the joinCondition
. A LEFT OUTER JOIN
marks the left table as preserved and so long. It’s return VT3, rows from VT2 plus rows from the preserved table for which a match was not found in VT2. This rows as referred to as outer rows.
Note : If the FROM
clause contains 2 or more tables, process repeat steps 1 through 3 untill all tables completely processed.
Rows Filter
(4) … WHERE
: Only rows from VT3 with whereCondition
is TRUE are inserted to VT4. Alias defined in the SELECT
clause are not allowed here because the WHERE
clause is evaluated before the SELECT
clause. Because the data is not grouped yet, you cannot use aggregate filters here ( cannot write WHERE orderdate = MAX(orderdate
)).
Note : With the OUTER JOIN
, when specify a logical expresion in the ON
filter or in the WHERE
filter? ON
is applied before adding outer rows. An elimination of a row from the preserved table by the ON
filter is not final because OUTER JOIN
will add it back. WHERE
is applied after adding outer rows. The elimination of a row by the WHERE
filter is final.
Grouping rows and groups
(5) … GROUP BY
: Rows from VT4 are arranged by groupByList
. VT5 is generated. If you specify a GROUP BY
clause in the query, then all subsequent steps (eg: HAVING, SELECT) can be specified expression scalar values obtained for groups. If you want to refer to an element that is not part of your groupByList
, it must be contained within an aggregate function like MAX()
or SUM()
.
(6) … WITH {CUBE | ROLLUP}
: Groups of groups are added from VT5 generating VT6.
Group filter
(7) … HAVING
: Only groups with havingCondition
TRUE are inserted to VT7. It is evaluated after the data has been grouped. The HAVING
clause uses the predicate COUNT(*) > 1.
Process the SELECT clause
(8) … SELECT
: The selectList
is processed, generating VT8. It’s producing the result attributes and assign names if they are derived from expressions. Alias created in the selectList
can not be used in the previous steps, can no longer even use the selectList
, it can only be used in the ORDER BY
. Note that an alias created by the SELECT
step isn’t even visible to other expressions that appear in the same selectList
.
(9) … DISTINCT
: Duplicated rows are removed, VT9 is generated.
Handle Presentation Ordering
(10) … ORDER BY
: Rows are sorted by the orderByList
, generate a cursor (VC10). Cursor is an object that contains a particular physical organization row order. The use the ORDER BY
clause of a query can not be used as a table expression. Table expression comprising: a view, inline table valued function, sub-queries, and the common expressions derived table (CTE). Unless you want to order lines, do not specify an ORDER BY
clause. Sortng is cost. Because this step is not to return the table, but returns the cursor, use the ORDER BY
clause of a query can not be used as a table expression.
Note: Use GROUP BY, then use the DISTINCT is redundant.
(11) … TOP / OFFSET-FETCH
: Selected a specified number or percentage of rows from VC10 and return the caller.