Thursday, September 6, 2012

Use of Distinct and OrderBy in LINQ


Use of Distinct and OrderBy in LINQ


The requirement was something like: get the distinct values of (bla bla bla) sorted alphabetically. An example of the required query with Northwind would be the following one:
SELECT DISTINCT
        e.LastName
FROM    Orders o
LEFT JOIN [Employees] e
        ON e.[EmployeeID] = o.[EmployeeID]
ORDER BY e.LastName 
Fundamentally, we are using both a DISTINCT and an ORDER BY statement in SQL.
Now, if you create a NorthwindDataContext importing the Order and Employee tables, you can try to write a similar statement in LINQ to SQL. Unfortunately, the Distinct clause is not part of the query syntax and the most intuitive path could be the one of calling Distinct at the end of your statement, like in the following query:
var queryA =
    (from o in db.Orders
     orderby o.Employee.LastName
     select o.Employee.LastName)
     .Distinct();
However, the Distinct clause is removing the sort condition defined by the orderby keyword. In fact, the SQL statement sent to the database is the following one:
SELECT DISTINCT
        [t1].[LastName]
FROM    [dbo].[Orders] AS [t0]
LEFT OUTER JOIN [dbo].[Employees] AS [t1]
        ON [t1].[EmployeeID] = [t0].[EmployeeID]
This behavior might appear strange. The problem is that the Distinct operator does not grant that it will maintain the original order of values. Applied to LINQ to SQL, this mean that a sort constraint can be ignored in the case of a query like queryA.
The solution is pretty sample: put the OrderBy operator after the Distinct one, like in the following queryB definition:
var queryB = 
    (from o in db.Orders
     select o.Employee.LastName)
    .Distinct().OrderBy( n => n );
This will result in the following SQL statement sent to Northwind:
SELECT  [t2].[LastName]
FROM    ( SELECT DISTINCT
                    [t1].[LastName]
          FROM      [dbo].[Orders] AS [t0]
          LEFT OUTER JOIN [dbo].[Employees] AS [t1]
                    ON [t1].[EmployeeID] = [t0].[EmployeeID]
        ) AS [t2]
ORDER BY [t2].[LastName]

The lesson is: in a SQL query, the position of an operator is not relevant until operators belong to the same SELECT/FROM statement. In LINQ, this is not true and the conversion to SQL could remove LINQ operators when their operation might be ignored by other operators in the same LINQ query.