Tag Archives: common table expression

Range Notation

Business Case: A record set consists of sequential values that must be represented in standard range notation. For example, values include 1, 2, 3, 5, 6, 8, 9, 10, 11, 14, 15, and 16; these values should be represented as a single, comma-separated string with sequential values compressed into ranges, as follows: “1-3,5,6,8-11,14-16”.

Environment: There are no environmental factors with a relevant influence on this business case.

Approach: We’ll attempt to meet this requirement in a series of steps that build upon each other. First, we’ll identify sequence gaps in the record set. With that information we’ll determine how many values are in each range. If there are more than two sequential values, we’ll compress the range into range notation indicating the first value and the last value in the range. Finally, we’ll incorporate the single-values and the ranges and convert them into a comma-separate string using the FOR XML clause.

Solution: Let’s see the solution in action, starting with creation of our sample data set.

-- setup sample data
DECLARE @tbl TABLE (ID INT);
INSERT INTO @tbl (ID)
VALUES (1), (2), (3)
    , (5), (6)
    , (8), (9), (10), (11)
    , (14), (15), (16);

With the same data set created, we’ll define the pieces necessary to reach the desired output. For this example, we’ll use common table expressions to enhance readability of the solution. Note that it can be done without CTEs, but it would be more difficult to follow due to the nested sub-queries.

Our initial step is to identify the gaps in sequencing; we’ll do this by joining the data set to itself, with the sequence value offset by one. Whenever the offset record set is NULL, we know we have a gap in the sequence. This step is coded in the “cte” common table expression below.

Once the sequence gaps have been identified, we can join the record set of gaps to the original record set of sequence numbers and add the range start value and range notation text to the range end (sequence gap) value. This is accomplished in the “cteRange” common table expression below. Note that we are grouping the records and determining those ranges that should be converted to range notation based on having a count > 2; in other words, ranges with three or more sequential values.

We’re almost there! Next we join the range data to the original record set of sequence numbers (again) and group the result based on range. In the event a sequence number is not part of a range (does not have a corresponding record in “cteRange”), it serves as its own range value. Finally, in order to convert this multi-record result into a single, comma-separated value, we use the FOR XML clause. (Haven’t done that before? Check here for more explanation, or here for another variation.) This entire process is demonstrated in the “put it all together” section of the code below.

-- create common table expressions to improve readability
WITH cte AS ( -- find range end points
    SELECT a.ID AS 'RangeEnd'
    FROM @tbl AS a LEFT OUTER JOIN @tbl AS b ON a.ID = b.ID - 1
    WHERE b.ID IS NULL)
, cteRange AS ( -- group values based on range end points
    SELECT MIN(t.ID) AS 'RangeStart'
        , oa.RangeEnd, CONVERT(VARCHAR
        , MIN(t.ID)) + '-' + CONVERT(VARCHAR, oa.RangeEnd) AS 'RangeText'
    FROM @tbl AS t
        OUTER APPLY (
            SELECT TOP 1 RangeEnd
            FROM cte
            WHERE cte.RangeEnd >= t.ID
            ORDER BY RangeEnd) AS oa
    GROUP BY oa.RangeEnd
    HAVING COUNT(*) > 2)

-- put it all together
SELECT STUFF(CONVERT(VARCHAR(MAX),
    (SELECT ',' + ISNULL(cteRange.RangeText, t.ID)
    FROM @tbl AS t
        LEFT OUTER JOIN cteRange
            ON t.ID >= cteRange.RangeStart
            AND t.ID <= cteRange.RangeEnd
    GROUP BY ISNULL(cteRange.RangeStart, t.ID), ISNULL(cteRange.RangeText, t.ID)
    ORDER BY ISNULL(cteRange.RangeStart, t.ID)
    FOR XML PATH('')))
    , 1, 1, '') AS 'RangeNotation';

Does anyone have a suggestion on how this can be accomplished more easily? Or any alternate approaches? Please share your feedback.

Anyone who is a regular reader of this blog is probably waiting for my reminder to clean-up your work area. However, since this example uses a table variable and does not materialize anything in the database… there is nothing to clean-up! So you won’t hear anything about clean-up from me. 😉

utfNumbers

A helpful function borrowed from Itzik Ben-Gan; a table of numbers comes in very handy, especially when it takes virtually no disk space.

-- Create a table-valued function
CREATE FUNCTION
 dbo.utfNumbers (
      @n AS BIGINT
)
RETURNS TABLE
AS
RETURN
WITH
      L0 AS (SELECT 1 AS c UNION ALL SELECT 1)
      , L1 AS (SELECT 1 AS c FROM L0 AS a, L0 AS b)
      , L2 AS (SELECT 1 AS c FROM L1 AS a, L1 AS b)
      , L3 AS (SELECT 1 AS c FROM L2 AS a, L2 AS b)
      , L4 AS (SELECT 1 AS c FROM L3 AS a, L3 AS b)
      , L5 AS (SELECT 1 AS c FROM L4 AS a, L4 AS b)
      , Nums AS (
            SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS n
            FROM L5)
      SELECT TOP(@n) n FROM Nums ORDER BY n
;

To use the function, simply select ‘n’ from the function as though it was a regular table, passing in as a parameter the total number of numbers you want returned. The example below will return 100 rows containing one column of values from one to 100.

-- Display content of the function
SELECT
n
FROM dbo.utfNumbers(100)
;