Convert Rows to Column (Part II)

In a previous post we demonstrated how to use the FOR XML clause to concatenate multiple rows of one data field into a single row of one data field. While that technique is quite handy, what about when there are multiple fields to be concatenated? In other words, what if the native data has a one-to-many relationship but we need to return a one-to-one record set without losing any of the fields or data?

One possible solution is to simply extend the single-field example and create an additional CROSS APPLY block for each additional field. While not an elegant solution, it does work. Before we explore a more modular solution, let’s refresh our memories on what this first solution would look like.

Before we dig in, let’s create some objects for demonstration purposes.

-- Set the database context
USE [db];
GO

-- If the demo table already exists, drop it
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL
    DROP TABLE tbl;
GO

-- Create the demo table
CREATE TABLE tbl (
    ID INT IDENTITY(1,1) NOT NULL
    , FK INT NOT NULL
    , Value VARCHAR(20) NOT NULL
    , CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (ID)
    )
;
GO

-- Populate the demo table
INSERT INTO tbl (FK, Value)
VALUES
    (1, 'One')
    , (2, 'Two')
    , (2, 'Three')
;
GO

We’ll also make use of our numbers function to simulate the “one” side of our “one-to-many” data set. The number data will join to the foreign key (FK) we created in the demo table, which serves as our “many” side. Wait — you don’t have a “numbers” function? Get one here.

Okay, so with our environment ready, let’s demonstrate concatenation of multiple rows across multiple fields using FOR XML and CROSS APPLY.

-- First approach; effective but neither scalable nor elegant
SELECT
    nums.n
    , dTbl.ID
    , dTbl2.Value
FROM
    [master].dbo.utfNumbers(3) AS nums
    -- Join to concatenate the first field (ID)
    CROSS APPLY (
        SELECT CONVERT(VARCHAR(20), ID) + '; ' AS [text()]
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH('')
    ) AS dTbl(ID)
    -- Join to concatenate the second field (Value)
    CROSS APPLY (
        SELECT Value + '; ' AS [text()]
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH('')
    ) AS dTbl2(Value)
;

We are able to return the results we expect; however, there are two major issues. One, every additional field to return requires us to add an additional CROSS APPLY block; and two, every additional CROSS APPLY block causes an additional query/join to the data. If we could use just one CROSS APPLY block for all the columns, we could save both ourselves and the database engine some work.

And, of course, we can do just that. Again we will use the capabilities of XML to do the job. This time, rather than returning the concatenated result of a single field in the CROSS APPLY, we will return the entire record set (all rows and all columns) to the outer query in the form of a single XML value. With the XML value at the top query, we can then parse out the individual fields (nodes) in the outer SELECT statement.

Yes, that was a lot of words. But, I trust that they will make more sense once you see the example.

-- Second approach; a function makes it scalable
SELECT
    nums.n
    , 'IDs' = [master].dbo.ufnXMLConcat(dTbl.x, 'ID')
    , 'Values' = [master].dbo.ufnXMLConcat(dTbl.x, 'Value')
FROM
    [master].dbo.utfNumbers(3) AS nums
    -- Join to all of the child data, returned as XML
    CROSS APPLY (
        SELECT ID, Value
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH
    ) AS dTbl(x)
;

Obviously, this syntax is preferable. It is not only easier to author but it also requires fewer data touches for the SQL engine to resolve.

But wait… what is that ufnXMLConcat function? Why it is right here, of course! 😉

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnXMLConcat')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnXMLConcat;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnXMLConcat (
    @xml XML
    , @strNode VARCHAR(50)
    )
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- Declare variables
    DECLARE
        @strReturn VARCHAR(MAX) = ''
        , @strXML VARCHAR(MAX) = CONVERT(VARCHAR(MAX), @xml)
    ;
    -- Return default if XML is null
    IF @strXML IS NULL RETURN @strReturn;
    -- Parse the XML string for the node
    WHILE CHARINDEX('<' + @strNode + '>', @strXML) != 0
    BEGIN
        -- Extract and concatenate the node value
        SET @strReturn += SUBSTRING(
                @strXML
                , CHARINDEX('<' + @strNode + '>', @strXML)
                    + LEN('<' + @strNode + '>')
                , CHARINDEX('</' + @strNode + '>', @strXML)
                    - CHARINDEX('<' + @strNode + '>', @strXML)
                    - LEN('<' + @strNode + '>')
                )
            + '; '
        ;
        -- Remove the node from the XML string
        SET @strXML = SUBSTRING(
                @strXML
                , CHARINDEX('</' + @strNode + '>', @strXML)
                    + LEN('</' + @strNode + '>')
                , LEN(@strXML)
                    - CHARINDEX('</' + @strNode + '>', @strXML)
                    - LEN('</' + @strNode + '>')
            )
        ;
    END
    -- Remove the trailing value delimiter
    SET @strReturn = LEFT(@strReturn, LEN(@strReturn) - 1);
    -- Return the result of the function
    RETURN @strReturn;
END
GO

As you can see, by accepting an XML value and a string node name as input, simple string functions take care of the rest. There are, of course, even more elegant ways to parse XML; but more complex programming is not necessarily permitted within SQL functions.

Although the second example provides a more elegant solution, it should be noted that even this solution does not scale perfectly. For example, if the outer query returns millions of rows and/or the inner query contains many rows or many columns, there could definitely be performance problems. After all, executing functions within a SELECT statement is asking for trouble when the record count gets high.

Does anyone have a more elegant way of handling this business case? If so, please share via a comment.

Special thanks to my colleague, Bryan Clayton, for his assistance in defining the business case and working through alternate methods to parse XML. Look forward to some of those alternatives in future posts!

As ever, don’t forget to clean up!

-- Drop the demo table
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL
    DROP TABLE tbl;

Advertisements
Post a comment or leave a trackback: Trackback URL.

Trackbacks

  • By Range Notation | T-SQL Ref on 07 May 2015 at 1300

    […] we use the FOR XML clause. (Haven’t done that before? Check here for more explanation, or here for another variation.) This entire process is demonstrated in the “put it all […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: