Tag Archives: XML

Shredding XML

There are multiple ways to shred XML within SQL Server, each with pros and cons. So which to choose? Barring rigid requirements about how the XML is sourced or other factors, why not choose based on performance? Surely no one would disagree with using the quickest method… so let’s explore which one is faster.

To start, let’s turn off record count display because we’ll use PRINT statements to see how long actions take.

-- turn off record count display to keep our output clean
SET NOCOUNT ON;

Next, let’s declare the variables we’ll use to both shred and measure execution times.

-- declare and initialize variables
DECLARE
    @time DATETIME2(3) -- stopwatch
    , @intXMLDocID INT -- xml document handle
    , @x XML = ( -- the XML content
            SELECT n AS id
            FROM [master].dbo.utfNumbers(100000) AS number
            FOR XML AUTO, TYPE, ROOT('count')
        )
    ;

Wait… what’s that utfNumbers in the FROM statement? You mean you don’t have one of those? Take a look here to see the value and roll your own.

Also, let’s go ahead and create a temporary table to hold the shredded XML. While we could simply return it to the client as a record set, it’s not necessary for performance measurement and it introduces an extraneous factor. With such a simple XML example, we only need a simple temp table.

-- create a temp table to hold the shredded XML
IF OBJECT_ID(N'tempdb..#tbl', N'U') IS NOT NULL DROP TABLE #tbl;
CREATE TABLE #tbl(ID INT);

For our first shredding attempt, let’s start the clock, use the XML nodes method, then stop the clock and see how long it took.

-- start the stopwatch
SET @time = GETDATE();

-- shred the xml back into a table
INSERT INTO #tbl(ID)
SELECT x.a.value('@id[1]', 'int') AS 'id'
FROM @x.nodes('/count/number') AS x(a);

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to shred via nodes';

While your results may vary, my DEV machine returned an execution time of around 800 milliseconds. Go ahead and change the record count to suit your interest or curiosity; try a million records or just a hundred.

With  the nodes method benchmarked, let’s try a different approach: the prepare XML document system stored procedure. For this approach we’ll actually time the action in three parts. First, how long does it take to prepare the XML document from the raw XML text; next, how long does it take to shred the XML into the temp table; and finally, how long does it take to remove the XML document (which is important, as you don’t want stray resources consuming your memory!).

-- clear the temp table
DELETE FROM #tbl;

-- start the clock
SET @time = GETDATE();

-- prepare the XML document
EXECUTE [master].dbo.sp_xml_preparedocument
    @hdoc = @intXMLDocID OUTPUT
    , @xmltext = @x
    ;

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to prepare XML doc';

-- start the clock
SET @time = GETDATE();

-- open the XML document
INSERT INTO #tbl(ID)
SELECT *
FROM OPENXML(@intXMLDocID, 'count/number')
WITH #tbl;

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to shred via OPENXML';

-- start the clock
SET @time = GETDATE();

-- remove the XML document from memory
EXECUTE [master].dbo.sp_xml_removedocument
    @hdoc = @intXMLDocID
    ;

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to remove XML doc';

Again, results will vary, but my DEV machine took about 350 milliseconds to prepare the XML document, 600 milliseconds to shred the XML, and zero milliseconds to remove the XML document. In other words, using the stored procedure took longer… but not really all that much. Now, that is not really unexpected, considering that parsing XML text into a full tree representation just “sounds” like a bigger job than reading the nodes.

But… ready to see something interesting? Try replacing the WITH #tbl syntax with the alternate signature WITH (ID INT). Check the execution time now and you’ll see that shredding the XML takes significantly longer.

While we only took a quick look at XML shredding, I hope we were able to demonstrate that using the right tool for the job is important, and that how you use the tool can be an even more important factor.

Does anyone have a use case where the XML tree was needed? Or do you have a favorite XML shredding technique that you use? If so — please reply and share!

Advertisements