Category Archives: Business Cases

Get Precise Age

It seems a simple thing: determine the age of a person. Given his/her birth date, and the current date, the maths are quite obvious and a human does the calculation with ease. However… it is surprising that software doesn’t often ship with native functions adequate to the task. Want to see what I mean? 🙂

Let’s start with the basic DATEDIFF function, using year as the datepart argument.

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-07-31'
    , @currentdate DATE = '2016-01-01';
-- use datediff to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , DATEDIFF(YEAR, @birthdate, @currentdate) AS 'ageWithDateDiff';

Obviously this result does not satisfy the use case for which we employed the function. DATEDIFF returns the difference between the year values only, without any consideration of the month or day of either date. The calculation indicates our person is 16 years old when, in fact, the person is 15 and has not yet reached his/her 16th birthday.

In response to this result I’ve often seen what I consider a brute-force approach to precision: keep the same concept, but apply it at a more granular level. Specifically, the DATEDIFF function is still employed, but with days instead of years; the result is divided by 365 (days per year) to get a fractional year value, which is then rounded down to the nearest integer to get the age.

This approach works for the previous example:

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-07-31'
    , @currentdate DATE = '2016-01-01';
-- use days to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.0) AS 'ageWith365Days';

So far so good! But the approach does not work in every case, for example:

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-02-29'
    , @currentdate DATE = '2016-02-28';
-- use days to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.0) AS 'ageWith365Days';

As can be seen above, the “age calculation using days per year” method fails when a leap day is in play. But brute-force hasn’t given up just yet: let’s use 365.25 days per year to account for the leap day!

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-02-29'
    , @currentdate DATE = '2016-02-28';
-- use average days per year to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.25) AS 'ageWithLeapDays';

At this point it looks like the challenge is solved. However… there is still a problem; for example:

-- declare and initialize variables
DECLARE @birthdate DATE = '2001-02-28'
    , @currentdate DATE = '2016-02-28';
-- use average days per year to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.25) AS 'ageWithLeapDays';

The “365.25” days method calculates that the person is 14. In this case, however, the current date is actually the person’s 15th birthday. In other words, the calculation fails.

Is there anything else for the brute-force approach to try? Instead… let’s attempt to code the underlying rules for calculating a person’s age (or any event anniversary).

Subtracting the years is actually a very good place to start because, if the “current” month-day is on-or-after the “birth” (event) month-day, then the raw year calculation is correct. And if the “current” month-day is before the “birth” month-day, then we subtract one from the raw year calculation.

As it turns out, this comparison of month-days is quite easy; we can simply use the DATEPART function with the DAYOFYEAR datepart argument! Of course, we still need to accommodate leap days. If we don’t consider leap days, then we will find that the 60th day of the year is “normally” 01-Mar, but it’s 29-Feb in a leap year.

The good news is that accounting for leap days is not as complicated as it might at first seem. We can still compare the DAYOFYEAR values from DATEPART, but decrement the result for whichever date falls in a leap year when the other does not. (Note that we don’t need to worry about it if both the event and current dates are in leap years, or neither is; in those cases the years have the same number of days, so a straight compare works correctly. Also note that the simple modulo-4 test in the function below is adequate for years 1901 through 2099; a complete leap-year test would also handle the century rule.)

CREATE FUNCTION dbo.ufnGetYearDiff (@start DATE, @end DATE)
RETURNS SMALLINT
AS
BEGIN
    DECLARE @return INT
        , @s SMALLINT = DATEPART(DAYOFYEAR, @start)
        , @e SMALLINT = DATEPART(DAYOFYEAR, @end);
    -- if necessary, decrement the start date dayofyear
    IF YEAR(@start) % 4 = 0 AND YEAR(@end) % 4 != 0 AND @s > 59 SET @s -= 1;
    -- if necessary, decrement the end date dayofyear
    IF YEAR(@start) % 4 != 0 AND YEAR(@end) % 4 = 0 AND @e > 59 SET @e -= 1;
    -- decrement the year diff if the end is earlier in the year than the start
    SET @return = DATEDIFF(YEAR, @start, @end)
        - CASE WHEN @e < @s THEN 1 ELSE 0 END;
    RETURN @return;
END
GO

Let’s test it out and see if we have solved the problem.

-- declare and initialize variables
DECLARE @birthdate DATE = '2001-02-28'
    , @currentdate DATE = '2016-02-28';
-- use the custom function to calculate the age
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , dbo.ufnGetYearDiff(@birthdate, @currentdate) AS 'ageFn';
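
For good measure, the leap-day case that tripped up the earlier methods can be re-checked as well. A quick sketch (it should return 15, since the 16th birthday doesn’t arrive until 29-Feb-2016):

-- declare and initialize variables (new names, so this can run in the same batch)
DECLARE @birthdate2 DATE = '2000-02-29'
    , @currentdate2 DATE = '2016-02-28';
-- use the custom function to calculate the age
SELECT @birthdate2 AS 'birth'
    , @currentdate2 AS 'today'
    , dbo.ufnGetYearDiff(@birthdate2, @currentdate2) AS 'ageFn';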

Success! Does anyone have a more elegant way to solve this challenge?

PS: Before anyone yells at me, the BOL documentation for DATEDIFF very clearly states what the function does: it returns the number of specified boundaries crossed between the startdate and enddate. Nonetheless, the function doesn’t do what the name implies (or, more precisely, what many people assume).
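
A quick illustration of that boundary-crossing behavior:

-- only one day apart, but one year boundary is crossed, so DATEDIFF returns 1
SELECT DATEDIFF(YEAR, '2015-12-31', '2016-01-01') AS 'boundariesCrossed';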

Range Notation

Business Case: A record set consists of sequential values that must be represented in standard range notation. For example, values include 1, 2, 3, 5, 6, 8, 9, 10, 11, 14, 15, and 16; these values should be represented as a single, comma-separated string with sequential values compressed into ranges, as follows: “1-3,5,6,8-11,14-16”.

Environment: There are no environmental factors with a relevant influence on this business case.

Approach: We’ll attempt to meet this requirement in a series of steps that build upon each other. First, we’ll identify sequence gaps in the record set. With that information we’ll determine how many values are in each range. If there are more than two sequential values, we’ll compress the range into range notation indicating the first value and the last value in the range. Finally, we’ll combine the single values and the ranges and convert them into a comma-separated string using the FOR XML clause.

Solution: Let’s see the solution in action, starting with creation of our sample data set.

-- setup sample data
DECLARE @tbl TABLE (ID INT);
INSERT INTO @tbl (ID)
VALUES (1), (2), (3)
    , (5), (6)
    , (8), (9), (10), (11)
    , (14), (15), (16);

With the sample data set created, we’ll define the pieces necessary to reach the desired output. For this example, we’ll use common table expressions to enhance readability of the solution. Note that it can be done without CTEs, but it would be more difficult to follow due to the nested sub-queries.

Our initial step is to identify the gaps in sequencing; we’ll do this by joining the data set to itself, with the sequence value offset by one. Whenever the offset record set is NULL, we know we have a gap in the sequence. This step is coded in the “cte” common table expression below.

Once the sequence gaps have been identified, we can join the record set of gaps to the original record set of sequence numbers and add the range start value and range notation text to the range end (sequence gap) value. This is accomplished in the “cteRange” common table expression below. Note that we are grouping the records and determining those ranges that should be converted to range notation based on having a count > 2; in other words, ranges with three or more sequential values.

We’re almost there! Next we join the range data to the original record set of sequence numbers (again) and group the result based on range. In the event a sequence number is not part of a range (does not have a corresponding record in “cteRange”), it serves as its own range value. Finally, in order to convert this multi-record result into a single, comma-separated value, we use the FOR XML clause. (Haven’t done that before? Check here for more explanation, or here for another variation.) This entire process is demonstrated in the “put it all together” section of the code below.

-- create common table expressions to improve readability
WITH cte AS ( -- find range end points
    SELECT a.ID AS 'RangeEnd'
    FROM @tbl AS a LEFT OUTER JOIN @tbl AS b ON a.ID = b.ID - 1
    WHERE b.ID IS NULL)
, cteRange AS ( -- group values based on range end points
    SELECT MIN(t.ID) AS 'RangeStart'
        , oa.RangeEnd
        , CONVERT(VARCHAR, MIN(t.ID)) + '-' + CONVERT(VARCHAR, oa.RangeEnd) AS 'RangeText'
    FROM @tbl AS t
        OUTER APPLY (
            SELECT TOP 1 RangeEnd
            FROM cte
            WHERE cte.RangeEnd >= t.ID
            ORDER BY RangeEnd) AS oa
    GROUP BY oa.RangeEnd
    HAVING COUNT(*) > 2)

-- put it all together
SELECT STUFF(CONVERT(VARCHAR(MAX),
    (SELECT ',' + ISNULL(cteRange.RangeText, t.ID)
    FROM @tbl AS t
        LEFT OUTER JOIN cteRange
            ON t.ID >= cteRange.RangeStart
            AND t.ID <= cteRange.RangeEnd
    GROUP BY ISNULL(cteRange.RangeStart, t.ID), ISNULL(cteRange.RangeText, t.ID)
    ORDER BY ISNULL(cteRange.RangeStart, t.ID)
    FOR XML PATH('')))
    , 1, 1, '') AS 'RangeNotation';
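
By the way, one common alternative worth mentioning is the gaps-and-islands pattern: subtracting a ROW_NUMBER from the sequence value yields a constant grouping key within each unbroken run. Here is a sketch of that approach (assuming SQL Server 2005 or later for ROW_NUMBER, and the same @tbl sample data):

-- alternate approach: gaps-and-islands via ROW_NUMBER
WITH cteIsland AS ( -- derive a grouping key for each unbroken run
    SELECT ID
        , ID - ROW_NUMBER() OVER (ORDER BY ID) AS 'grp'
    FROM @tbl)
, cteGrouped AS ( -- collapse each run to its start and end values
    SELECT MIN(ID) AS 'RangeStart'
        , MAX(ID) AS 'RangeEnd'
    FROM cteIsland
    GROUP BY grp)
SELECT STUFF(CONVERT(VARCHAR(MAX),
    (SELECT ',' + CASE
            WHEN RangeEnd - RangeStart >= 2
                THEN CONVERT(VARCHAR, RangeStart) + '-' + CONVERT(VARCHAR, RangeEnd)
            WHEN RangeEnd > RangeStart
                THEN CONVERT(VARCHAR, RangeStart) + ',' + CONVERT(VARCHAR, RangeEnd)
            ELSE CONVERT(VARCHAR, RangeStart)
        END
    FROM cteGrouped
    ORDER BY RangeStart
    FOR XML PATH('')))
    , 1, 1, '') AS 'RangeNotation';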

Does anyone have a suggestion on how this can be accomplished more easily? Or any alternate approaches? Please share your feedback.

Anyone who is a regular reader of this blog is probably waiting for my reminder to clean up your work area. However, since this example uses a table variable and does not materialize anything in the database… there is nothing to clean up! So you won’t hear anything about clean-up from me. 😉

Decoupling a Trigger

Table-level triggers are generally used to enforce business rules based on data manipulation language (DML) statements. Among the reasons they are so good at this function is that they are implicitly wrapped into the DML transaction. Simply put: if the trigger fails the DML fails.

In many cases, that is the desired behavior. However, what if you have an optional business rule? In other words, you’d like to take secondary action on certain data events, but, if the secondary action were to fail, you still want to keep the data change. A silly example of this could be if you wanted to send an email anytime someone named “Bob” registered at your website. While the email is desired, if it could not be sent for some reason, you’d still want “Bob” to be able to register. A trigger is a good candidate for this requirement because we want to be directly tied to the DML event of “Bob” registering… but we somehow need to decouple the trigger action from the DML transaction.

One way to do this is with the SET XACT_ABORT command, which controls whether a run-time error aborts and rolls back the current transaction. This setting defaults to “off” in normal query execution… but within triggers the default is “on”. Thus, simply setting it to “off” within the trigger will do the job.

One caveat, however: even with the setting “off”, if an error within the trigger is sufficiently severe, the DML transaction will fail and rollback. Nonetheless, more innocuous errors can occur without causing DML failure.

Let’s take a look at this behavior in action.

-- if the example table exists, drop it
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL DROP TABLE tbl;
GO

-- create the example table
CREATE TABLE tbl (
    ID INT IDENTITY(1,1) NOT NULL
    , Value VARCHAR(20) NOT NULL
    , CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (ID)
    );
GO

-- create a trigger on the example table
CREATE TRIGGER itrgTbl
ON tbl
FOR INSERT
AS
BEGIN
    -- turn off transaction aborting
    SET XACT_ABORT OFF;
    BEGIN TRY
        PRINT 'TRY';
        -- cause a failure
        DECLARE @i INT = 1/0;
    END TRY
    BEGIN CATCH
        PRINT 'CATCH!';
        -- capture the error values
        DECLARE
            @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE()
            , @ErrorSeverity INT = ERROR_SEVERITY()
            , @ErrorState INT = ERROR_STATE()
            ;
        -- raise the error
        IF @ErrorState > 0
            RAISERROR (
                @ErrorMessage
                , @ErrorSeverity
                , @ErrorState
                );
    END CATCH
END
GO

-- test the trigger by inserting a record
INSERT INTO tbl (Value) VALUES ('Hello, world.');

-- check the table contents to determine if the insert was successful
SELECT * FROM tbl;

When executing the code above you will see that the divide by zero error is raised, but the record was committed to the table anyway. Try commenting out the SET XACT_ABORT OFF command and running it again. Without that command the division error is escalated to a trigger execution error and the DML is blocked with message 3616:
Msg 3616, Level 16, State 1, Line 2
An error was raised during trigger execution. The batch has been aborted and the user transaction, if any, has been rolled back.

If you need to isolate your trigger code even more robustly from the DML event, you can also go fully asynchronous by using the trigger to send a message to the service broker. But more on that in a future post!

For more info on XACT_ABORT, read all about it on technet.

Temp Table Scope, Part 2

In a previous post we discussed the scope of temporary tables; specifically, their persistence inside and outside of dynamic SQL statements within the same connection. In that post we examined using a global temporary table to resolve the persistence issue. In this post, we’ll look at an alternate approach: creating a basic table first and then extending it through dynamic SQL. Let’s get started!

As you’ll recall, we previously demonstrated that temporary tables created within dynamic SQL were not available outside of the dynamic SQL execution. However, a temporary table created within a connection is available within a dynamic SQL context. The following script confirms this:

-- If the temp table we intend to create already exists, drop it
IF OBJECT_ID(N'tempdb..#tbl') IS NOT NULL
    DROP TABLE #tbl;

-- Create a basic temp table
CREATE TABLE #tbl(ID INT IDENTITY(1,1) NOT NULL);

-- Confirm the temp table was created
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

-- Confirm the temp table is accessible within dynamic SQL
DECLARE @SQL NVARCHAR(2000) =
    N'SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..#tbl'');';
EXECUTE (@SQL);

You will notice that the object id of the temporary table is the same both outside and inside the dynamic SQL. In other words, the temporary table is persistent across the connection, unlike when a temporary table is created within dynamic SQL.

Now that we’ve seen the temporary table persists into dynamic SQL, we can extend the table within the dynamic SQL statement.

/* Extend the temp table via dynamic SQL
In this instance we'll add a parameterized number of columns */
DECLARE
    @i TINYINT = 0
    , @j TINYINT = 3;

WHILE @i < @j
BEGIN
    SET @i += 1;
    SET @SQL = N'ALTER TABLE #tbl ADD Col'
        + CONVERT(NVARCHAR(10), @i)
        + N' VARCHAR(50) NULL;';
    EXECUTE (@SQL);
END

-- Confirm that the table was extended outside the dynamic SQL
SELECT * FROM #tbl;

As the query result shows, the columns were successfully added to the temporary table. And, what’s more, the temporary table is available both inside and outside of a dynamic SQL context. Bob’s your uncle.

Test Compatibility Level

Business Case: An existing application is built on a legacy version of the SQL Server database engine (such as SQL Server 2000). To take advantage of modern capabilities, the application is to be upgraded to a more recent version of the database engine.

Environment: The application is business-critical and unscheduled downtime is not acceptable. The application must be tested to confirm reliability at the higher database version, but full-function application testing is unrealistic due to the complexity of the application.

Approach: To test the application database layer, we will create a script to collect and parse the existing SQL statements under the higher database version, capturing any errors that might be generated.

Solution: To demonstrate the approach, we will simply write and test a few SQL statements. To fully develop this solution, rather than run the test against our sample SQL table, we could run it against a filtered selection of sys.sql_modules — a system view that contains the definition statement of virtually every SQL-defined object (procedure, view, function, trigger) in the database.

First, let’s set the database context to a database at the target version (compatibility level). We’ll also suppress statement record count reports.

-- Set the database context to a db at the target version
USE [db];
GO

-- Do not show statement record counts
SET NOCOUNT ON;

For demonstration purposes, we’ll create a table variable to store the SQL statements to be tested. In the real world, a query from the [definition] field of sys.sql_modules would be a good place to start. In this example we’ll test three statements that are valid at compatibility level 80 (SQL Server 2000) but invalid at level 100 (SQL Server 2008).

-- Create the SQL input table
DECLARE @tblIn TABLE (
    ID INT IDENTITY(1,1) NOT NULL
    , SQLString NVARCHAR(MAX) NOT NULL
    )
;

-- Populate the SQL input table
INSERT INTO @tblIn (SQLString)
VALUES -- these statements are valid at level 80, invalid at level 100
    (N'SET')
    , (N'SELECT o.name FROM sys.objects AS o HOLDLOCK;')
    , (N'SELECT o.name FROM sys.objects AS o, sys.tables AS t WHERE o.[object_id] *= t.[object_id];')
;
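
In a real test, rather than hard-coded samples, something along these lines could populate the input table from sys.sql_modules instead (just a sketch; filter it however makes sense for your environment):

-- Populate the SQL input table from existing module definitions
INSERT INTO @tblIn (SQLString)
SELECT m.[definition]
FROM sys.sql_modules AS m
    INNER JOIN sys.objects AS o
        ON m.[object_id] = o.[object_id]
WHERE o.is_ms_shipped = 0
;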

Now we’ll create an output table to store the results of our syntax check. We’ll capture the details of our input as well as the level at which we are testing the statement and any errors that might result.

-- Create the SQL output table
DECLARE @tblOut TABLE (
    ID INT NOT NULL
    , SQLString NVARCHAR(MAX) NOT NULL
    , ValidationLevel TINYINT NOT NULL
    , ErrorNumber INT NOT NULL DEFAULT(-1)
    , ErrorMessage NVARCHAR(2000) NOT NULL DEFAULT(N'')
    )
;

Next we’ll begin the automated testing phase. We’ll create a cursor to scroll through the SQL statements to be tested. Start by creating variables to hold relevant information for the cursor, then initialize the cursor definition and iteration process.

-- Declare the cursor output variable(s)
DECLARE
    @intID INT
    , @nstrSQLOrig NVARCHAR(MAX)
    , @nstrSQL NVARCHAR(MAX)
    , @intCompatLevel TINYINT = (
            SELECT [compatibility_level]
            FROM sys.databases
            WHERE database_id = DB_ID()
        )
;

-- Declare and initialize the cursor
DECLARE crs CURSOR FAST_FORWARD
FOR
    -- Define the cursor recordset
    SELECT ID, SQLString
    FROM @tblIn
;

-- Open the cursor
OPEN crs;

-- Initialize the cursor output
FETCH NEXT FROM crs INTO @intID, @nstrSQL;

With the cursor now open, let’s iterate through the records. For each SQL statement to be tested, we’ll first capture the original SQL statement. Then we’ll modify the statement by prepending a command that turns the PARSEONLY setting on. That setting tells SQL Server to check the syntax of the statement, but not to compile or execute it. If, for your situation, you’d like to test statement compilation as well, try setting NOEXEC on instead. Once parsed, we’ll turn PARSEONLY off so that subsequent statements are executed. And, by wrapping the whole thing in a try-catch block and within a transaction, we can control the flow and error capture to meet our needs.

-- Iterate through the cursor records
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Capture the original SQL statement
    SET @nstrSQLOrig = @nstrSQL
    -- Attempt to parse
    BEGIN TRY
        -- Start transaction
        BEGIN TRANSACTION
            -- Prepend statement with parseonly setting 
            SET @nstrSQL = N'SET PARSEONLY ON; ' + @nstrSQL;
            EXECUTE (@nstrSQL);
            -- Turn off parseonly
            SET PARSEONLY OFF;
            -- Capture statements and validation level
            INSERT INTO @tblOut (ID, SQLString, ValidationLevel)
            VALUES (
                @intID
                , @nstrSQLOrig
                , @intCompatLevel
                )
            ;
        -- Commit transaction
        COMMIT TRANSACTION;
    END TRY
    -- Parse failed; capture error
    BEGIN CATCH
        -- Roll back the transaction
        ROLLBACK TRANSACTION;
        -- Capture statement and error information
        INSERT INTO @tblOut (
            ID
            , SQLString
            , ValidationLevel
            , ErrorNumber
            , ErrorMessage
            )
        VALUES (
            @intID
            , @nstrSQLOrig
            , @intCompatLevel
            , ERROR_NUMBER()
            , ERROR_MESSAGE()
            )
        ;
    END CATCH
    -- Iterate the cursor output
    FETCH NEXT FROM crs INTO @intID, @nstrSQL;
END

Finally, let’s close and deallocate the cursor and turn statement counts back on.

-- Close cursor
CLOSE crs;

-- Release cursor from memory
DEALLOCATE crs;

-- Show statement counts
SET NOCOUNT OFF;

What about the results? Well, let’s take a look!

-- View validation report
SELECT * FROM @tblOut;

If the preceding script is run against a database at compatibility level 80 (SQL Server 2000), you’ll see the following output.

[Screenshot: test SQL at level 80, showing valid syntax for SQL 2000]

And, if run against a database at compatibility level 100 (SQL Server 2008), you’ll see the following.

[Screenshot: test SQL at level 100, showing invalid syntax for SQL 2008]

Does anyone have any suggestions on a more elegant way to accomplish this type of automated compatibility testing?

Convert Rows to Column (Part II)

In a previous post we demonstrated how to use the FOR XML clause to concatenate a single field from multiple rows into a single value. While that technique is quite handy, what about when there are multiple fields to be concatenated? In other words, what if the native data has a one-to-many relationship but we need to return a one-to-one record set without losing any of the fields or data?

One possible solution is to simply extend the single-field example and create an additional CROSS APPLY block for each additional field. While not an elegant solution, it does work. Before we explore a more modular solution, let’s refresh our memories on what this first solution would look like.

Before we dig in, let’s create some objects for demonstration purposes.

-- Set the database context
USE [db];
GO

-- If the demo table already exists, drop it
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL
    DROP TABLE tbl;
GO

-- Create the demo table
CREATE TABLE tbl (
    ID INT IDENTITY(1,1) NOT NULL
    , FK INT NOT NULL
    , Value VARCHAR(20) NOT NULL
    , CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (ID)
    )
;
GO

-- Populate the demo table
INSERT INTO tbl (FK, Value)
VALUES
    (1, 'One')
    , (2, 'Two')
    , (2, 'Three')
;
GO

We’ll also make use of our numbers function to simulate the “one” side of our “one-to-many” data set. The number data will join to the foreign key (FK) we created in the demo table, which serves as our “many” side. Wait — you don’t have a “numbers” function? Get one here.
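
In case that link isn’t handy, here is a minimal sketch of such a function. The signature is assumed from the usage below (a table-valued function in the master database that returns the integers 1 through @max in a column named n), so adjust it to match whatever numbers function you actually use:

-- A minimal numbers function (sketch); create it in master to match the demo code
CREATE FUNCTION dbo.utfNumbers (@max INT)
RETURNS TABLE
AS
RETURN (
    -- generate 1..@max by numbering rows from a conveniently large system view
    SELECT TOP (@max) 'n' = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_objects
);
GO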

Okay, so with our environment ready, let’s demonstrate concatenation of multiple rows across multiple fields using FOR XML and CROSS APPLY.

-- First approach; effective but neither scalable nor elegant
SELECT
    nums.n
    , dTbl.ID
    , dTbl2.Value
FROM
    [master].dbo.utfNumbers(3) AS nums
    -- Join to concatenate the first field (ID)
    CROSS APPLY (
        SELECT CONVERT(VARCHAR(20), ID) + '; ' AS [text()]
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH('')
    ) AS dTbl(ID)
    -- Join to concatenate the second field (Value)
    CROSS APPLY (
        SELECT Value + '; ' AS [text()]
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH('')
    ) AS dTbl2(Value)
;

We are able to return the results we expect; however, there are two major issues. One, every additional field to return requires us to add an additional CROSS APPLY block; and two, every additional CROSS APPLY block causes an additional query/join to the data. If we could use just one CROSS APPLY block for all the columns, we could save both ourselves and the database engine some work.

And, of course, we can do just that. Again we will use the capabilities of XML to do the job. This time, rather than returning the concatenated result of a single field in the CROSS APPLY, we will return the entire record set (all rows and all columns) to the outer query in the form of a single XML value. With the XML value at the top query, we can then parse out the individual fields (nodes) in the outer SELECT statement.

Yes, that was a lot of words. But, I trust that they will make more sense once you see the example.

-- Second approach; a function makes it scalable
SELECT
    nums.n
    , 'IDs' = [master].dbo.ufnXMLConcat(dTbl.x, 'ID')
    , 'Values' = [master].dbo.ufnXMLConcat(dTbl.x, 'Value')
FROM
    [master].dbo.utfNumbers(3) AS nums
    -- Join to all of the child data, returned as XML
    CROSS APPLY (
        SELECT ID, Value
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH
    ) AS dTbl(x)
;

Obviously, this syntax is preferable. It is not only easier to author but it also requires fewer data touches for the SQL engine to resolve.

But wait… what is that ufnXMLConcat function? Why it is right here, of course! 😉

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnXMLConcat')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnXMLConcat;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnXMLConcat (
    @xml XML
    , @strNode VARCHAR(50)
    )
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- Declare variables
    DECLARE
        @strReturn VARCHAR(MAX) = ''
        , @strXML VARCHAR(MAX) = CONVERT(VARCHAR(MAX), @xml)
    ;
    -- Return default if XML is null
    IF @strXML IS NULL RETURN @strReturn;
    -- Parse the XML string for the node
    WHILE CHARINDEX('<' + @strNode + '>', @strXML) != 0
    BEGIN
        -- Extract and concatenate the node value
        SET @strReturn += SUBSTRING(
                @strXML
                , CHARINDEX('<' + @strNode + '>', @strXML)
                    + LEN('<' + @strNode + '>')
                , CHARINDEX('</' + @strNode + '>', @strXML)
                    - CHARINDEX('<' + @strNode + '>', @strXML)
                    - LEN('<' + @strNode + '>')
                )
            + '; '
        ;
        -- Remove the node from the XML string
        SET @strXML = SUBSTRING(
                @strXML
                , CHARINDEX('</' + @strNode + '>', @strXML)
                    + LEN('</' + @strNode + '>')
                , LEN(@strXML)
                    - CHARINDEX('</' + @strNode + '>', @strXML)
                    - LEN('</' + @strNode + '>')
            )
        ;
    END
    -- Remove the trailing value delimiter (if any values were found)
    IF LEN(@strReturn) > 0
        SET @strReturn = LEFT(@strReturn, LEN(@strReturn) - 1);
    -- Return the result of the function
    RETURN @strReturn;
END
GO

As you can see, the function accepts an XML value and a string node name as input, and simple string functions take care of the rest. There are, of course, even more elegant ways to parse XML; but more complex programming is not necessarily permitted within SQL functions.

Although the second example provides a more elegant solution, it should be noted that even this solution does not scale perfectly. For example, if the outer query returns millions of rows and/or the inner query contains many rows or many columns, there could definitely be performance problems. After all, executing functions within a SELECT statement is asking for trouble when the record count gets high.

Does anyone have a more elegant way of handling this business case? If so, please share via a comment.

Special thanks to my colleague, Bryan Clayton, for his assistance in defining the business case and working through alternate methods to parse XML. Look forward to some of those alternatives in future posts!

As ever, don’t forget to clean up!

-- Drop the demo table
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL
    DROP TABLE tbl;

Temp Table Scope

Temporary table scope is actually a little trickier than it might seem. First of all, Microsoft guidance is not particularly clear. Regarding local temp tables, BOL states: “[T]hey are visible only to the current connection for the user, and they are deleted when the user disconnects from the instance of SQL Server.”

Well, that’s not entirely accurate. “Connection” is misleading. Take a look at the following to see what I mean.

In the first run-through we’ll create a temp table the “normal” way; as you’ll see, everything works as expected.

-- If the temp table we intend to create already exists, drop it
IF OBJECT_ID(N'tempdb..#tbl') IS NOT NULL
    DROP TABLE #tbl;

-- Create a temp table
CREATE TABLE #tbl (ID INT);

-- Confirm the temp table was created
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

-- Drop the temp table we created
IF OBJECT_ID(N'tempdb..#tbl') IS NOT NULL
    DROP TABLE #tbl;

-- Confirm the temp table was dropped
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

So far so good. Now, let’s try the same thing again, but this time we’ll run the SQL from within an EXECUTE command.

-- Declare variables
DECLARE @nstrSQL NVARCHAR(MAX);

-- SQL to create the temp table and confirm it exists
SET @nstrSQL = N'CREATE TABLE #tbl (ID INT);
SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..#tbl'');'
;

-- Create the temp table and confirm creation
EXECUTE (@nstrSQL);

-- See if the temp table still exists
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

What?! Within the EXECUTE statement context, the temp table was created and we confirmed that it existed in that context. However, outside the scope of the EXECUTE statement, the table is gone. Yet we are still in the same connection — which you can confirm by looking at the process id inside and outside of the EXECUTE statement context (simply add @@SPID to the SELECT statements).
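
For example, a quick side-by-side check (a sketch) shows the same process id in both contexts:

-- The process id outside the dynamic SQL...
SELECT 'SPID outside' = @@SPID;

-- ...matches the process id inside the dynamic SQL
EXECUTE (N'SELECT ''SPID inside'' = @@SPID;');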

Thankfully, the global temp table resolves that issue, so it’s not too much of an inconvenience. Except…

With a global temp table you have to be a little more careful about execution process overlap. In other words, if there is the potential for multiple processes to run the same procedure at close to the same time, and that procedure creates a temporary table of the same name, then the processes could collide: one process might drop or modify the table while another process is still using it, potentially causing data corruption, or worse.

Well, that’s easy enough to mitigate as well. We just need to make the temp table name effectively unique such that each process will have a temp table of a different name. And that is easy to do with a GUID. When we create the global temp table, we’ll simply add a GUID to the name. That is sufficient to solve the unique name problem, but it does add the complication of having to share the GUID with any other process that might need to use the temp table. But, if a temp table is necessary, then passing an additional variable for the table name is likely a small price to pay to ensure data quality.

-- Declare variables
DECLARE
    @nstrGUID NCHAR(36) = NEWID()
    , @nstrSQL NVARCHAR(MAX)
;

-- SQL to create the temp table and confirm it exists
SET @nstrSQL = N'CREATE TABLE [##tbl_' + @nstrGUID + '] (ID INT);
SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..[##tbl_' + @nstrGUID + ']'');'
;

-- Create the temp table and confirm creation
EXECUTE (@nstrSQL);

-- SQL to re-confirm the temp table exists
SET @nstrSQL = N'SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..[##tbl_'
    + @nstrGUID + ']'');'
;

-- See if the temp table still exists
EXECUTE (@nstrSQL);

Problem solved.
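
And, in keeping with the usual clean-up reminder, the GUID-named global temp table can be dropped with the same dynamic SQL pattern (a sketch; run it in the same batch so @nstrGUID and @nstrSQL are still in scope):

-- SQL to drop the GUID-named global temp table
SET @nstrSQL = N'IF OBJECT_ID(N''tempdb..[##tbl_' + @nstrGUID + N']'') IS NOT NULL
    DROP TABLE [##tbl_' + @nstrGUID + N'];';

-- Drop the temp table
EXECUTE (@nstrSQL);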

Asynchronous Transactions

Business Case: An organization frequently creates databases ad hoc and wants them to be backed up immediately upon creation. This reduces risk by automating what would otherwise be another step for the database creator to remember to perform. Additionally, the existing maintenance actions include a full backup daily and a log backup every fifteen minutes. The challenge is that a log backup will fail if a full backup has not already been performed on the database. Given the frequency and ad-hoc nature of database creation, it is important to ensure a full backup is taken as soon as a database is created.

Environment: On the surface, this business case seems like a good match for a server-level trigger on the CREATE_DATABASE statement, with the trigger action performing the backup. However, that will not work because a backup operation is not permitted within a transaction containing other statements; triggers are executed as part of the calling statement (because the calling transaction must be rolled-back if the trigger action fails). Thus, we will need to be able to perform a second action as reliably as if it were part of a database transaction, but it cannot actually exist within the same literal transaction.

Approach: To perform this sort of asynchronous transaction, we will use event notification and the service broker to perform a full backup whenever a database is created.

Solution: Let us start by creating a stored procedure that will read a yet-to-be-defined service broker queue whenever a message is placed in it. The procedure will consume the message, perform the database backup, and then end the service broker conversation.

You’ll notice that this stored procedure makes use of a procedure we created previously to perform our backup operations. It is available here.

Also note that this yet-to-be-defined queue will be created in the msdb database. We are creating it there because it roughly aligns with part of what msdb is intended to do (manage the SQL agent); also, because msdb is a system database, it is conceptually independent from the user databases for which this solution is intended to manage initial backup creation.

-- Create a procedure to consume service broker queued messages
CREATE PROCEDURE [dbo].[uspBrokerBackup]
AS
BEGIN
    -- Initialize variables for later use
    DECLARE
        @uiHandle UNIQUEIDENTIFIER
        , @strMsg NVARCHAR(MAX)
        , @sysMsgName SYSNAME
        , @sysDBName SYSNAME
    ;
    -- Consume a message from the queue, waiting no longer than one second
    WAITFOR (
        RECEIVE TOP(1)
            @uiHandle = [conversation_handle]
            , @strMsg = message_body
            , @sysMsgName = message_type_name
        FROM msdb.dbo.queBackupDB)
        , TIMEOUT 1000 -- in milliseconds
    ;
    -- Validate message type and contents
    IF @sysMsgName = N'http://schemas.microsoft.com/SQL/Notifications/EventNotification'
    AND CHARINDEX(N'<DatabaseName>', @strMsg) != 0
    AND CHARINDEX(N'</DatabaseName>', @strMsg) != 0
    BEGIN
        -- Extract the database name from the message body
        SET @sysDBName = SUBSTRING(
            @strMsg
            , CHARINDEX(N'<DatabaseName>', @strMsg)
                + LEN(N'<DatabaseName>')
            , CHARINDEX(N'</DatabaseName>', @strMsg)
                - CHARINDEX(N'<DatabaseName>', @strMsg)
                - LEN(N'<DatabaseName>'))
        ;
        -- Perform the full backup; click here for uspMxBackup
        EXECUTE master.dbo.uspMxBackup
            @strBackupType = 'FULL'
            , @sysServerName = NULL
            , @sysDatabaseName = @sysDBName
        ;
        -- End the service broker conversation
        END CONVERSATION @uiHandle;
    END
END

Now let’s set up the service broker. We only need to create a couple of the service broker objects; we’ll use some system objects for the rest. We’ll need a queue to store and process notifications of database creation, a service to receive notifications and put them in the queue, and an event notification to catch the create database actions and send them to the service. Don’t worry: it only sounds easy because it is. 🙂

-- If the service already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.services
    WHERE name = N'//SQLEXPRESS/InternalAction/BackupDB')
DROP SERVICE [//SQLEXPRESS/InternalAction/BackupDB]
;

-- If the queue already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.service_queues
    WHERE name = N'queBackupDB')
DROP QUEUE [queBackupDB]
;

-- Create the queue to store and process the notifications
CREATE QUEUE [queBackupDB]
WITH
    ACTIVATION (
        PROCEDURE_NAME = [master].[dbo].[uspBrokerBackup]
        , MAX_QUEUE_READERS = 1
        , EXECUTE AS N'dbo')
;

-- Create the service to send notifications to the queue
CREATE SERVICE [//SQLEXPRESS/InternalAction/BackupDB]
ON QUEUE queBackupDB ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification])
;

-- If the event notification already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.server_event_notifications
    WHERE name = N'evntCreateDatabase')
DROP EVENT NOTIFICATION evntCreateDatabase
ON SERVER
;

-- Get the msdb service broker id
DECLARE @uiBrokerID UNIQUEIDENTIFIER = (
    SELECT service_broker_guid
    FROM sys.databases
    WHERE name = N'msdb')
;

-- Build the SQL to create the event notification
DECLARE @strSQL NVARCHAR(MAX) =
N'CREATE EVENT NOTIFICATION evntCreateDatabase
ON SERVER
FOR CREATE_DATABASE
TO SERVICE
    ''//SQLEXPRESS/InternalAction/BackupDB'' -- name of service broker service
    , ''' + CONVERT(CHAR(36), @uiBrokerID) + N''' -- service_broker_guid for msdb db
;'
;

-- Create the event notification
EXECUTE sp_executesql @strSQL;

And, there you have it. When the create database event occurs, a notification message is sent via a service to a queue, which then uses a stored procedure to process each notification message. If useful in your environment, feel free to add defensive coding, such as ensuring notification messages are processed successfully, retrying if they are not, notifying an operator if any piece fails, etc. Enjoy!
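
If the backups don’t seem to be firing, a couple of quick checks can help with troubleshooting (just a sketch):

-- Peek at any notification messages still sitting in the queue
SELECT * FROM msdb.dbo.queBackupDB;

-- Check for messages stuck in transit, along with any transmission errors
SELECT * FROM msdb.sys.transmission_queue;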

Cursors on the Fly

In general, I don’t like cursors. My experience has been that virtually any business logic can be implemented through joining relational data sets, especially after the introduction of the CROSS APPLY join type in SQL Server 2005. Having said that, there are a few specific instances when a cursor could be handy. One such case is when individual actions must be performed on records from a sub query, but the sub query definition is not known prior to run time. In that instance, creating a cursor definition on-the-fly can be a useful technique.

Let’s take a closer look.

Business Case: An application makes use of databases in a hierarchical manner such that a single, top-level database contains connection information to multiple, second-level databases. A use case exists such that select information from the top-level database must be used to identify which second-level databases are to be queried.

Environment: The complication with this request is that the specific information from the top-level database is not known prior to run time, thus the specific names of the second-level databases are also not known. Because the second-level names are not known, traditional join or cursor syntax will not work, since object names cannot be supplied through variables or parameters.

Approach: We will use the API cursor definition procedures to create a cursor on-the-fly based on information queried during run time.

Demonstration Preparation: For demonstration purposes, first create two databases on your server in your default configuration. The first database can be called DB and the second can be called DB2. We do not need any special configuration on the databases; whatever you have as your server default will do. The first database (DB) will simulate our top-level database as well as one of the second-level databases. The second database (DB2) will simulate another second-level database. We will create a parent table in the top-level database called tblOne; we will create a second table called tblTwo in the top-level database as well as in the second database. For this example we will be working within the “dbo” schema.

-- If our parent table already exists, drop it
IF OBJECT_ID(N'DB.dbo.tblOne') IS NOT NULL
    DROP TABLE DB.dbo.tblOne
;

-- Create the parent table
CREATE TABLE DB.dbo.tblOne (
    ID INT NOT NULL
    , DatabaseName VARCHAR(10) NOT NULL
    , MinIDToReturn TINYINT)
;

-- Populate the parent table
INSERT INTO DB.dbo.tblOne
VALUES
    (1, N'DB', 1)
    , (2, N'DB2', 2)
;

-- If the first secondary table already exists, drop it
IF OBJECT_ID(N'DB.dbo.tblTwo') IS NOT NULL
    DROP TABLE DB.dbo.tblTwo
;

-- Create the first secondary table
CREATE TABLE DB.dbo.tblTwo (
    ID INT NOT NULL
    , Value VARCHAR(10) NOT NULL)
;

-- Populate the first secondary table
INSERT INTO DB.dbo.tblTwo
VALUES
    (1, N'DB - 1')
    , (2, N'DB - 2')
    , (3, N'DB - 3')
;

-- If the second secondary table already exists, drop it
IF OBJECT_ID(N'DB2.dbo.tblTwo') IS NOT NULL
    DROP TABLE DB2.dbo.tblTwo
;

-- Create the second secondary table
CREATE TABLE DB2.dbo.tblTwo (
    ID INT NOT NULL
    , Value VARCHAR(10) NOT NULL)
;

-- Populate the second secondary table
INSERT INTO DB2.dbo.tblTwo
VALUES
    (1, N'DB2 - 4')
    , (2, N'DB2 - 5')
    , (3, N'DB2 - 6')
    , (4, N'DB2 - 7')
;

Solution: We will start by creating a cursor via the normal TSQL commands to return the connection (database) and return parameter from the top-level database. Then we will use a few of the cursor API stored procedures to create child cursors on-the-fly.

-- Declare variables
DECLARE
    @strDBName VARCHAR(10) -- one variable from the primary cursor
    , @intMinID TINYINT -- second variable from the primary cursor
    , @strSQL VARCHAR(255) -- string for child cursor definition
    , @crs2 INT -- cursor handle for child cursor
    , @intRowCount INT -- placeholder for child cursor creation
;
DECLARE @tbl TABLE ( -- storage for the child cursor
    ID INT
    , Value VARCHAR (10))
;

-- Declare the primary cursor
DECLARE crs CURSOR READ_ONLY
FOR
    SELECT
        DatabaseName
        , MinIDToReturn
    FROM DB.dbo.tblOne
;

-- Open the primary cursor
OPEN crs;

-- Prepare the primary cursor for iteration
FETCH NEXT FROM crs INTO @strDBName, @intMinID;

-- Scroll through the primary cursor records
WHILE @@FETCH_STATUS = 0
BEGIN

    -- Set the dynamic cursor record set
    --   This is the point of an on-the-fly cursor: creating a
    --   definition ad hoc. This example uses two techniques: ad hoc
    --   string creation (@strDBName) as well as parameterization (@intID)
    SET @strSQL = 'SELECT ID, Value FROM ['
        + @strDBName
        + '].dbo.tblTwo WHERE ID > @intID;'
    ;

    -- Display the parameters we are working with, FYI
    SELECT
        'DatabaseName' = @strDBName
        , 'SQL' = @strSQL
        , '@intID' = @intMinID
    ;

    -- Open the secondary cursor
    --   This procedure will return an empty result set with the actual
    --   column headings
    EXECUTE sp_cursoropen
        @cursor = @crs2 OUTPUT
        , @stmt = @strSQL
        , @scrollopt = 4104 -- STATIC (8), PARAMETERIZED_STMT (4096)
        , @ccopt = 8193 -- READ_ONLY (1), ALLOW_DIRECT (8192)
        , @rowcount = @intRowCount OUTPUT
        , @paramdef = N'@intID INT'
        , @intID = @intMinID
    ;

    -- Scroll through the secondary cursor records
    WHILE @@FETCH_STATUS = 0
    BEGIN

        -- Store the secondary cursor record and iterate the secondary
        -- cursor
        --   While we are simply storing the records in this
        --   example, if you are creating a cursor you should be doing
        --   something that is not possible through relational joins
        INSERT INTO @tbl (ID, Value)
        EXECUTE sp_cursorfetch
            @cursor = @crs2
            , @fetchtype = 2 -- NEXT (0x0002)
            , @nrows = 1 -- number of rows to return
        ;

    END

    -- Display the secondary cursor record set, FYI
    SELECT *
    FROM @tbl
    ;

    -- Close the secondary cursor
    EXECUTE sp_cursorclose @crs2;

    -- Reset the secondary cursor output table variable
    DELETE FROM @tbl;

    -- Iterate the primary cursor
    FETCH NEXT FROM crs INTO @strDBName, @intMinID;

END

-- Close the primary cursor
CLOSE crs;

-- Release the primary cursor

DEALLOCATE crs;

Taking a look at the result sets from the above statements will show that we did, indeed, connect to both databases and returned rows from the target tables based on the passed parameters from tblOne.

Obviously, this example could be extended to include cross-server connections through connection strings or linked servers, or other more complicated permutations.

Clean Up: Now, let’s clean up. We’ll drop the three test tables; optionally, you could just drop the test databases instead. 😉

-- If our parent table exists, drop it
IF OBJECT_ID(N'DB.dbo.tblOne') IS NOT NULL
    DROP TABLE DB.dbo.tblOne
;

-- If the first secondary table exists, drop it
IF OBJECT_ID(N'DB.dbo.tblTwo') IS NOT NULL
    DROP TABLE DB.dbo.tblTwo
;

-- If the second secondary table exists, drop it
IF OBJECT_ID(N'DB2.dbo.tblTwo') IS NOT NULL
    DROP TABLE DB2.dbo.tblTwo
;

-- Change database context
USE [master];
GO

-- If the DB database exists, drop it
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'DB')
    DROP DATABASE [DB]
;

-- If the DB2 database exists, drop it
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'DB2')
    DROP DATABASE [DB2]
;

Final Notes: Now, before anyone sends me fan mail, note that this business case can be handled with dynamic TSQL commands (sp_executesql) instead of API cursor procedures. But, for 25 bonus points, who can suggest at least one reason why that might not be as good an idea?
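
For reference, here is a minimal sketch of that sp_executesql alternative for the inner loop (note that sp_executesql requires an NVARCHAR statement, so a new variable is declared for it):

-- Sketch: replace the sp_cursoropen / sp_cursorfetch loop with sp_executesql
DECLARE @nstrStmt NVARCHAR(500) = N'SELECT ID, Value FROM ['
    + @strDBName + N'].dbo.tblTwo WHERE ID > @intID;';

-- Capture the dynamic result set with INSERT ... EXECUTE
INSERT INTO @tbl (ID, Value)
EXECUTE sp_executesql @nstrStmt, N'@intID INT', @intID = @intMinID;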

Also, I intentionally did not refer to this approach as a dynamic cursor. While we did construct the cursor definition dynamically, the term “dynamic” is actually reserved in cursor usage to refer to a setting whereby the data underlying a cursor can be updated through the cursor. In this example, we were read-only; hence, not a “dynamic” cursor.

Surreptitiously Restrict Access

Business Case: As happens significantly more often than it should, the credentials to a privileged system account have slipped into broad use. In this instance, however, security is not a major concern; the issue is that users have access to query databases directly through SQL Server Management Studio (SSMS) rather than through the custom application interface developed in order to control the scope of runnable queries.

Environment: The compromised credentials cannot easily be re-secured because they are used pervasively by application program interfaces (APIs) across the enterprise. An uncoordinated change of password could cause havoc, but system performance is flagging due to ad hoc, non-optimized query loading.

Approach: Unsecured credentials must be addressed and coordination / planning to do so should begin immediately. However, we can also take a few quick and easy steps to mitigate the likelihood of a significant performance hit. In this case, SQL’s resource governor should do the trick. We will configure the resource governor to look for logins using the affected credential and SSMS as an interface program. Once SQL connections have been classified we will limit the CPU and memory that our rogue connections have access to.

Demonstration Preparation: No preparation is necessary for demonstration purposes. However, you should confirm whether your resource governor is already in use before making any modifications to it. The following scripts will provide high-level information about your resource governor configuration. If your resource governor is in use, or has a configuration pending, take extra care to ensure any changes you make are in sync with your overall environment.

-- See if the resource governor is enabled and which user-defined
--   function (UDF) is being used for classifying connections
SELECT
    is_enabled
    , 'ClassifierUDF' =
        object_schema_name(classifier_function_id) + '.'
        + object_name(classifier_function_id)
FROM sys.resource_governor_configuration
;

-- See if the resource governor is in the process of being reconfigured
--   and which UDF is set to be used for classifying connections
SELECT
    is_reconfiguration_pending
    , 'PendingClassifierUDF' =
        object_schema_name(classifier_function_id) + '.'
        + object_name(classifier_function_id)
FROM sys.dm_resource_governor_configuration
;

Solution: In order to prevent rogue logins from overloading our SQL server, we will put them into a workload group that is assigned to a resource pool with limited resources.

-- Switch connection to the master database
USE master;
GO

-- If the workload group we intend to create already exists, drop it
IF EXISTS(
    SELECT name
    FROM sys.resource_governor_workload_groups
    WHERE name = N'wrkgrpLimited')
DROP WORKLOAD GROUP wrkgrpLimited
;
GO

-- If the resource pool we intend to create already exists, drop it
IF EXISTS (
    SELECT name
    FROM sys.resource_governor_resource_pools
    WHERE name = N'poolLimited')
DROP RESOURCE POOL poolLimited
;
GO

-- Create a resource pool to restrict server resource availability
CREATE RESOURCE POOL poolLimited
WITH (
    -- Adjust the following options as you desire
    MAX_CPU_PERCENT = 25
    , MAX_MEMORY_PERCENT = 25)
;
GO

-- Create a workload group for connections to be assigned
--   to the resource pool

CREATE WORKLOAD GROUP wrkgrpLimited
WITH (
    IMPORTANCE = MEDIUM)
USING poolLimited
;
GO

-- Modify the resource governor to remove the current classifier function
ALTER RESOURCE GOVERNOR
WITH (CLASSIFIER_FUNCTION = NULL)
;
GO

-- Apply the configuration changes
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

-- If the classifier function we intend to create already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        object_id = OBJECT_ID(N'dbo.ufnResourceGovernorClassifier')
        AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION dbo.ufnResourceGovernorClassifier
;
GO

-- Create the classifier function to assign connections to a workload group
CREATE FUNCTION ufnResourceGovernorClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    -- Add criteria appropriate to your situation
    IF SUSER_SNAME() = N'Compromised Login'
        AND PROGRAM_NAME() LIKE 'Microsoft SQL Server Management Studio%'
    RETURN N'wrkgrpLimited'
    ;
    RETURN N'default';
END
;
GO

-- Modify the resource governor to assign a classifier function
ALTER RESOURCE GOVERNOR
WITH (CLASSIFIER_FUNCTION = dbo.ufnResourceGovernorClassifier)
;
GO

-- Apply the configuration changes
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

Note that if you’d like to effectively eliminate this access path you can set the resource pool to use a maximum of 1% of CPU and a maximum of 1% of memory. While users could still connect (in other words, they would not get an error message indicating their access was denied), the performance of their connection would be so poor as to highly discourage its use. Also note that we could actively deny connections based on any number of parameters, including IP address, connection method, domain, or other factors. However, that cannot be done through the resource governor as user defined functions such as our classifier function cannot effect action beyond the scope of the function (such as killing a session, writing data, etc.).
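
Once the classifier is in place, a quick query against the session DMVs can confirm that new connections from the compromised login are landing in the limited workload group (a sketch; note that classification happens at connect time, so sessions opened before the change keep their original group):

-- See which workload group each current session has been assigned to
SELECT
    s.session_id
    , s.login_name
    , s.program_name
    , 'WorkloadGroup' = g.name
FROM sys.dm_exec_sessions AS s
    INNER JOIN sys.dm_resource_governor_workload_groups AS g
        ON s.group_id = g.group_id
WHERE s.is_user_process = 1
;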