Get Precise Age

It seems a simple thing: determine the age of a person. Given his/her birth date and the current date, the math is quite obvious and a human does the calculation with ease. However… it is surprising that software doesn’t often ship with native functions adequate to the task. Want to see what I mean? 🙂

Let’s start with the basic DATEDIFF function, using year as the datepart argument.

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-07-31'
    , @currentdate DATE = '2016-01-01';
-- use datediff to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , DATEDIFF(YEAR, @birthdate, @currentdate) AS 'ageWithDateDiff';

Obviously this result does not satisfy the use case for which we employed the function. DATEDIFF returns the difference between the year values only, without any consideration of the month or day of either date. The calculation indicates our person is 16 years old when, in fact, the person is 15 and has not yet reached his/her 16th birthday.

In response to this result I’ve often seen what I consider a brute-force approach to precision: keep the same conceptual trajectory, but apply it at a more granular level. Specifically, the DATEDIFF function is still employed, but with days instead of years; the result is divided by 365 (days per year) to get a fractional year value, which is then rounded down to the nearest integer to get the age.

This approach works for the previous example:

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-07-31'
    , @currentdate DATE = '2016-01-01';
-- use days to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.0) AS 'ageWith365Days';

So far so good! But the approach does not work in every case, for example:

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-02-29'
    , @currentdate DATE = '2016-02-28';
-- use days to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.0) AS 'ageWith365Days';

As can be seen above, the “age calculation using days per year” method fails when a leap day is in play. But brute-force hasn’t given up just yet: let’s use 365.25 days per year to account for the leap day!

-- declare and initialize variables
DECLARE @birthdate DATE = '2000-02-29'
    , @currentdate DATE = '2016-02-28';
-- use average days per year to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.25) AS 'ageWithLeapDays';

At this point it looks like the challenge is solved. However… there is still a problem; for example:

-- declare and initialize variables
DECLARE @birthdate DATE = '2001-02-28'
    , @currentdate DATE = '2016-02-28';
-- use average days per year to calculate the difference
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , FLOOR(DATEDIFF(DAY, @birthdate, @currentdate) / 365.25) AS 'ageWithLeapDays';

The “365.25” days method calculates that the person is 14. In this case, however, the current date is actually the person’s 15th birthday. In other words, the calculation fails.

Is there anything else for the brute-force approach to try? Instead… let’s attempt to code the underlying rules for calculating a person’s age (or any event anniversary).

Subtracting the years is actually a very good place to start because, if the “current” month-day is on or after the “birth” (event) month-day, then the raw year calculation is correct. And if the “current” month-day is before the “birth” month-day, then we subtract one from the raw year calculation.

As it turns out, this comparison of month-days is quite easy; we can simply use the DATEPART function with the DAYOFYEAR datepart argument! Of course, we still need to accommodate leap days. If we don’t consider leap days, then we will find that the 60th day of the year is “normally” 01-Mar, but it’s 29-Feb in a leap year.

The good news is that accounting for leap days is not as complicated as it might at first seem. We can still compare DAYOFYEAR values, but decrement the result when only one of the event and current dates falls in a leap year. (Note that we don’t need to worry about it if both dates, or neither date, fall in a leap year; in those cases the years have the same number of days, so a straight compare works correctly.)

CREATE FUNCTION dbo.ufnGetYearDiff (@start DATE, @end DATE)
RETURNS SMALLINT
AS
BEGIN
    DECLARE @return SMALLINT
        , @s SMALLINT = DATEPART(DAYOFYEAR, @start)
        , @e SMALLINT = DATEPART(DAYOFYEAR, @end)
        -- full Gregorian leap-year test: divisible by 4, except centuries not divisible by 400
        , @sLeap BIT = CASE WHEN YEAR(@start) % 4 = 0
            AND (YEAR(@start) % 100 != 0 OR YEAR(@start) % 400 = 0) THEN 1 ELSE 0 END
        , @eLeap BIT = CASE WHEN YEAR(@end) % 4 = 0
            AND (YEAR(@end) % 100 != 0 OR YEAR(@end) % 400 = 0) THEN 1 ELSE 0 END;
    -- if only the start date is in a leap year, decrement the start date dayofyear
    IF @sLeap = 1 AND @eLeap = 0 AND @s > 59 SET @s -= 1;
    -- if only the end date is in a leap year, decrement the end date dayofyear
    IF @sLeap = 0 AND @eLeap = 1 AND @e > 59 SET @e -= 1;
    -- decrement the year diff if the end month-day is earlier in the year than the start month-day
    SET @return = DATEDIFF(YEAR, @start, @end)
        - CASE WHEN @e < @s THEN 1 ELSE 0 END;
    RETURN @return;
END
GO

Let’s test it out and see if we have solved the problem.

-- declare and initialize variables
DECLARE @birthdate DATE = '2001-02-28'
    , @currentdate DATE = '2016-02-28';
-- use the custom function to calculate the age
SELECT @birthdate AS 'birth'
    , @currentdate AS 'today'
    , dbo.ufnGetYearDiff(@birthdate, @currentdate) AS 'ageFn';
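
For an extra confidence check, here’s a quick sweep (a sketch, assuming dbo.ufnGetYearDiff was created exactly as above) that runs the function against each of the edge cases we hit earlier; every column should come back as 15.

-- re-run the earlier edge cases through the function; each should return 15
SELECT dbo.ufnGetYearDiff('2000-07-31', '2016-01-01') AS 'case1' -- birthday later in the year
    , dbo.ufnGetYearDiff('2000-02-29', '2016-02-28') AS 'case2' -- leap-day birth, 29-Feb not yet reached
    , dbo.ufnGetYearDiff('2001-02-28', '2016-02-28') AS 'case3'; -- today is the birthday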

Success! Does anyone have a more elegant way to solve this challenge?

PS: Before anyone yells at me, the BOL documentation for DATEDIFF very clearly states what the function does: it returns the number of specified boundaries crossed between the startdate and enddate. Nonetheless, the function doesn’t do what the name implies (or, more precisely, what many people assume).

Range Notation

Business Case: A record set consists of sequential values that must be represented in standard range notation. For example, values include 1, 2, 3, 5, 6, 8, 9, 10, 11, 14, 15, and 16; these values should be represented as a single, comma-separated string with sequential values compressed into ranges, as follows: “1-3,5,6,8-11,14-16”.

Environment: There are no environmental factors with a relevant influence on this business case.

Approach: We’ll attempt to meet this requirement in a series of steps that build upon each other. First, we’ll identify sequence gaps in the record set. With that information we’ll determine how many values are in each range. If a range contains more than two sequential values, we’ll compress it into range notation indicating the first and last values in the range. Finally, we’ll combine the single values and the ranges and convert them into a comma-separated string using the FOR XML clause.

Solution: Let’s see the solution in action, starting with creation of our sample data set.

-- setup sample data
DECLARE @tbl TABLE (ID INT);
INSERT INTO @tbl (ID)
VALUES (1), (2), (3)
    , (5), (6)
    , (8), (9), (10), (11)
    , (14), (15), (16);

With the sample data set created, we’ll define the pieces necessary to reach the desired output. For this example, we’ll use common table expressions to enhance readability of the solution. Note that it can be done without CTEs, but it would be more difficult to follow due to the nested sub-queries.

Our initial step is to identify the gaps in sequencing; we’ll do this by joining the data set to itself, with the sequence value offset by one. Whenever the offset record set is NULL, we know we have a gap in the sequence. This step is coded in the “cte” common table expression below.

Once the sequence gaps have been identified, we can join the record set of gaps to the original record set of sequence numbers and add the range start value and range notation text to the range end (sequence gap) value. This is accomplished in the “cteRange” common table expression below. Note that we are grouping the records and determining those ranges that should be converted to range notation based on having a count > 2; in other words, ranges with three or more sequential values.

We’re almost there! Next we join the range data to the original record set of sequence numbers (again) and group the result based on range. In the event a sequence number is not part of a range (does not have a corresponding record in “cteRange”), it serves as its own range value. Finally, in order to convert this multi-record result into a single, comma-separated value, we use the FOR XML clause. (Haven’t done that before? Check here for more explanation, or here for another variation.) This entire process is demonstrated in the “put it all together” section of the code below.

-- create common table expressions to improve readability
WITH cte AS ( -- find range end points
    SELECT a.ID AS 'RangeEnd'
    FROM @tbl AS a LEFT OUTER JOIN @tbl AS b ON a.ID = b.ID - 1
    WHERE b.ID IS NULL)
, cteRange AS ( -- group values based on range end points
    SELECT MIN(t.ID) AS 'RangeStart'
        , oa.RangeEnd
        , CONVERT(VARCHAR(12), MIN(t.ID)) + '-' + CONVERT(VARCHAR(12), oa.RangeEnd) AS 'RangeText'
    FROM @tbl AS t
        OUTER APPLY (
            SELECT TOP 1 RangeEnd
            FROM cte
            WHERE cte.RangeEnd >= t.ID
            ORDER BY RangeEnd) AS oa
    GROUP BY oa.RangeEnd
    HAVING COUNT(*) > 2)

-- put it all together
SELECT STUFF(CONVERT(VARCHAR(MAX),
    (SELECT ',' + ISNULL(cteRange.RangeText, t.ID)
    FROM @tbl AS t
        LEFT OUTER JOIN cteRange
            ON t.ID >= cteRange.RangeStart
            AND t.ID <= cteRange.RangeEnd
    GROUP BY ISNULL(cteRange.RangeStart, t.ID), ISNULL(cteRange.RangeText, t.ID)
    ORDER BY ISNULL(cteRange.RangeStart, t.ID)
    FOR XML PATH('')))
    , 1, 1, '') AS 'RangeNotation';

Does anyone have a suggestion on how this can be accomplished more easily? Or any alternate approaches? Please share your feedback.
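
One candidate to seed the discussion: a sketch of the classic “gaps and islands” pattern, which groups consecutive values by the difference between each value and its row number. It assumes the same @tbl sample data as above (run it in the same batch, since @tbl is a table variable).

-- alternate approach: gaps and islands via ROW_NUMBER
WITH cteGrp AS ( -- values in an unbroken run share the same (ID - row number) offset
    SELECT ID, ID - ROW_NUMBER() OVER (ORDER BY ID) AS Grp
    FROM @tbl)
, cteIsland AS ( -- collapse each run to its end points
    SELECT MIN(ID) AS 'RangeStart', MAX(ID) AS 'RangeEnd'
    FROM cteGrp
    GROUP BY Grp)
SELECT STUFF((
    SELECT ',' + CASE
        WHEN RangeEnd - RangeStart >= 2 -- three or more values: compress to start-end
            THEN CONVERT(VARCHAR(12), RangeStart) + '-' + CONVERT(VARCHAR(12), RangeEnd)
        WHEN RangeEnd > RangeStart -- exactly two values: list both
            THEN CONVERT(VARCHAR(12), RangeStart) + ',' + CONVERT(VARCHAR(12), RangeEnd)
        ELSE CONVERT(VARCHAR(12), RangeStart) END -- a single value stands alone
    FROM cteIsland
    ORDER BY RangeStart
    FOR XML PATH('')), 1, 1, '') AS 'RangeNotation';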

Anyone who is a regular reader of this blog is probably waiting for my reminder to clean up your work area. However, since this example uses a table variable and does not materialize anything in the database… there is nothing to clean up! So you won’t hear anything about clean-up from me. 😉

The Trigger Fired?!

Data manipulation language (DML) triggers are commonly used to enforce business rules on underlying data, responding to a data change event by initiating a follow-on action, such as modifying additional data, generating a notification, or performing virtually any other action. While the basic trigger functionality is well known, there are a few interesting nuances regarding when a trigger fires. Let’s take a closer look.

To get started, let’s create a test table with a basic DML trigger and see how it works.

-- set the sql execution context
USE dev;
GO

-- suppress record count display
SET NOCOUNT ON;

-- if the test table exists, drop it
IF OBJECT_ID(N'dbo.tbl', N'U') IS NOT NULL DROP TABLE dbo.tbl;

-- create the test table
CREATE TABLE dbo.tbl (
    ID INT IDENTITY NOT NULL
    , Col1 CHAR(1) NULL);
GO

-- create a trigger on the test table
CREATE TRIGGER dbo.trgTbl ON dbo.tbl
FOR INSERT, UPDATE, DELETE
AS
BEGIN
    PRINT 'The trigger fired';
END
GO

Simple enough. With the above trigger we can now see whenever a DML statement hits the table. Let’s try the basic DML statements (INSERT, UPDATE, DELETE) and confirm everything is working.

-- insert to the test table
INSERT INTO dbo.tbl (Col1) VALUES ('a'), ('b');

-- update the test table
UPDATE dbo.tbl SET Col1 = 'c' WHERE Col1 = 'a';

-- delete from the test table
DELETE FROM dbo.tbl;

With each of those statements we are returned confirmation that the trigger did, in fact, fire. Nothing interesting yet. But what if we execute a non-update? We can test the non-update in two types: a logical non-update and a physical non-update, which I’ll label types “a” and “b”, respectively.

-- update the test table with logical non-update (type a)
UPDATE dbo.tbl SET Col1 = Col1;

-- update the test table with physical non-update (type b)
UPDATE dbo.tbl SET Col1 = 'c' WHERE 0 = 1;

The result of the first statement execution is not particularly surprising. Though logically the SQL is not doing anything meaningful, that the trigger fires is intuitive as there are records affected. But what about the physical non-update (type b)? For the second statement, the WHERE clause ensures that there are no records in the result set. That fact might imply that the trigger should not fire because no records could be affected. However, as you will have seen from executing the statement, that is not the case. The trigger still fires even though nothing could have changed in the underlying data.

So, what’s the takeaway? Well, now that we’ve seen a trigger fire even though no rows could be affected, we can take a simple step to prevent it (if appropriate based on the functional use case). For example, the following will do the trick.

-- alter the trigger to ignore type b non-updates
ALTER TRIGGER dbo.trgTbl ON dbo.tbl
FOR INSERT, UPDATE, DELETE
AS
BEGIN
    IF @@ROWCOUNT = 0 RETURN;
    PRINT 'The trigger fired';
END
GO

By first checking if there are records to be updated, we can choose to exit the trigger immediately before incurring any additional overhead from potentially complex or unwanted SQL execution.
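
To confirm the guard behaves as intended, re-run the type “b” statement from above; the trigger still fires, but it now exits before printing anything.

-- re-test the physical non-update; the row-count guard suppresses the trigger body
UPDATE dbo.tbl SET Col1 = 'c' WHERE 0 = 1;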

Finally, be sure to keep your workplace tidy!

-- clean up
IF OBJECT_ID(N'dbo.tbl', N'U') IS NOT NULL DROP TABLE dbo.tbl;

Shredding XML

There are multiple ways to shred XML within SQL Server, each with pros and cons. So which to choose? Barring rigid requirements about how the XML is sourced or other factors, why not choose based on performance? Surely no one would disagree with using the quickest method… so let’s explore which one is faster.

To start, let’s turn off record count display because we’ll use PRINT statements to see how long actions take.

-- turn off record count display to keep our output clean
SET NOCOUNT ON;

Next, let’s declare the variables we’ll use to both shred and measure execution times.

-- declare and initialize variables
DECLARE
    @time DATETIME2(3) -- stopwatch
    , @intXMLDocID INT -- xml document handle
    , @x XML = ( -- the XML content
            SELECT n AS id
            FROM [master].dbo.utfNumbers(100000) AS number
            FOR XML AUTO, TYPE, ROOT('count')
        )
    ;

Wait… what’s that utfNumbers in the FROM clause? You mean you don’t have one of those? Take a look here to see the value and roll your own.
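
If you want a starting point, here’s a minimal sketch that matches the signature used above (the utfNumbers name and its single @n parameter are this post’s convention, not a built-in). Create it in the master database, in its own batch, to match the [master].dbo reference.

-- a minimal numbers function; returns the integers 1 through @n
CREATE FUNCTION dbo.utfNumbers (@n INT)
RETURNS TABLE
AS
RETURN
    SELECT TOP (@n) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
        CROSS JOIN sys.all_objects AS b; -- the cross join yields millions of rows to number
GO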

Also, let’s go ahead and create a temporary table to hold the shredded XML. While we could simply return it to the client as a record set, it’s not necessary for performance measurement and it introduces an extraneous factor. With such a simple XML example, we only need a simple temp table.

-- create a temp table to hold the shredded XML
IF OBJECT_ID(N'tempdb..#tbl', N'U') IS NOT NULL DROP TABLE #tbl;
CREATE TABLE #tbl(ID INT);

For our first shredding attempt, let’s start the clock, use the XML nodes method, then stop the clock and see how long it took.

-- start the stopwatch
SET @time = GETDATE();

-- shred the xml back into a table
INSERT INTO #tbl(ID)
SELECT x.a.value('@id[1]', 'int') AS 'id'
FROM @x.nodes('/count/number') AS x(a);

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to shred via nodes';

While your results may vary, my DEV machine returned an execution time of around 800 milliseconds. Go ahead and change the record count to suit your interest or curiosity; try a million records or just a hundred.

With the nodes method benchmarked, let’s try a different approach: the prepare XML document system stored procedure. For this approach we’ll actually time the action in three parts. First, how long does it take to prepare the XML document from the raw XML text; next, how long does it take to shred the XML into the temp table; and finally, how long does it take to remove the XML document (which is important, as you don’t want stray resources consuming your memory!).

-- clear the temp table
DELETE FROM #tbl;

-- start the clock
SET @time = GETDATE();

-- prepare the XML document
EXECUTE [master].dbo.sp_xml_preparedocument
    @hdoc = @intXMLDocID OUTPUT
    , @xmltext = @x
    ;

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to prepare XML doc';

-- start the clock
SET @time = GETDATE();

-- open the XML document
INSERT INTO #tbl(ID)
SELECT *
FROM OPENXML(@intXMLDocID, 'count/number')
WITH #tbl;

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to shred via OPENXML';

-- start the clock
SET @time = GETDATE();

-- remove the XML document from memory
EXECUTE [master].dbo.sp_xml_removedocument
    @hdoc = @intXMLDocID
    ;

-- stop the clock
PRINT CONVERT(VARCHAR(20), DATEDIFF(MILLISECOND, @time, GETDATE()))
    + ' milliseconds to remove XML doc';

Again, results will vary, but my DEV machine took about 350 milliseconds to prepare the XML document, 600 milliseconds to shred the XML, and zero milliseconds to remove the XML document. In other words, using the stored procedure took longer… but not really all that much. Now, that is not really unexpected, considering that parsing XML text into a full tree representation just “sounds” like a bigger job than reading the nodes.

But… ready to see something interesting? Try replacing the WITH #tbl syntax with the alternate signature WITH (ID INT). Check the execution time now and you’ll see that shredding the XML takes significantly longer.
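
For convenience, here’s that variation spelled out; run it in the same batch, since it reuses @x and @intXMLDocID, and note the document must be re-prepared because we removed it above.

-- re-prepare the document, shred with an inline schema, then clean up again
EXECUTE [master].dbo.sp_xml_preparedocument
    @hdoc = @intXMLDocID OUTPUT
    , @xmltext = @x;

INSERT INTO #tbl(ID)
SELECT *
FROM OPENXML(@intXMLDocID, 'count/number')
WITH (ID INT);

EXECUTE [master].dbo.sp_xml_removedocument @hdoc = @intXMLDocID;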

While we only took a quick look at XML shredding, I hope we were able to demonstrate that using the right tool for the job is important, and that how you use the tool can be an even more important factor.

Does anyone have a use case where the XML tree was needed? Or do you have a favorite XML shredding technique that you use? If so — please reply and share!

Decoupling a Trigger

Table-level triggers are generally used to enforce business rules based on data manipulation language (DML) statements. Among the reasons they are so good at this function is that they are implicitly wrapped into the DML transaction. Simply put: if the trigger fails the DML fails.

In many cases, that is the desired behavior. However, what if you have an optional business rule? In other words, you’d like to take secondary action on certain data events, but, if the secondary action were to fail, you still want to keep the data change. A silly example of this could be if you wanted to send an email anytime someone named “Bob” registered at your website. While the email is desired, if it could not be sent for some reason, you’d still want “Bob” to be able to register. A trigger is a good candidate for this requirement because we want to be directly tied to the DML event of “Bob” registering… but we somehow need to decouple the trigger action from the DML transaction.

One way to do this is the SET XACT_ABORT execution-context command, which allows you to specify that an error should not abort the transaction. The setting defaults to OFF in normal query execution… but within triggers the default is ON. Thus, simply setting it OFF within the trigger will do the job.

One caveat, however: even with the setting “off”, if an error within the trigger is sufficiently severe, the DML transaction will fail and rollback. Nonetheless, more innocuous errors can occur without causing DML failure.

Let’s take a look at this behavior in action.

-- if the example table exists, drop it
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL DROP TABLE tbl;
GO

-- create the example table
CREATE TABLE tbl (
    ID INT IDENTITY(1,1) NOT NULL
    , Value VARCHAR(20) NOT NULL
    , CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (ID)
    );
GO

-- create a trigger on the example table
CREATE TRIGGER itrgTbl
ON tbl
FOR INSERT
AS
BEGIN
    -- turn off transaction aborting
    SET XACT_ABORT OFF;
    BEGIN TRY
        PRINT 'TRY';
        -- cause a failure
        DECLARE @i INT = 1/0;
    END TRY
    BEGIN CATCH
        PRINT 'CATCH!';
        -- capture the error values
        DECLARE
            @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE()
            , @ErrorSeverity INT = ERROR_SEVERITY()
            , @ErrorState INT = ERROR_STATE()
            ;
        -- raise the error
        IF @ErrorState > 0
            RAISERROR (
                @ErrorMessage
                , @ErrorSeverity
                , @ErrorState
                );
    END CATCH
END
GO

-- test the trigger by inserting a record
INSERT INTO tbl (Value) VALUES ('Hello, world.');

-- check the table contents to determine if the insert was successful
SELECT * FROM tbl;

When executing the code above you will see that the divide by zero error is raised, but the record was committed to the table anyway. Try commenting out the SET XACT_ABORT OFF command and running it again. Without that command the division error is escalated to a trigger execution error and the DML is blocked with message 3616:
Msg 3616, Level 16, State 1, Line 2
An error was raised during trigger execution. The batch has been aborted and the user transaction, if any, has been rolled back.

If you need to isolate your trigger code even more robustly from the DML event, you can also go fully asynchronous by using the trigger to send a message to the service broker. But more on that in a future post!

For more info on XACT_ABORT, read all about it on technet.

Random Security Trap 002

For post number two in my “Random Security Trap” series, let’s take a look at an issue that may cost you a few minutes of head-scratching if you haven’t seen it before. First, the business setup: you need to enable a group of users to have read-only access to a database, but also have controlled insert permissions to one table. A common way to solve this is to create a database user, add it to the database db_datareader role, create a stored procedure to manage the insert operation, and then grant execute permission on the stored procedure to the database user. And, in most cases, that solution works just fine.

However… what if the insert operation is a little more complicated and uses dynamic SQL to perform the insert? Well, in that case, you may fall right into the trap. Let’s demonstrate.

We’ll begin by creating a SQL login to the server and then using that login for a database user with read-only permission.

-- set context to the master database
USE [master];
GO

-- if the ReadOnlyLogin already exists, drop it
IF EXISTS (SELECT * FROM sys.server_principals WHERE name = N'ReadOnlyLogin')
    DROP LOGIN ReadOnlyLogin;

-- create a ReadOnlyLogin login
CREATE LOGIN ReadOnlyLogin
WITH
    PASSWORD = N'ReadOnly'
    , DEFAULT_DATABASE = devDB
    , CHECK_EXPIRATION = OFF
    , CHECK_POLICY = OFF
    ;

-- set context to the working database
USE devDB;
GO

-- if the ReadOnlyUser database user already exists, drop it
IF EXISTS (SELECT * FROM sys.database_principals WHERE name = N'ReadOnlyUser')
    DROP USER ReadOnlyUser;

-- create a ReadOnlyUser database user for the ReadOnlyLogin login
CREATE USER ReadOnlyUser
FOR LOGIN ReadOnlyLogin
WITH DEFAULT_SCHEMA = dbo;

-- add the database user to the datareader role
EXECUTE sp_addrolemember
    @rolename = N'db_datareader'
    , @membername = N'ReadOnlyUser'
    ;

Okay. Now that we have our login and database user created, let’s create a table so we can demonstrate controlling insert operations. Since we’ll end up simulating logging in as the ReadOnlyLogin, we’ll use the table to capture information about our testing operations.

-- if the sample table exists, drop it
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL DROP TABLE tbl;

-- create the sample table
CREATE TABLE tbl (
    ID INT IDENTITY(1,1) NOT NULL
    , DateStamp DATETIME2(3) CONSTRAINT DF_tbl_DateStamp DEFAULT(GETDATE())
    , [Login] SYSNAME
    , ContextLogin SYSNAME
    , [User] SYSNAME
    , IsDynamic BIT
    );

With our table in place, let’s create a procedure to insert data to the table. We’ll capture information about the login and execution context as a means of confirming the execution state related to our business case.

-- if the insert procedure already exists, drop it
IF OBJECT_ID(N'uspInsert', N'P') IS NOT NULL DROP PROCEDURE uspInsert;
GO

-- create the demo procedure
CREATE PROCEDURE uspInsert (
    @bolUseDynamic BIT = 0
    )
AS
BEGIN
    IF @bolUseDynamic = 1
    BEGIN
        -- insert a record via dynamic SQL
        DECLARE @sql NVARCHAR(MAX);
        SET @sql = N'INSERT INTO tbl ([Login], ContextLogin, [User], IsDynamic)
            VALUES (ORIGINAL_LOGIN(), SUSER_NAME(), USER_NAME(), 1);';
        EXECUTE (@sql);
    END
    ELSE
        -- insert a record via static SQL
        INSERT INTO tbl([Login], ContextLogin, [User], IsDynamic)
        VALUES (ORIGINAL_LOGIN(), SUSER_NAME(), USER_NAME(), 0)
        ;
END
GO

The last step in our setup is to grant execute permission to the database user.

-- grant the ReadOnlyUser permission to execute the procedure
GRANT EXECUTE ON uspInsert TO ReadOnlyUser;

Okay. Everything is in place. Let’s run a test! We’ll run the first test under our own login simply to confirm that the procedure works and captures the information we need to see.

-- test the procedure
EXECUTE uspInsert @bolUseDynamic = 0;
EXECUTE uspInsert @bolUseDynamic = 1;

-- confirm results
SELECT * FROM tbl;

So far, so good. Next, let’s run the same two procedure executions under the context of the ReadOnlyLogin. To achieve this, we could reconnect under the other login. Alternately, let’s just impersonate it instead!

-- test the procedure
EXECUTE AS LOGIN = 'ReadOnlyLogin';
EXECUTE uspInsert @bolUseDynamic = 0;
EXECUTE uspInsert @bolUseDynamic = 1;
REVERT;

-- confirm results
SELECT * FROM tbl;

As you can see, the second procedure execution failed when executed under the context of the ReadOnlyLogin:

Msg 229, Level 14, State 5, Line 1
The INSERT permission was denied on the object 'tbl', database 'devDB', schema 'dbo'.

Why did it fail? Because of how permissions are checked. The static statements inside a stored procedure are covered by ownership chaining: since the procedure and the table share an owner, the caller only needs EXECUTE permission on the procedure. Dynamic SQL breaks that chain; the EXECUTE (@sql) runs as its own batch, and its permissions are checked directly against the caller. And outside the procedure’s ownership chain, the ReadOnlyLogin does not have permission to insert into the “tbl” table.
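
If you need the dynamic insert to work for the read-only user, one possible escape (a sketch, and only appropriate if running the procedure under the owner’s permissions fits your security model) is to declare the procedure WITH EXECUTE AS OWNER, so the dynamic SQL is permission-checked against the owner rather than the caller:

-- run the procedure as its owner; the dynamic insert now succeeds for ReadOnlyUser
ALTER PROCEDURE uspInsert (
    @bolUseDynamic BIT = 0
    )
WITH EXECUTE AS OWNER
AS
BEGIN
    IF @bolUseDynamic = 1
    BEGIN
        -- insert a record via dynamic SQL
        DECLARE @sql NVARCHAR(MAX);
        SET @sql = N'INSERT INTO tbl ([Login], ContextLogin, [User], IsDynamic)
            VALUES (ORIGINAL_LOGIN(), SUSER_NAME(), USER_NAME(), 1);';
        EXECUTE (@sql);
    END
    ELSE
        -- insert a record via static SQL
        INSERT INTO tbl([Login], ContextLogin, [User], IsDynamic)
        VALUES (ORIGINAL_LOGIN(), SUSER_NAME(), USER_NAME(), 0);
END
GO

Note the trade-off: inside the procedure, USER_NAME() and SUSER_NAME() will now report the owner’s context rather than the caller’s, which you can see in the captured logging columns.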

As ever, let’s be sure to clean up after ourselves.

-- drop objects
IF OBJECT_ID(N'uspInsert', N'P') IS NOT NULL DROP PROCEDURE uspInsert;
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL DROP TABLE tbl;
IF EXISTS (SELECT * FROM sys.database_principals WHERE name = N'ReadOnlyUser')
    DROP USER ReadOnlyUser;
IF EXISTS (SELECT * FROM sys.server_principals WHERE name = N'ReadOnlyLogin')
    DROP LOGIN ReadOnlyLogin;

For additional information on pitfalls of dynamic SQL related to execution context, give this post a read.

Query Optimizer FAIL! 001

The query optimizer is tasked with converting a SQL statement into a list of commands that return a logical and consistent result. The optimizer part comes in as the database server attempts to determine the most efficient execution plan, looking for a balance between identifying the best plan and making a decision quickly. Much has been written elsewhere about the query optimizer, so I won’t review that information here. While the query optimizer is very impressive in its function, it has become a popular pursuit for database developers to find ways to trip up the optimizer and have it choose an inefficient plan, or actually give up and not be able to find any execution plan at all.

For this post, let’s walk through an example of the latter, more extreme case, in which the optimizer completely gives up and returns error Msg 8624: Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.

As usual, we’ll start by setting context to a development database. In this case, I’m using a database called “devDB”, running on SQL Server 2008 R2, Service Pack 2.

-- specify the database context
USE devDB;
GO

Next we’ll create a reflexive linked server. Using a linked server is a contributing factor in tripping up the optimizer, but we need not actually be talking to a second server. Thus we’ll create a linked server that refers back to itself, allowing us to demonstrate the issue on a single server. A reflexive linked server is part of my “default” server build script as it comes in handy now and then. For other uses of a reflexive linked server, see my previous post on selecting from stored procedures.

-- create a reflexive linked server;
IF NOT EXISTS(SELECT * FROM [master]..sysservers WHERE srvname = N'mirror')
EXECUTE sp_addlinkedserver
    @server = N'mirror'
    , @srvproduct = N''
    , @provider = N'SQLOLEDB'
    , @datasrc = @@SERVERNAME --current server is data source for linked server
    ;
GO

Now let’s create a few tables to use in a query. Nothing fancy required here. We’ll create table “A” that has an ID field and a foreign key to reference a second table, table “B”. The “B” table will simply be a list of a few numbers in integer and string form. While we intend to have a relationship between the tables, there is no need to define the foreign key. You can if you want, but it will not affect the result. I have left it out because it is not necessary to reproduce the optimizer fail.

-- if table "A" already exists, drop it
IF OBJECT_ID(N'tblA', N'U') IS NOT NULL DROP TABLE tblA;
GO

-- create table "A"
CREATE TABLE tblA (
    ID INT
    , FK INT
    );

-- populate table "A"
INSERT INTO tblA (ID, FK)
VALUES
    (1, 1)
    ,(2, 1)
    ,(3, 2)
    ,(4, 3)
    ,(5, 4)
    ;

-- if table "B" already exists, drop it
IF OBJECT_ID(N'tblB', N'U') IS NOT NULL DROP TABLE tblB;
GO

-- create table "B"
CREATE TABLE tblB (
    ID INT
    , String VARCHAR(10)
    );

-- populate table "B"
INSERT INTO tblB (ID, String)
VALUES
    (1, 'One')
    ,(2, 'Two')
    ,(3, 'Three')
    ,(4, 'Four')
    ;
GO

With our base tables in place, let’s now define and populate a temporary table with a data set based on the joined base tables.

-- if the temp table already exists, drop it
IF OBJECT_ID(N'tempdb..#t', N'U') IS NOT NULL DROP TABLE #t;
GO

-- create a temp table to use in an update query
CREATE TABLE #t(
    RecordType INT
    , AID INT
    , BString VARCHAR(10)
    );

-- initialize the temp table
INSERT INTO #t(RecordType, AID)
VALUES
    (1, 1)
    ,(1, 2)
    ,(2, NULL)
    ;
GO

And now, just to double-check, let’s confirm the content of the temp table.

-- confirm content of temp table
SELECT * FROM #t;
GO

So far, so good! Next, let’s update the temp table. This will cause a couple of things to happen in the background. We’ll talk about them later, but, for now, let’s just do the update.

-- update the temp table
UPDATE t
SET BString = (
        SELECT TOP 1 String
        FROM mirror.devDB.dbo.tblB AS b
        WHERE b.ID = a.FK
        )
FROM
    #t AS t
    INNER JOIN mirror.devDB.dbo.tblA AS a
        ON t.AID = a.ID
WHERE t.RecordType = 1;
GO

Now, before anyone gets all excited, I am well aware there are more efficient ways to write the update statement. But that’s not the point! We’re not trying to write an efficient query… we are trying to write a query with which the optimizer doesn’t know what to do. 🙂

Once more, let’s confirm the content of the table to see that the update was successful.

-- confirm content of temp table
SELECT * FROM #t;
GO

Now comes the fun part. Let’s run the exact same query again and see what happens.

-- update the temp table
UPDATE t
SET BString = (
        SELECT TOP 1 String
        FROM mirror.devDB.dbo.tblB AS b
        WHERE b.ID = a.FK
        )
FROM
    #t AS t
    INNER JOIN mirror.devDB.dbo.tblA AS a
        ON t.AID = a.ID
WHERE t.RecordType = 1;
GO

Pow! You’ll see that SQL Server was unable to build a query plan and thus could not execute the statement.

Msg 8624, Level 16, State 1, Line 1
Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.

So, what the heck happened? To be honest, I don’t know. Whatever is going on inside the query optimizer, it couldn’t deal with the combination of factors. But why did the first update succeed and the second one fail? Statistics. Specifically, when the table was created and populated, SQL Server did not have any useful information about what data was inside the table. Thus, when it attempted to run the update, it just put the pieces together and did it. However, in so doing, it sampled the data and auto-created statistics on the columns involved in the joins and predicates. When the update statement runs the second time, those statistics are available to the optimizer, but for whatever reason they actually confound the optimizer and it is not able to come to a solution.

Having said that, let me be clear: the statistics are not the problem. At least they are not the only problem. To see what I mean, try a few of the following, any one of which is enough of a difference to allow the optimizer to figure out what is going on, create a query plan, and execute the update.

  1. Delete the statistics from tblA (see the sketch after this list for one way to do it). With no statistics, the update succeeds. The statistics on tblB do not matter in this case.
  2. Remove the WHERE clause from the update statement. The clause is actually irrelevant because the INNER JOIN does effectively the same thing: it limits the update to records with a non-null value in the “AID” column and a matching value in tblA.
  3. Replace either or both of the linked server paths with a “normal”, local path. While the same table is referenced regardless of using the linked server or not, going through the linked server connection affects what visibility the query optimizer has to the target data.
  4. Move the sub-query from the SET statement into the FROM clause. That would be a more efficient query anyway, but, again, that wasn’t the point. That little bit of “poor query writing”, in combination with the other factors mentioned here, is sufficient to stump the optimizer.

Changing any one of the four factors above will result in the update query succeeding.
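
As a concrete example of the first variation, here’s a sketch that drops whatever statistics were auto-created on tblA (run it in the devDB context, then re-run the update and it should succeed):

-- build and execute DROP STATISTICS commands for the auto-created stats on tblA
DECLARE @sql NVARCHAR(MAX) = N'';
SELECT @sql += N'DROP STATISTICS tblA.' + QUOTENAME(name) + N'; '
FROM sys.stats
WHERE [object_id] = OBJECT_ID(N'tblA')
    AND auto_created = 1;
EXECUTE (@sql);
GO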

If you know what is going on inside the optimizer that causes this to fail, or if you know of a SQL Server update that resolves this “bug”, please post a comment and let us know.

For Each DB… Almost

The potential problems with the undocumented system procedure sp_MSforeachdb are well known to seasoned TSQL developers. Yes, it doesn’t pick up all databases (only those that are accessible at the time of execution). Yes, under certain versions of SQL Server (such as 2005 RTM) it picks up snapshots as well. Yes, it is a cursor behind the scenes and thus doesn’t offer any performance advantage over rolling your own.

And here is one more problem: it doesn’t trap @@FETCH_STATUS = -2.

What is the significance of that? Well, it is one more way for the procedure to let you down in a way you didn’t expect. Let’s take a closer look.

The following code will use the system procedure to list every* database on the server (*within the constraints of accessibility mentioned above! ;).

-- declare and set a sql string to print the database name
DECLARE @sql NVARCHAR(2000) = N'PRINT ''?'';';

-- use sp_MSforeachdb to print the name of "each" database
EXECUTE sp_MSforeachdb @command1 = @sql;

On my DEV server, the above query generates the following result:

master
tempdb
model
msdb
ReportServer$DEV
ReportServer$DEVTempDB
devDB
devDW

Six “system” databases plus a dev database and a dev data warehouse. So far so good. But… what if one of those databases goes offline while the procedure is running? Let’s find out!

For this example I will take the second to last database (devDB) offline during procedure execution. If you are testing at home, choose any database except a system database or the last database in your list. The system database restriction should be obvious; excluding the last database will become apparent after the demo.

-- declare and set a sql string to take the "devDB" database offline
-- and print the context database name from the MSforeachdb procedure
DECLARE @sql NVARCHAR(2000) = N'IF ''?'' = ''master''
    ALTER DATABASE devDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
PRINT ''?'';';

-- use sp_MSforeachdb to execute the sql against "each" database
EXECUTE sp_MSforeachdb @command1 = @sql;

-- put "devDB" back online
ALTER DATABASE devDB SET ONLINE;

On my DEV system, when I execute the previous query the following results are returned:

master
tempdb
model
msdb
ReportServer$DEV
ReportServer$DEVTempDB

Notice anything? Not only is devDB missing from the list… but so is devDW! And yet, no execution errors reported, so I’d have no idea anything went wrong.

But, what actually did go wrong? As I mentioned, the problem is that sp_MSforeachdb does not trap @@FETCH_STATUS = -2. To demonstrate, let’s first recreate what the procedure is doing.

-- declare the cursor output variable
DECLARE @sysDBName SYSNAME;

-- declare and initialize the cursor
DECLARE crs CURSOR FOR SELECT name FROM sys.databases;

-- open the cursor
OPEN crs;

-- initialize the cursor output
FETCH NEXT FROM crs INTO @sysDBName;

-- iterate through the cursor records
WHILE @@FETCH_STATUS = 0
BEGIN
    -- take the "devDB" offline
    IF @sysDBName = 'master'
        ALTER DATABASE devDB SET OFFLINE WITH ROLLBACK IMMEDIATE;
    -- print the cursor context database
    PRINT @sysDBName;
    -- fetch the next value from the cursor
    FETCH NEXT FROM crs INTO @sysDBName;
END

-- close cursor
CLOSE crs;

-- release cursor from memory
DEALLOCATE crs;

-- put "devDB" back online
ALTER DATABASE devDB SET ONLINE;

The result of the above query is the same as the procedure; devDW is missing and no error is returned. But now we can easily add some additional syntax to identify and report on the issue. For example, add the following immediately after the FETCH command within the WHILE loop:

-- trap for missing records
WHILE @@FETCH_STATUS = -2 BEGIN
    PRINT 'The row fetched is missing. Continuing to the next row....';
    FETCH NEXT FROM crs INTO @sysDBName;
END

Now we can see that the devDB row could not be fetched because it is missing from the cursor record set. But we can also choose to continue on to the next record, or take other action as needed. Another approach could be to loop through cursor results on @@FETCH_STATUS <> -1 instead of = 0. This example also shows why we didn’t get an error record: because, despite the unexpected results, no error actually occurred! @@FETCH_STATUS -2 is not an error, but a possibility that SQL Server anticipates and has already enumerated. See here for more info.
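
For completeness, here’s a sketch of that second variation; swap it in for the WHILE @@FETCH_STATUS = 0 loop in the repro script above (the offline step is omitted for brevity). The loop now continues until the cursor is truly exhausted (-1), treating -2 as a skippable condition.

-- iterate until the cursor is exhausted, skipping missing rows
WHILE @@FETCH_STATUS <> -1
BEGIN
    IF @@FETCH_STATUS = -2
        PRINT 'The row fetched is missing. Continuing to the next row....';
    ELSE
        PRINT @sysDBName;
    FETCH NEXT FROM crs INTO @sysDBName;
END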

While we don’t need any more reasons to abandon sp_MSforeachdb, I suppose it is worth remembering that the procedure is undocumented for a reason.

Break!

What is the difference between a breaking space and a non-breaking space? About five minutes of frustration.

Let’s take a look at the potential confusion. First, let’s create a temporary table and insert two records, one with a breaking space, and another with a non-breaking space. We’ll use the ASCII codes to ensure we get it right (and because it will allow you to copy from this post and try it yourself).

-- declare a temporary table
DECLARE @tbl TABLE (
    Value VARCHAR(50)
    );

-- insert two records to the table
INSERT INTO @tbl
VALUES
    ('Hello' + CHAR(32) + 'World')
    ,('Hello' + CHAR(160) + 'World')
    ;

-- select the results
SELECT DISTINCT Value FROM @tbl;

Notice anything funny about the result set? Yup: in the normal grid view, the records look identical, and yet they are not collapsed by the DISTINCT statement. Unless you have non-breaking spaces on the mind, that may be an unexpected result.

The good news is there’s an easy way to check for non-breaking spaces. Simply copy the result grid and paste it to the query editor. You’ll notice that the non-breaking space is visually identified as a solid red line. Thus it’s a rather easy difference to check for.
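
If you’d rather find the culprits programmatically, here’s a sketch that flags and then normalizes non-breaking spaces; it reuses the @tbl variable from above, so run it in the same batch.

-- find values that contain a non-breaking space
SELECT Value FROM @tbl WHERE Value LIKE '%' + CHAR(160) + '%';

-- normalize non-breaking spaces to regular spaces
UPDATE @tbl SET Value = REPLACE(Value, CHAR(160), CHAR(32));

-- the DISTINCT now collapses the records to a single value
SELECT DISTINCT Value FROM @tbl;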

And one of the best things about this example? No clean-up needed! 🙂

Temp Table Scope, Part 2

In a previous post we discussed the scope of temporary tables; specifically, their persistence inside and outside of dynamic SQL statements within the same connection. In that post we examined using a global temporary table to resolve the persistence issue. In this post, we’ll look at an alternate approach: creating a basic table first and then extending it through dynamic SQL. Let’s get started!

As you’ll recall, we previously demonstrated that temporary tables created within dynamic SQL were not available outside of the dynamic SQL execution. However, a temporary table created within a connection is available within a dynamic SQL context. The following script confirms this:

-- If the temp table we intend to create already exists, drop it
IF OBJECT_ID(N'tempdb..#tbl') IS NOT NULL
    DROP TABLE #tbl;

-- Create a basic temp table
CREATE TABLE #tbl(ID INT IDENTITY(1,1) NOT NULL);

-- Confirm the temp table was created
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

-- Confirm the temp table is accessible within dynamic SQL
DECLARE @SQL NVARCHAR(2000) =
    N'SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..#tbl'');';
EXECUTE (@SQL);

You will notice that the object id of the temporary table is the same both outside and inside the dynamic SQL. In other words, the temporary table is persistent across the connection, unlike when a temporary table is created within dynamic SQL.

Now that we’ve seen the temporary table persists into dynamic SQL, we can extend the table within the dynamic SQL statement.

/* Extend the temp table via dynamic SQL
In this instance we'll add a parameterized number of columns */
DECLARE
    @i TINYINT = 0
    , @j TINYINT = 3;

WHILE @i < @j
BEGIN
    SET @i += 1;
    SET @SQL = N'ALTER TABLE #tbl ADD Col'
        + CONVERT(NVARCHAR(10), @i)
        + N' VARCHAR(50) NULL;';
    EXECUTE (@SQL);
END

-- Confirm that the table was extended outside the dynamic SQL
SELECT * FROM #tbl;

As the query result shows, the columns were successfully added to the temporary table. And, what’s more, the temporary table is available both inside and outside of a dynamic SQL context. Bob’s your uncle.
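
PS: the extended table behaves like any other temp table for the rest of the connection; for example (assuming the three columns added by the loop above):

-- use the extended table from the outer scope
INSERT INTO #tbl (Col1, Col2, Col3) VALUES ('a', 'b', 'c');
SELECT * FROM #tbl;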