Prevent TRUNCATE TABLE

The ability to truncate a table is a powerful tool. Truncation removes all data from a table, and, because the operation is minimally logged, it happens almost instantly. However, being minimally logged means that individual record deletions are not recorded in the transaction log. There are other potential pitfalls from truncation as well. Can someone suggest a few?

To avoid some of these pitfalls, we can implement a safeguard to prevent table truncation. As BOL states, truncating is not allowed on tables that:

  • Are referenced by a FOREIGN KEY constraint. (You can truncate a table that has a foreign key that references itself.)
  • Participate in an indexed view.
  • Are published by using transactional replication or merge replication.

In addition to these three conditions, it is also true that when a table is enabled for Change Data Capture (CDC) it cannot be truncated. Why? Because CDC reads the transaction log to monitor data changes, and, as mentioned above, truncation does not write record deletes to the log. An attempt to truncate a table enabled for CDC will result in message 4711:

Msg 4711, Level 16, State 1, Line 1
Cannot truncate table 'tbl' because it is published for replication or enabled for Change Data Capture.
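
For reference, enabling CDC involves enabling it at the database level and then per table. A minimal sketch follows (assuming a table dbo.tbl exists; CDC also requires SQL Server Agent and, as noted below, Enterprise Edition):

-- Enable Change Data Capture at the database level
EXEC sys.sp_cdc_enable_db;
GO

-- Enable Change Data Capture for the table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo'
    , @source_name = N'tbl'
    , @role_name = NULL -- no gating role
;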

So, which of these four options should we use? Enabling CDC seems like a heavy solution simply to prevent truncation, and it is only available in Enterprise Edition. Enabling replication also feels like an outsized answer; plus, tables can be truncated as part of replication operations themselves, so it does not provide reliable protection for the data.

That leaves two options: create a foreign key constraint, or include the table in an indexed view. If the table has a logical relationship with another table, then enabling a foreign key between them is probably the best answer as there is virtually no cost involved. However, if no logical relationship exists, we could create an empty table simply for the purpose of creating a foreign key.

Let’s demonstrate that approach first.

-- Set the database context
USE [db];

-- If the table already exists, drop it
IF OBJECT_ID(N'dbo.tbl', N'U') IS NOT NULL DROP TABLE dbo.tbl;

-- Create a table
CREATE TABLE dbo.tbl (
     ID INT IDENTITY(1,1) NOT NULL
     , Value VARCHAR(20) NOT NULL
     , CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (ID)
     )
;
GO

-- If the dummy foreign key table already exists, drop it
IF OBJECT_ID(N'dbo.tblFK', N'U') IS NOT NULL DROP TABLE dbo.tblFK;

-- Create the dummy foreign key table
CREATE TABLE dbo.tblFK (
     ID INT IDENTITY(1,1) NOT NULL
     , IDFK INT NOT NULL
     , CONSTRAINT FK_tbl_tblFK FOREIGN KEY (IDFK) REFERENCES dbo.tbl(ID)
     )
;
GO

-- Attempt truncation
TRUNCATE TABLE tbl;

Truncation will fail with error 4712:

Msg 4712, Level 16, State 1, Line 2
Cannot truncate table 'tbl' because it is being referenced by a FOREIGN KEY constraint.

We could also create an indexed view of the table. This approach is as simple as the previous, and it has the same drawback of creating a “dummy” object in the database. However, it also has the disadvantage of consuming more disk space. Why? Because the index has to be stored! But that may not be a bad thing… if you have a secondary use for the view.

Let’s demonstrate this approach as well, re-using the sample table we created previously.

-- If the dummy indexed view already exists, drop it
IF OBJECT_ID(N'dbo.vwTbl', N'V') IS NOT NULL DROP VIEW dbo.vwTbl;
GO

-- Create a view with schemabinding (necessary for indexing)
CREATE VIEW dbo.vwTbl
WITH SCHEMABINDING
AS
     SELECT ID, Value
     FROM dbo.tbl;
GO

-- Index the view
CREATE UNIQUE CLUSTERED INDEX UIX_vwTbl
ON dbo.vwTbl(ID);
GO

-- Drop the dummy foreign key table
IF OBJECT_ID(N'dbo.tblFK', N'U') IS NOT NULL DROP TABLE dbo.tblFK;

-- Truncation will fail because the table is referenced by an indexed view
TRUNCATE TABLE tbl;

Truncation will fail with error 3729:

Msg 3729, Level 16, State 2, Line 2
Cannot TRUNCATE TABLE 'tbl' because it is being referenced by object 'vwTbl'.

So, we just walked through a few options to prevent table truncation. What about using a DDL trigger on the database? Well, for whatever reason, there is no TRUNCATE TABLE event defined in SQL Server. And, although truncation behaves much like a drop and re-create of the table, it does not fire the DROP TABLE event. Thus, a DDL trigger will not work.
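
To see that limitation in action, here is a minimal sketch (the trigger name is arbitrary): a database-level DDL trigger on the DROP_TABLE event blocks table drops, yet TRUNCATE TABLE sails right past it.

-- Create a DDL trigger that blocks DROP TABLE (sketch)
CREATE TRIGGER trgBlockDropTable
ON DATABASE
FOR DROP_TABLE
AS
BEGIN
    RAISERROR (N'DROP TABLE is not allowed in this database.', 16, 1);
    ROLLBACK;
END
GO

-- DROP TABLE dbo.tbl;     -- blocked by the trigger
-- TRUNCATE TABLE dbo.tbl; -- not blocked; there is no TRUNCATE TABLE event

-- Remove the trigger when done
-- DROP TRIGGER trgBlockDropTable ON DATABASE;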

Does anyone have any other ideas or real-world experience with preventing TRUNCATE TABLE operations? Please send them in via a comment.

As ever, don’t forget to clean up after yourself!

-- Drop the demonstration objects 
IF OBJECT_ID(N'dbo.vwTbl', N'V') IS NOT NULL DROP VIEW dbo.vwTbl;
IF OBJECT_ID(N'dbo.tbl', N'U') IS NOT NULL DROP TABLE dbo.tbl;

Test Compatibility Level

Business Case: An existing application is built on a legacy version of the SQL Server database engine (such as SQL Server 2000). To take advantage of modern capabilities, the application is to be upgraded to a more recent version of the database engine.

Environment: The application is business-critical and unscheduled downtime is not acceptable. The application must be tested to confirm reliability at the higher database version, but full-function application testing is unrealistic due to the complexity of the application.

Approach: To test the application database layer, we will create a script to collect and parse the existing SQL statements under the higher database version, capturing any errors that might be generated.

Solution: To demonstrate the approach, we will simply write and test a few SQL statements. To fully develop this solution, rather than run the test against our sample table of statements, we could run it against a filtered selection of sys.sql_modules — a system view that contains the definitions of the SQL-defined objects (procedures, views, functions, and triggers) in the database.
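
For instance, a starting point for collecting statements to test might look something like the following (the filter is illustrative and should be adjusted to your needs).

-- Collect object definitions to feed into the test harness
SELECT
    'ObjectName' = OBJECT_SCHEMA_NAME(m.[object_id]) + '.' + OBJECT_NAME(m.[object_id])
    , m.[definition]
FROM sys.sql_modules AS m
WHERE OBJECTPROPERTY(m.[object_id], 'IsMSShipped') = 0 -- exclude system objects
;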

First, let’s set the database context to a database at the target version (compatibility level). We’ll also suppress statement record count reports.

-- Set the database context to a db at the target version
USE [db];
GO

-- Do not show statement record counts
SET NOCOUNT ON;

For demonstration purposes, we’ll create a table variable to store the SQL statements to be tested. In the real world, a query of the [definition] field of sys.sql_modules would be a good place to start. In this example we’ll test three statements that are valid at compatibility level 80 (SQL Server 2000) but invalid at level 100 (SQL Server 2008).

-- Create the SQL input table
DECLARE @tblIn TABLE (
    ID INT IDENTITY(1,1) NOT NULL
    , SQLString NVARCHAR(MAX) NOT NULL
    )
;

-- Populate the SQL input table
INSERT INTO @tblIn (SQLString)
VALUES -- these statements are valid at level 80, invalid at level 100
    (N'SET')
    , (N'SELECT o.name FROM sys.objects AS o HOLDLOCK;')
    , (N'SELECT o.name FROM sys.objects AS o, sys.tables AS t WHERE o.[object_id] *= t.[object_id];')
;

Now we’ll create an output table to store the results of our syntax check. We’ll capture the details of our input as well as the level at which we are testing the statement and any errors that might result.

-- Create the SQL output table
DECLARE @tblOut TABLE (
    ID INT NOT NULL
    , SQLString NVARCHAR(MAX) NOT NULL
    , ValidationLevel TINYINT NOT NULL
    , ErrorNumber INT NOT NULL DEFAULT(-1)
    , ErrorMessage NVARCHAR(2000) NOT NULL DEFAULT(N'')
    )
;

Next we’ll begin the automated testing phase. We’ll create a cursor to scroll through the SQL statements to be tested. Start by creating variables to hold relevant information for the cursor, then initialize the cursor definition and iteration process.

-- Declare the cursor output variable(s)
DECLARE
    @intID INT
    , @nstrSQLOrig NVARCHAR(MAX)
    , @nstrSQL NVARCHAR(MAX)
    , @intCompatLevel TINYINT = (
            SELECT [compatibility_level]
            FROM sys.databases
            WHERE database_id = DB_ID()
        )
;

-- Declare and initialize the cursor
DECLARE crs CURSOR FAST_FORWARD
FOR
    -- Define the cursor recordset
    SELECT ID, SQLString
    FROM @tblIn
;

-- Open the cursor
OPEN crs;

-- Initialize the cursor output
FETCH NEXT FROM crs INTO @intID, @nstrSQL;

With the cursor now open, let’s iterate through the records. For each SQL statement to be tested, we’ll first capture the original statement. Then we’ll modify it by prepending a statement that turns the PARSEONLY setting on. That setting tells SQL Server to check the syntax of the statement but not to compile or execute it. If, for your situation, you’d like to test statement compilation as well, try setting NOEXEC on instead. Once parsed, we’ll turn PARSEONLY off so that subsequent statements are executed. And, by wrapping the whole thing in a try-catch block and a transaction, we can control the flow and error capture to meet our needs.

-- Iterate through the cursor records
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Capture the original SQL statement
    SET @nstrSQLOrig = @nstrSQL;
    -- Attempt to parse
    BEGIN TRY
        -- Start transaction
        BEGIN TRANSACTION
            -- Prepend statement with parseonly setting 
            SET @nstrSQL = N'SET PARSEONLY ON; ' + @nstrSQL;
            EXECUTE (@nstrSQL);
            -- Turn off parseonly
            SET PARSEONLY OFF;
            -- Capture statements and validation level
            INSERT INTO @tblOut (ID, SQLString, ValidationLevel)
            VALUES (
                @intID
                , @nstrSQLOrig
                , @intCompatLevel
                )
            ;
        -- Commit transaction
        COMMIT TRANSACTION;
    END TRY
    -- Parse failed; capture error
    BEGIN CATCH
        -- Roll back the transaction
        ROLLBACK TRANSACTION;
        -- Capture statement and error information
        INSERT INTO @tblOut (
            ID
            , SQLString
            , ValidationLevel
            , ErrorNumber
            , ErrorMessage
            )
        VALUES (
            @intID
            , @nstrSQLOrig
            , @intCompatLevel
            , ERROR_NUMBER()
            , ERROR_MESSAGE()
            )
        ;
    END CATCH
    -- Iterate the cursor output
    FETCH NEXT FROM crs INTO @intID, @nstrSQL;
END
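
If you would rather test compilation as well (for example, to catch references to objects that no longer exist), the loop body could prepend NOEXEC instead of PARSEONLY. A minimal sketch of that variant, following the same pattern as above:

-- Alternate sketch: compile, but do not execute, the statement
SET @nstrSQL = N'SET NOEXEC ON; ' + @nstrSQLOrig;
EXECUTE (@nstrSQL);
-- Turn execution back on for subsequent statements
SET NOEXEC OFF;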

Finally, let’s close and deallocate the cursor and turn statement counts back on.

-- Close cursor
CLOSE crs;

-- Release cursor from memory
DEALLOCATE crs;

-- Show statement counts
SET NOCOUNT OFF;

What about the results? Well, let’s take a look!

-- View validation report
SELECT * FROM @tblOut;

If the preceding script is run in a database at compatibility level 80 (SQL Server 2000), you’ll see output similar to the following.

[Screenshot: Test SQL at level 80 (valid syntax for SQL 2000)]

And, if run in a database at compatibility level 100 (SQL Server 2008), you’ll see the following.

[Screenshot: Test SQL at level 100 (invalid syntax for SQL 2008)]

Does anyone have any suggestions on a more elegant way to accomplish this type of automated compatibility testing?

Descriptive Statistics – Median

Yesterday we lamented the absence of two of the basic descriptive statistics aggregate functions: mode and median. We then demonstrated how to build your own mode function. Today we’ll show how to create a median function.

As with our mode function, we’ll pass our record set of values to the function as XML with a node named “value”. The function will return a decimal value; however, the return is easily modified to meet your needs.

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnMedian')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnMedian;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnMedian (
    @xml XML -- XML containing a single node named "value"
    )
RETURNS DECIMAL(16,4) -- decimal return; redefine as needed
AS
BEGIN
    -- Declare variables
    DECLARE
        @decReturn DECIMAL(16,4)
        , @n INT = (SELECT @xml.value('count(/row/value)','int'))
        , @x INT
    ;
    -- Check to see if there is an even or odd number of records
    IF @n % 2 = 0 -- Even; average the two middle values
    BEGIN
        -- Find the index of the lower middle value
        SET @x = @n / 2.0;
        -- Select the average of the lower and next highest values
        SELECT @decReturn = AVG(value)
        FROM (
                SELECT
                    'value' = tbl.col.value('value[1]','float')
                    , 'n' = ROW_NUMBER()
                        OVER(ORDER BY tbl.col.value('value[1]','float'))
                FROM @xml.nodes('row') tbl(col)
            ) AS dTbl
        WHERE n = @x OR n = @x + 1;
    END
    ELSE -- Odd; return the middle value
    BEGIN
        -- Find the index of the middle value
        SET @x = CEILING(@n / 2.0 );
        -- Select the middle value
        SELECT @decReturn = value
        FROM (
                SELECT
                    'value' = tbl.col.value('value[1]','float')
                    , 'n' = ROW_NUMBER()
                        OVER(ORDER BY tbl.col.value('value[1]','float'))
                FROM @xml.nodes('row') tbl(col)
            ) AS dTbl
        WHERE n = @x;
    END
    -- Return the result of the function
    RETURN @decReturn;
END
GO

To call our median function, simply convert your values to XML and then pass it to the function. For example:

-- Pass the XML to the function
SELECT [master].dbo.ufnMedian(dTbl.x)
FROM (
        SELECT value
        FROM tbl
        FOR XML PATH
    ) AS dTbl(x)
;

Note that the function assumes that the XML node is named “value”; thus you should alias your input values as such when you convert to XML.
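
For example, if the source column were named Amount (a hypothetical name), alias it in the subquery before converting to XML.

-- Alias the source column as "value" so the function can find the node
SELECT [master].dbo.ufnMedian(dTbl.x)
FROM (
        SELECT 'value' = Amount -- hypothetical column name
        FROM tbl
        FOR XML PATH
    ) AS dTbl(x)
;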

Can anyone suggest a more elegant way to calculate median in TSQL? Please send in your comments!

Descriptive Statistics – Mode

If you have attempted rudimentary analysis through TSQL, I am confident that you noticed a conspicuous absence in the built-in aggregate functions. Specifically, where are the mode and median functions? The mean, summation, and count functions exist, and even standard deviation and variance are there, in both sample and population flavors. But where are median and mode?!

It would be great to be able to define the two missing statistical aggregates as TSQL aggregate functions; unfortunately, that is not permitted. User-defined aggregate functions can only be created from Common Language Runtime (CLR) assemblies. However, you can create a scalar-valued TSQL function that performs the calculation with roughly the same usage.

Carrying on the XML theme from recent posts, our mode function will accept an array of values as XML.  We’ll then shred it and aggregate it, returning a string result. Why a string? Because mode can return multiple values when there is a tie. After all, in a uniform distribution, each value is equally probable!

Behold, the mode function in TSQL:

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnMode')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnMode;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnMode (
    @xml XML -- XML containing a single node named "value"
    )
RETURNS VARCHAR(200) -- string return because mode could be multiple values
AS
BEGIN
    -- Declare variables
    DECLARE @strReturn VARCHAR(200);
    -- Calculate the mode
    SELECT @strReturn = (
            -- Concatenate the value(s) to a string
            SELECT CONVERT(NVARCHAR(20), dTbl2.value) + '; ' AS [text()]
            FROM (
                    -- Find the most common value(s)
                    SELECT TOP 1 WITH TIES dTbl.value
                    FROM (
                            -- Shred the XML to a table
                            SELECT 'value' =
                                tbl.col.value('value[1]','float')
                            FROM @xml.nodes('row') tbl(col)
                        ) AS dTbl
                    GROUP BY dTbl.value
                    ORDER BY COUNT(*) DESC
                ) AS dTbl2
            ORDER BY dTbl2.value
            FOR XML PATH('')
        )
    ;
    -- Remove the trailing value delimiter
    SET @strReturn = LEFT(@strReturn, LEN(@strReturn) - 1);
    -- Return the result of the function
    RETURN @strReturn;
END
GO

To call the function, simply convert your values to XML and then pass it to the function. For example:

-- Pass the XML to the function
SELECT [master].dbo.ufnMode(dTbl.x)
FROM (
        SELECT value
        FROM tbl
        FOR XML PATH
    ) AS dTbl(x)
;

Note that the function assumes that the XML node is named “value”; thus you should alias your input values as such when you convert to XML. Alternately, you could extend the function to accept any name for the node.

Can anyone suggest a more elegant way to calculate mode in TSQL? Please send in your comments!

And what about the median function? Stay tuned for the next post! 😉

Convert Rows to Column (Part II)

In a previous post we demonstrated how to use the FOR XML clause to concatenate multiple rows of one data field into a single row of one data field. While that technique is quite handy, what about when there are multiple fields to be concatenated? In other words, what if the native data has a one-to-many relationship but we need to return a one-to-one record set without losing any of the fields or data?

One possible solution is to simply extend the single-field example and create an additional CROSS APPLY block for each additional field. While not an elegant solution, it does work. Before we explore a more modular solution, let’s refresh our memories on what this first solution would look like.

Before we dig in, let’s create some objects for demonstration purposes.

-- Set the database context
USE [db];
GO

-- If the demo table already exists, drop it
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL
    DROP TABLE tbl;
GO

-- Create the demo table
CREATE TABLE tbl (
    ID INT IDENTITY(1,1) NOT NULL
    , FK INT NOT NULL
    , Value VARCHAR(20) NOT NULL
    , CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (ID)
    )
;
GO

-- Populate the demo table
INSERT INTO tbl (FK, Value)
VALUES
    (1, 'One')
    , (2, 'Two')
    , (2, 'Three')
;
GO

We’ll also make use of our numbers function to simulate the “one” side of our “one-to-many” data set. The number data will join to the foreign key (FK) we created in the demo table, which serves as our “many” side. Wait — you don’t have a “numbers” function? Get one here.
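
If you do not have one handy, here is a minimal sketch of what such a function might look like (the actual utfNumbers referenced above may well be implemented differently, for example with a tally table).

-- Set the database context
USE [master];
GO

-- Create a minimal numbers function (sketch)
CREATE FUNCTION utfNumbers (
    @intMax INT -- how many sequential numbers to return
    )
RETURNS @tblReturn TABLE (n INT NOT NULL)
AS
BEGIN
    -- Populate the return table with 1 through @intMax
    DECLARE @i INT = 1;
    WHILE @i <= @intMax
    BEGIN
        INSERT INTO @tblReturn (n) VALUES (@i);
        SET @i += 1;
    END
    RETURN;
END
GO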

Okay, so with our environment ready, let’s demonstrate concatenation of multiple rows across multiple fields using FOR XML and CROSS APPLY.

-- First approach; effective but neither scalable nor elegant
SELECT
    nums.n
    , dTbl.ID
    , dTbl2.Value
FROM
    [master].dbo.utfNumbers(3) AS nums
    -- Join to concatenate the first field (ID)
    CROSS APPLY (
        SELECT CONVERT(VARCHAR(20), ID) + '; ' AS [text()]
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH('')
    ) AS dTbl(ID)
    -- Join to concatenate the second field (Value)
    CROSS APPLY (
        SELECT Value + '; ' AS [text()]
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH('')
    ) AS dTbl2(Value)
;

We are able to return the results we expect; however, there are two major issues. One, every additional field to return requires us to add an additional CROSS APPLY block; and two, every additional CROSS APPLY block causes an additional query/join to the data. If we could use just one CROSS APPLY block for all the columns, we could save both ourselves and the database engine some work.

And, of course, we can do just that. Again we will use the capabilities of XML to do the job. This time, rather than returning the concatenated result of a single field in the CROSS APPLY, we will return the entire record set (all rows and all columns) to the outer query in the form of a single XML value. With the XML value at the top query, we can then parse out the individual fields (nodes) in the outer SELECT statement.

Yes, that was a lot of words. But, I trust that they will make more sense once you see the example.

-- Second approach; a function makes it scalable
SELECT
    nums.n
    , 'IDs' = [master].dbo.ufnXMLConcat(dTbl.x, 'ID')
    , 'Values' = [master].dbo.ufnXMLConcat(dTbl.x, 'Value')
FROM
    [master].dbo.utfNumbers(3) AS nums
    -- Join to all of the child data, returned as XML
    CROSS APPLY (
        SELECT ID, Value
        FROM tbl
        WHERE tbl.FK = nums.n
        FOR XML PATH
    ) AS dTbl(x)
;

Obviously, this syntax is preferable. It is not only easier to author but it also requires fewer data touches for the SQL engine to resolve.

But wait… what is that ufnXMLConcat function? Why it is right here, of course! 😉

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnXMLConcat')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnXMLConcat;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnXMLConcat (
    @xml XML
    , @strNode VARCHAR(50)
    )
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- Declare variables
    DECLARE
        @strReturn VARCHAR(MAX) = ''
        , @strXML VARCHAR(MAX) = CONVERT(VARCHAR(MAX), @xml)
    ;
    -- Return default if XML is null
    IF @strXML IS NULL RETURN @strReturn;
    -- Parse the XML string for the node
    WHILE CHARINDEX('<' + @strNode + '>', @strXML) != 0
    BEGIN
        -- Extract and concatenate the node value
        SET @strReturn += SUBSTRING(
                @strXML
                , CHARINDEX('<' + @strNode + '>', @strXML)
                    + LEN('<' + @strNode + '>')
                , CHARINDEX('</' + @strNode + '>', @strXML)
                    - CHARINDEX('<' + @strNode + '>', @strXML)
                    - LEN('<' + @strNode + '>')
                )
            + '; '
        ;
        -- Remove the node from the XML string
        SET @strXML = SUBSTRING(
                @strXML
                , CHARINDEX('</' + @strNode + '>', @strXML)
                    + LEN('</' + @strNode + '>')
                , LEN(@strXML)
                    - CHARINDEX('</' + @strNode + '>', @strXML)
                    - LEN('</' + @strNode + '>')
            )
        ;
    END
    -- Remove the trailing value delimiter
    SET @strReturn = LEFT(@strReturn, LEN(@strReturn) - 1);
    -- Return the result of the function
    RETURN @strReturn;
END
GO

As you can see, by accepting an XML value and a string node name as input, simple string functions take care of the rest. There are, of course, even more elegant ways to parse XML; but more complex programming is not necessarily permitted within SQL functions.
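
For illustration, one such alternative (shown here as standalone statements with hypothetical sample data, not wrapped in a function) shreds the nodes with the XML .nodes() method and concatenates them with FOR XML PATH; the node name can even be parameterized via sql:variable().

-- Sketch: shred a named node with .nodes() and concatenate with FOR XML PATH
DECLARE
    @xml XML = N'<row><ID>1</ID><Value>One</Value></row><row><ID>2</ID><Value>Two</Value></row>'
    , @strNode VARCHAR(50) = 'ID' -- the node name to extract
;
SELECT tbl.col.value('.', 'VARCHAR(MAX)') + '; ' AS [text()]
FROM @xml.nodes('/row/*[local-name()=sql:variable("@strNode")]') tbl(col)
FOR XML PATH('');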

Although the second example provides a more elegant solution, it should be noted that even this solution does not scale perfectly. For example, if the outer query returns millions of rows and/or the inner query contains many rows or many columns, there could definitely be performance problems. After all, executing functions within a SELECT statement is asking for trouble when the record count gets high.

Does anyone have a more elegant way of handling this business case? If so, please share via a comment.

Special thanks to my colleague, Bryan Clayton, for his assistance in defining the business case and working through alternate methods to parse XML. Look forward to some of those alternatives in future posts!

As ever, don’t forget to clean up!

-- Drop the demo table
IF OBJECT_ID(N'tbl', N'U') IS NOT NULL
    DROP TABLE tbl;

Transactions Really Do Work

Believe it or not, transactions really do work — even when you forget to think about them! In the example below we’ll walk through how transaction rollback can undo an attempted error capture.

For this example, we will first create a table in which to store any errors we generate.

-- Create a table in which to store error messages
IF OBJECT_ID(N'tblLog') IS NOT NULL
    DROP TABLE tblLog;
CREATE TABLE tblLog (
    ID INT IDENTITY(1,1) NOT NULL
    , DateStamp DATETIME2(3) NOT NULL
    , ErrorNumber INT NOT NULL
    , ErrorMessage NVARCHAR(2048) NOT NULL
    )
;
ALTER TABLE tblLog
ADD CONSTRAINT DF_tblLog_DateStamp
DEFAULT (GETDATE()) FOR DateStamp
;
GO

With our error capture table created, let us now generate some errors! Specifically, we will produce an uncommittable transaction error so that we can eventually demonstrate how transaction rollback can affect error capture.

-- First, let's demonstrate capturing an uncommittable
-- transaction error
BEGIN TRY
    -- This is an uncommittable transaction because the
    -- table does not exist in the database
    DROP TABLE tblDoesNotExist;
END TRY
BEGIN CATCH
    -- Here is the catch where we capture the error
    INSERT INTO tblLog (ErrorNumber, ErrorMessage)
    VALUES (ERROR_NUMBER(), ERROR_MESSAGE());
END CATCH

Assuming tblDoesNotExist does not, in fact, exist in your database, the above statement will result in an entry to the error capture table, tblLog. Let’s take a look at the error:

-- Notice that we've captured the error
SELECT * FROM tblLog;

Yes, indeed, the error was captured; and it told us that the table does not exist and thus cannot be dropped. So far, so good. This information is useful for us to be able to troubleshoot the problem. Let’s clear the error log now to reset it for our next example.

-- Clear the log table
DELETE FROM tblLog;

In the following example, we will take the exact statement we ran above and put it inside a transaction that includes rollback logic. Using rollback logic of this type is very common and is good practice. In fact, more robust validation and tracking can be implemented — but that is superfluous to the current demonstration.

-- Now, let's see how error capture can be lost in a transaction
BEGIN TRY
    BEGIN TRANSACTION;
        -- Here is the code from above, but run
        -- within a transaction with rollback logic
        BEGIN TRY
            -- This is an uncommittable transaction because the
            -- table does not exist in the database
            DROP TABLE tblDoesNotExist;
        END TRY
        BEGIN CATCH
            -- Here is the inner catch where we attempt to
            -- capture the error
            INSERT INTO tblLog (ErrorNumber, ErrorMessage)
            VALUES (ERROR_NUMBER(), ERROR_MESSAGE());
        END CATCH
    -- This commit will fail because the transaction is in
    -- an uncommittable state due to the attempted DROP TABLE
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- The catch is called; transaction must be rolled back
    ROLLBACK TRANSACTION;
    -- Capture the outer error
    INSERT INTO tblLog (ErrorNumber, ErrorMessage)
    VALUES (ERROR_NUMBER(), ERROR_MESSAGE());
END CATCH

At first glance, it might seem like we should expect only the same single error in our log table. After all, the DROP TABLE command was in a TRY-CATCH block and so it was “handled”, right? And, if it was handled, then the outer CATCH block would not even execute. Right? Not exactly.

The DROP TABLE command will fail and will result in the error being written to the log table. However, because that error makes the transaction uncommittable, the COMMIT TRANSACTION also fails and pushes execution to the outer CATCH block. And, in the outer CATCH block, the transaction is rolled back, which means the write of the inner error to the log does NOT persist! Let’s see what the log shows:

-- Notice that only the outer error is captured; the
-- inner error capture is rolled back in the outer catch
SELECT * FROM tblLog;

While I hope this example is easy to follow (as examples should be), I trust that you can imagine how the situation will get confusing if, rather than a single DROP TABLE command, there were nested procedures with their own transactions and try-catch blocks. At any point in that potentially complex web of execution a statement might fail. And, despite our intention to capture the error at the point of failure, all of that error capture activity could get rolled back and the end result might be a simple statement saying the transaction was uncommittable. A single error statement such as that provides virtually no insight into which part of the transaction was uncommittable — only that something within it was uncommittable.

What can be done about that? Pay close attention to where your transactions begin and end, and ensure you are not capturing errors where they will not be persisted!
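
One common mitigation is to capture the error details into variables inside the outer CATCH block, roll back, and only then write to the log, so that the log entry itself cannot be rolled back. A sketch of that pattern, reusing the tblLog table from above:

BEGIN CATCH
    -- Capture the error details before rolling back
    DECLARE
        @intErrorNumber INT = ERROR_NUMBER()
        , @nstrErrorMessage NVARCHAR(2048) = ERROR_MESSAGE()
    ;
    -- Roll back the doomed transaction
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    -- Log the error after the rollback so the entry persists
    INSERT INTO tblLog (ErrorNumber, ErrorMessage)
    VALUES (@intErrorNumber, @nstrErrorMessage);
END CATCH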

Oh yes, and don’t forget to clean up after yourself! 🙂

-- Clean up
IF OBJECT_ID(N'tblLog') IS NOT NULL
    DROP TABLE tblLog;

Split Delimited String

Whether you like it or not, if you develop applications, at some point data will be collected as a delimited string and you’ll need to parse it within the database. While a database developer might prefer to work with sets or arrays, delimited strings are a fact of life; deal with it.

There are a few good ways of taking a delimited string and breaking it up into discrete values, but, depending on the version of SQL Server you are using, some are better than others. If you’re running SQL Server 2005 or later, then you are in luck: your version supports the XML data type and you can make use of one of the most elegant ways of splitting a delimited string.

This method has been around for a while, but, nevertheless, it is one of my favorite utility functions and invariably finds its way into the model or master database on any server I have occasion to develop on.

-- Create a user-defined, table-valued function
CREATE FUNCTION utfSplitString (
    @nstrValues NVARCHAR(MAX) -- the delimited string of values
    , @strDelimiter CHAR(1) -- the delimiting character
    )
RETURNS @tblReturn TABLE (
    Value NVARCHAR(MAX)
    )
AS
BEGIN

    -- Convert the values to XML, replacing the delimiter with a node break
    DECLARE @x XML = CONVERT(
        XML
        , '<root><n>'
            + REPLACE(@nstrValues, @strDelimiter, '</n><n>')
            + '</n></root>'
        )
    ;

    -- Parse the XML into the return table using native XML methods
    INSERT INTO @tblReturn (Value)
    SELECT x.v.value('.', 'VARCHAR(MAX)')
    FROM @x.nodes('/root/n') x(v)
    ;

    -- Return the table records
    RETURN;

END

With the function created, you can now use it to break a delimited string into values. Simply pass in the string and the delimiter and off you go. As follows:

-- Use the function to break a string into values
SELECT Value
FROM utfSplitString(N'Hello,World,!', ',')
;

Does anyone have a more elegant way to split a delimited string? If so, please share it as a comment.

GOs and Semicolons

In every programming language there are phrases or elements that appear frequently and yet seem innocuous. Two such elements in T-SQL are the GO command and the semicolon.

The GO command is not actually a T-SQL statement; it is a batch separator recognized by client tools such as SSMS and sqlcmd, and it tells the tool to send the preceding T-SQL statements to the server for execution. When a script contains only one statement, the GO command has no distinct effect. But, in multi-statement scripts, the placement of GO is not only important, it can, in fact, be required. Let’s take a look by walking through an example.

The following three statements, when executed as a batch (all at once), will fail with error message 111, stating that “CREATE VIEW” must be the first statement in a batch. In our example, it is not; it is the second statement, right after we specify the database context.

/* This fails because CREATE VIEW must be the first statement in a batch */
USE [master];
CREATE VIEW vwTest AS SELECT 'Col1' = 1;
DROP VIEW vwTest;

To address this error, let’s put a GO after we specify the database context. While that resolves error message 111, running the statements again as a batch will result in error message 156, stating incorrect syntax near the keyword “DROP”.

/* This fails because, without a batch break, the DROP statement is parsed as part of the view definition */
USE [master];
GO
CREATE VIEW vwTest AS SELECT 'Col1' = 1;
DROP VIEW vwTest;

And so, to resolve the second error, let’s put another GO after the CREATE VIEW statement to end its batch before continuing on to the DROP statement. The following statements, executed together, will succeed.

/* This completes successfully */
USE [master];
GO
CREATE VIEW vwTest AS SELECT 'Col1' = 1;
GO
DROP VIEW vwTest;

As the preceding examples demonstrate, the GO command is useful in scripted operations to specify points where prior statements must complete before continuing to later statements. There are other use-cases, mostly related to ensuring that newly-created objects are in place before subsequently referencing them later in the script.

Let’s now shift to a more subtle issue: that of the semicolon. In T-SQL, a semicolon is designated as the statement terminator (similar to many other programming languages). In all SQL Server versions through 2012 (code-named Denali), terminating a statement with a semicolon is optional in almost every instance. However, be aware that it will not be optional in a future release (or, at least, so says Microsoft at this point). Take a look at this TechNet article and see for yourself. On the other hand, the lack of semicolons in the wild is so pervasive that it will be difficult to enforce their usage anytime soon.

Aside from the potential that terminating T-SQL statements with a semicolon might be required in the nearish future, when else might it be needed? There are a few, specific instances, such as in the definition of a common table expression (CTE). A CTE is a T-SQL statement that resolves to a record set which, once defined, can be referenced multiple times, but only within the same scope. Additional information on CTEs is available here, and perhaps will be the subject of a future post within this forum.

Regardless, CTEs require that preceding T-SQL statements are terminated with a semicolon. Let’s take a look.

The following batch will fail with error message 319, stating incorrect syntax near the keyword “with”. The full message is actually uncharacteristically helpful and suggests that it might be caused by a missing semicolon.

/* This fails because common table expression definitions require prior statements to be semicolon terminated */
PRINT 'Start'
WITH cte AS (SELECT 'Col1' = 1)
SELECT * FROM cte
;

Now that we’ve seen the error message and confirmed the batch does not execute, let’s add the semicolon and see what happens.

/* This completes successfully */
PRINT 'Start'; -- add a semicolon here
WITH cte AS (SELECT 'Col1' = 1)
SELECT * FROM cte
;

As you will see, the previous batch executes successfully. While it’s not a good example of using a CTE, it does demonstrate that the semicolon was necessary for the CTE construct. At this point, let’s shift perspective and look at the use of semicolons a little bit differently: what if we provide too many semicolons? Let’s add one to the batch after defining the CTE and see what happens.

/* This fails because common table expressions are only valid within a statement (and a semicolon terminates a statement) */
PRINT 'Start';
WITH cte AS (SELECT 'Col1' = 1); -- add a semicolon here
SELECT * FROM cte
;

In this case, the batch fails with error message 102, incorrect syntax near “;”. As mentioned previously, a CTE is only valid within the T-SQL statement in which it is defined, and it must be followed immediately by that statement. Because the added semicolon terminates the statement at the end of the CTE definition, the CTE has no statement to attach to, and the subsequent “FROM cte” reference cannot be resolved.

So, bottom line regarding semicolons: 1) get in the habit of using them, because they are eventually going to be required, and 2) by using them frequently you’ll develop a natural sense of where statements begin and end, and thus be better able to plan and manage transactions, batches, and, in general, good coding practice.

Find Databases w/o Backup

Imagine you have multiple databases on a server (maybe even hundreds or thousands?), each in full recovery mode, and a scheduled job that performs a log backup every fifteen minutes. All of a sudden the job starts failing. What could be the cause?

Very likely, a database was added to the server and has not yet had a full database backup performed. A log backup can only be performed after a full backup has been taken, since log backups are by definition incremental from a full backup.
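
For example, a log backup attempted before any full backup exists fails along these lines (the database name and backup path here are illustrative):

-- Fails with Msg 4214 if [db] has never had a full database backup:
-- "BACKUP LOG cannot be performed because there is no current database backup."
BACKUP LOG [db] TO DISK = N'C:\Backups\db.trn';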

How are we going to figure out which of our thousands of databases is missing a backup? Since it must have been recently created, we could simply sort sys.databases by create_date in descending order. However, there is no guarantee that the most recently created database (or any of the top X, for that matter) is, in fact, the one missing the full backup.
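
That quick check is simple enough (shown below for completeness), but, as noted, it is only a heuristic.

-- Most recently created databases first
SELECT name, create_date
FROM sys.databases
ORDER BY create_date DESC;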

The following query will help us find the databases missing full backups; they will be identified by a NULL value for backup_finish_date. The query extends that logic further to identify any database with an “old” backup as well. In this case, old is defined as more than one day ago; obviously that threshold should be adjusted to suit your environment.

-- Select databases without a current database backup
SELECT
    'DatabaseName' = db.name
    , 'LastBackup' = MAX(backup_finish_date)
FROM
    sys.databases AS db
    LEFT OUTER JOIN msdb.dbo.backupset AS bak
        ON db.name = bak.database_name
        AND bak.[type] = 'D' -- D = Database
WHERE db.recovery_model_desc <> N'SIMPLE'
GROUP BY db.name
HAVING
    ISNULL(MAX(backup_finish_date), '1900-01-01')
      < DATEADD(DAY, -1, GETDATE());

Also of note, in this use-case we are excluding databases with a “simple” recovery model since they would not be eligible for a log backup.

UPDATE 08 May 2015: There is a flaw in the query originally presented above, exposed through the following use case. If a database is backed up and then deleted without removing the backup history, the record of its backup remains in msdb. If a new database is then created with the same name as the deleted database, the query above will indicate that it has a backup when, in fact, it has none. This is because the join between sys.databases and backupset was made solely on the database name.

To correct this flaw, we can add a second join criterion for the database creation date, thereby ensuring we only consider backupsets that were created after the database itself was created. See updated query below.

-- Select databases without a current database backup
SELECT
    'DatabaseName' = db.name
    , 'LastBackup' = MAX(backup_finish_date)
FROM
    sys.databases AS db
    LEFT OUTER JOIN msdb.dbo.backupset AS bak
        ON db.name = bak.database_name
        AND DATEADD(SECOND
            , DATEDIFF(SECOND, CONVERT(DATE, db.create_date), db.create_date)
            , CONVERT(DATETIME2(0), CONVERT(DATE, db.create_date))
            ) <= bak.database_creation_date
        AND bak.[type] = 'D' -- D = Database
WHERE db.recovery_model_desc <> N'SIMPLE'
GROUP BY db.name
HAVING
    ISNULL(MAX(backup_finish_date), '1900-01-01')
      < DATEADD(DAY, -1, GETDATE());

Temp Table Scope

Temporary table scope is actually a little trickier than it might seem. First of all, Microsoft guidance is not particularly clear. Regarding local temp tables, BOL states: “[T]hey are visible only to the current connection for the user, and they are deleted when the user disconnects from the instance of SQL Server.”

Well, that’s not entirely accurate. “Connection” is misleading. Take a look at the following to see what I mean.

In the first run-through we’ll create a temp table the “normal” way; as you’ll see, everything works as expected.

-- If the temp table we intend to create already exists, drop it
IF OBJECT_ID(N'tempdb..#tbl') IS NOT NULL
    DROP TABLE #tbl;

-- Create a temp table
CREATE TABLE #tbl (ID INT);

-- Confirm the temp table was created
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

-- Drop the temp table we created
IF OBJECT_ID(N'tempdb..#tbl') IS NOT NULL
    DROP TABLE #tbl;

-- Confirm the temp table was dropped
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

So far so good. Now, let’s try the same thing again, but this time we’ll run the SQL from within an EXECUTE command.

-- Declare variables
DECLARE @nstrSQL NVARCHAR(MAX);

-- SQL to create the temp table and confirm it exists
SET @nstrSQL = N'CREATE TABLE #tbl (ID INT);
SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..#tbl'');'
;

-- Create the temp table and confirm creation
EXECUTE (@nstrSQL);

-- See if the temp table still exists
SELECT 'ObjectID' = OBJECT_ID(N'tempdb..#tbl');

What?! Within the EXECUTE statement context, the temp table was created and we confirmed that it existed in that context. However, outside the scope of the EXECUTE statement, the table is gone. Yet we are still in the same connection — which you can confirm by looking at the process id inside and outside of the EXECUTE statement context (simply add @@SPID to the SELECT statements).

Thankfully, the global temp table resolves that issue, so it’s not too much of an inconvenience. Except…

With a global temp table you have to be a little more careful about execution process overlap. In other words, if there is the potential for multiple processes to run the same procedure at close to the same time, and that procedure creates a temporary table of the same name, then the processes could collide, with one process reading or modifying the table while another is still using it, potentially corrupting data or worse.

Well, that’s easy enough to mitigate as well. We just need to make the temp table name effectively unique such that each process will have a temp table of a different name. And that is easy to do with a GUID. When we create the global temp table, we’ll simply add a GUID to the name. That is sufficient to solve the unique name problem, but it does add the complication of having to share the GUID with any other process that might need to use the temp table. But, if a temp table is necessary, then passing an additional variable for the table name is likely a small price to pay to ensure data quality.

-- Declare variables
DECLARE
    @nstrGUID NCHAR(36) = NEWID()
    , @nstrSQL NVARCHAR(MAX)
;

-- SQL to create the temp table and confirm it exists
SET @nstrSQL = N'CREATE TABLE [##tbl_' + @nstrGUID + '] (ID INT);
SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..[##tbl_' + @nstrGUID + ']'');'
;

-- Create the temp table and confirm creation
EXECUTE (@nstrSQL);

-- SQL to re-confirm the temp table exists
SET @nstrSQL = N'SELECT ''ObjectID'' = OBJECT_ID(N''tempdb..[##tbl_'
    + @nstrGUID + ']'');'
;

-- See if the temp table still exists
EXECUTE (@nstrSQL);

Problem solved.