Tag Archives: query optimizer

Query Optimizer FAIL! 001

The query optimizer is tasked with converting a SQL statement into a list of commands that return a logical and consistent result. The optimizer part comes in as the database server attempts to determine the most efficient execution plan, looking for a balance between identifying the best plan and making a decision quickly. Much has been written elsewhere about the query optimizer, so I won’t review that information here. While the query optimizer is very impressive in its function, it has become a popular pursuit for database developers to find ways to trip-up the optimizer and have it chose an inefficient plan, or actually give up and not be able to find any execution plan at all.

For this post, let’s walk through an example of the latter, more extreme case — when the optimizer completely gives up and returns error Msg 8624: Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.

As usual, we’ll start by setting context to a development database. In this case, I’m using a database called “devDB”, running on SQL Server 2008 R2, Service Pack 2.

-- specify the database context
USE devDB;
GO

Next we’ll create a reflexive linked server. Using a linked server is a contributing factor in tripping up the optimizer, but we need not actually be talking to a second server. Thus we’ll create a linked server that refers back to itself, allowing us to demonstrate the issue on a single server. A reflexive linked server is part of my “default” server build script as it comes in handy now and then. For other uses of a reflexive linked server, see my previous post on selecting from stored procedures.

-- create a reflexive linked server;
IF NOT EXISTS(SELECT * FROM [master]..sysservers WHERE srvname = N'mirror')
EXECUTE sp_addlinkedserver
    @server = N'mirror'
    , @srvproduct = N''
    , @provider = N'SQLOLEDB'
    , @datasrc = @@SERVERNAME --current server is data source for linked server
    ;
GO

Now let’s create a few tables to use in a query. Nothing fancy required here. We’ll create table “A” that has an ID field and a foreign key to reference a second table, table “B”. The “B” table will simply be a list of a few numbers in integer and string form. While we intend to have a relationship between the tables, there is no need to define the foreign key. You can if you want, but it will not affect the result. I have left it out because it is not necessary to reproduce the optimizer fail.

-- if table "A" already exists, drop it
IF OBJECT_ID(N'tblA', N'U') IS NOT NULL DROP TABLE tblA;
GO

-- create table "A"
CREATE TABLE tblA (
    ID INT
    , FK INT
    );

-- populate table "A"
INSERT INTO tblA (ID, FK)
VALUES
    (1, 1)
    ,(2, 1)
    ,(3, 2)
    ,(4, 3)
    ,(5, 4)
    ;

-- if table "B" already exists, drop it
IF OBJECT_ID(N'tblB', N'U') IS NOT NULL DROP TABLE tblB;
GO

-- create table "B"
CREATE TABLE tblB (
    ID INT
    , String VARCHAR(10)
    );

-- populate table "B"
INSERT INTO tblB (ID, String)
VALUES
    (1, 'One')
    ,(2, 'Two')
    ,(3, 'Three')
    ,(4, 'Four')
    ;
GO

With our base tables in place, let’s now define and populate a temporary table with a data set based on the joined base tables.

-- if the temp table already exists, drop it
IF OBJECT_ID(N'tempdb..#t', N'U') IS NOT NULL DROP TABLE #t;
GO

-- create a temp table to use in an update query
CREATE TABLE #t(
    RecordType INT
    , AID INT
    , BString VARCHAR(10)
    );

-- initialize the temp table
INSERT INTO #t(RecordType, AID)
VALUES
    (1, 1)
    ,(1, 2)
    ,(2, NULL)
    ;
GO

And now, just to double-check, let’s confirm the content of the temp table.

-- confirm content of temp table
SELECT * FROM #t;
GO

So far, so good! Next, let’s update the temp table. This will cause a couple things to happen in the background. We’ll talk about them later, but, for now, let’s just do the update.

-- update the temp table
UPDATE t
SET BString (
        SELECT TOP 1 String
        FROM mirror.devDB.dbo.tblB AS b
        WHERE b.ID = a.FK
        )
FROM
    #t AS t
    INNER JOIN mirror.devDB.dbo.tblA AS a
        ON t.AID = a.ID
WHERE t.RecordType = 1;
GO

Now, before anyone gets all excited, I am well aware there are more efficient ways to write the update statement. You forget the point! We’re not trying to write an efficient query… we are trying to write a query with which the optimizer doesn’t know what to do. 🙂

Once more, let’s confirm the content of the table to see that the update was successful.

-- confirm content of temp table
SELECT * FROM #t;
GO

Now comes the fun part. Let’s run the exact same query again and see what happens.

-- update the temp table
UPDATE t
SET BString (
        SELECT TOP 1 String
        FROM mirror.devDB.dbo.tblB AS b
        WHERE b.ID = a.FK
        )
FROM
    #t AS t
    INNER JOIN mirror.devDB.dbo.tblA AS a
        ON t.AID = a.ID
WHERE t.RecordType = 1;
GO

Pow! You’ll see that SQL Server was unable to build a query plan and thus could not execute the statement.

Msg 8624, Level 16, State 1, Line 1
Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.

So, what the heck happened? To be honest, I don’t know. Whatever is going on inside the query optimizer, it couldn’t deal with the combination of factors. But why did the first update succeed and the second one fail? Statistics. Specifically, when the table was created and populated, SQL Server did not have any useful information about what data was inside the table. Thus, when it attempted to run the update, it just put the pieces together and did it. However, in so doing, it scanned the data in the tables and created statistics on the values in each column in both tables. When the update statement runs the second time, the statistics are available to the optimizer, but for whatever reason the statistics actually confound the optimizer and it is not able to come to a solution.

Having said that, let me be clear: the statistics are not the problem. At least they are not the only problem. To see what I mean, try a few of the following; any one of which is enough of a difference to allow the optimizer to figure out what is going on, create a query plan, and execute the update.

  1. Delete the statistics from tblA. With no statistics, the update succeeds. The statistics on tblB do not matter in this case.
  2. Remove the WHERE clause from the update statement. The clause is actually irrelevant because the INNER JOIN does effectively the same thing: limits the update to records with a non-null value in the “AID” column and a matching value in tblB.
  3. Do not use either or both of the linked server paths (replace them with a “normal”, local path instead). While the same table is referenced regardless of using the linked server or not, going through the linked server connection affects what visibility the query optimizer has to the target data.
  4. Move the sub-query from the SET statement into the FROM clause. That would be a more efficient query anyway, but, again, that wasn’t the point. That little bit of “poor query writing”, in combination with the other factors mentioned here, is sufficient to stump the optimizer.

Changing any one of the four factors above will result in the update query succeeding.

If you know what is going on inside the optimizer that causes this to fail, or if you know of a SQL Server update that resolves this “bug”, please post a comment and let us know.