Tag Archives: sys.dataspaces

Sliding Window Partitioning

One of the useful enhancements to SQL Server has been the inclusion of native horizontal partitioning. In fact, in SQL Server 2008 all tables are partitioned — if not explicitly by the DBA then automatically by the system (with just a single partition, but nonetheless).

Let’s walk through one of the most basic uses of horizontal partitioning: archiving old data. This is a process whereby we remove the oldest records from a table and transfer them to an archive location or simply delete them, as the business requirement dictates. This use case is referred to as sliding window partitioning, because the range of records in our primary table slides through time into the future, only maintaining the current “X” number of periods of records in the main table.

We’ll start by configuring our demonstration environment. We’ll work within a database named “DB” and with a table of records called “tblFacts” within the “dbo” schema.

First, switch to the desired database context.

-- Set the database context

Now let’s ensure none of the objects we’ll create already exist in the database; if they do, we’ll drop them. For sliding window partitioning we will need a main table, a partition function, a partition scheme, and a staging table. I’ll explain more about each of these as we create them. But, for now, let’s just make sure our database is ready to have them created.

-- If our staging table already exists, drop it
IF OBJECT_ID(N'DB.dbo.tblFactsStaging') IS NOT NULL
    DROP TABLE DB.dbo.tblFactsStaging

-- If our main table already exists, drop it
    DROP TABLE DB.dbo.tblFacts

-- If our partition scheme already exists, drop it
IF EXISTS (SELECT * FROM sys.partition_schemes WHERE name = N'psFactsDate')

-- If our partition function already exists, drop it
IF EXISTS (SELECT * FROM sys.partition_functions WHERE name = N'pfFactsDate')

We’ll start by creating a partition function. There are three key parts to the function: the datatype we will partition on, the values that will serve as the partition boundaries, and a keyword that indicates whether the boundary values should be included in the partition to the left (below, less than) or right (above, greater than) of the value. For this example, we will partition our records based on a datetime stamp, and we will want the boundary value to serve as the minimum for the partition. This translates into a “right” boundary, meaning it will serve as the inclusive lower bound and the partition will include values that are equal to or greater than the boundary.  We’ll create daily partitions for three days in this example, which means we’ll have two boundaries; we’ll use 1 July 2012 and 2 July 2012 as the boundaries. That means the data will fall into three ranges: everything prior to 1 July 2012, between 1 July 2012 and 2 July 2012, and everything on or after 2 July 2012. For now we can assume that the outer partitions will also contain just a day of data, but we will enforce that assumption when it comes time to build tblFacts.

-- Create a partition function
FOR VALUES ('2012-07-01', '2012-07-02')

Next we’ll create a partition scheme. A partition scheme simply controls onto which filegroup the partitions defined by our partition function will fall. As mentioned previously, we specified two partition boundaries, which means we have three partitions; hence our partition scheme must specify at least three filegroups. If we specify more than three filegroups, SQL Server will remember the fourth group and use it for the next partition (if ever there is one); any additional filegroups specified will simply be forgotten and have no affect. For this example, we’ll put all the partitions onto our primary filegroup, though there are a few reasons you might not want to do that in the real world when working with large tables. (How’s that for a cliff-hanger? 🙂 Let’s discuss the topic in a future post!)

-- Create a partition scheme

It is time to create our main table. We’ll keep it simple and only include a unique identifier and our datetime stamp. Obviously you can include additional fields for data values if you’d like, but it is not relevant to the example. What is relevant, however, are two constraints: the primary key and a check constraint; and the data space on which we create the table.

In a non-partitioned table, the logical primary key can simply be the physical primary key; but with a partitioned table, the field or fields on which we are partitioning must be included in the primary key. The second constraint we’ll add is to ensure that our data falls within the ranges we expect for our partitioning. We made that assumption previously, but we’ll enforce it through our table definition.

Regarding the data space on which we create the table, normally a table is created on a single filegroup, often called PRIMARY; but a partitioned table is created on a partition scheme. The scheme, as you recall, specifies the filegroup(s) onto which each partition will fall, with ranges defined by the partition function. Also, when we create the table on a partition scheme, we have to specify which field in the table is to be evaluated against the partition function to determine the partition on which to store the record.

While that was a lot of explanation, you will see that the implementation is quite simple:

-- Create our primary table
CREATE TABLE dbo.tblFacts (
    , DateTimeStamp DATETIME NOT NULL
        ID ASC
        , DateTimeStamp ASC)
    , CONSTRAINT CK_tblFacts_DateRange CHECK (
        DateTimeStamp >= '2012-06-30'
        AND DateTimeStamp < '2012-07-03')
) ON psFactsDate(DateTimeStamp)

The last step in preparation is for us to have a staging table. We will use this table to move records out of the main table. It can also be used to transfer records into the main table, if that is ever required. Note that our staging table has the same structure as our main table, but we do not need to enforce the check constraint. We must, however, ensure that the staging table is created on the same filegroup of the partition we intend to work with from the main table. Since our partition scheme specified the PRIMARY filegroup for all partitions, we will use the same PRIMARY filegroup for our staging table.

-- Create our staging table
CREATE TABLE dbo.tblFactsStaging (
    , DateTimeStamp DATETIME NOT NULL
        ID ASC
        , DateTimeStamp ASC)

Now we’ll populate our main table with data. For this example I want to add a record for every minute of three days: 30 June 2012 through 2 July 2012. To do that I’ll use my numbers function to get the minutes per day and link them to my dates view to get the three dates I want. If you don’t have those record sets already, you can click the links as needed: utfNumbers and vwDates.

-- Populate our main table with sample data
INSERT INTO dbo.tblFacts (DateTimeStamp)
SELECT DATEADD(MINUTE, nums.n - 1, CONVERT(DATETIME, dates.CalendarDate))
    dbo.vwDates AS dates
    , dbo.utfNumbers(1440) AS nums -- minutes per day
WHERE dates.CalendarDate BETWEEN '2012-06-30' AND '2012-07-02'

With our sample data populated, we can now see some interesting information about our partitioned table and data. The following query provides a good summary. While our sample table and data is rather benign, running this query against a more complicated table and/or against the entire database (by removing the WHERE clause) can provide a comprehensive overview of the partitioning status.

Of particular note, this query shows the table name, the index name, which filegroup or partition scheme the table data is stored on and, if stored on a partition scheme, the filegroup on which each partition in the scheme is stored, and the number of data rows in each partition.

-- View information about the partitions and partitioned data
    N'DatabaseName' = DB_NAME()
    , N'SchemaName' = s.name
    , N'TableName' = o.name
    , N'IndexName' = i.name
    , N'IndexType' = i.type_desc
    , N'PartitionScheme' = ps.name
    , N'DataSpaceName' = ds.name
    , N'DataSpaceType' = ds.type_desc
    , N'PartitionFunction' = pf.name
    , N'PartitionNumber' = dds.destination_id
    , N'BoundaryValue' = prv.value
    , N'RightBoundary' = pf.boundary_value_on_right
        -- 1 = TRUE (values are less than boundary)
        -- 0 = FALSE (values are less than or equal to boundary)
    , N'PartitionFileGroup' = ds2.name
    , N'RowsOfData' = p.[rows]
    sys.objects AS o
    INNER JOIN sys.schemas AS s
        ON o.[schema_id] = s.[schema_id]
    INNER JOIN sys.partitions AS p
        ON o.[object_id] = p.[object_id]
    INNER JOIN sys.indexes AS i
        ON p.[object_id] = i.[object_id]
        AND p.index_id = i.index_id
    INNER JOIN sys.data_spaces AS ds
        ON i.data_space_id = ds.data_space_id
    LEFT OUTER JOIN sys.partition_schemes AS ps
        ON ds.data_space_id = ps.data_space_id
    LEFT OUTER JOIN sys.partition_functions AS pf
        ON ps.function_id = pf.function_id
    LEFT OUTER JOIN sys.partition_range_values AS prv
        ON pf.function_id = prv.function_id
        AND p.partition_number = prv.boundary_id
    LEFT OUTER JOIN sys.destination_data_spaces AS dds
        ON ps.data_space_id = dds.partition_scheme_id
        AND p.partition_number = dds.destination_id
    LEFT OUTER JOIN sys.data_spaces AS ds2
        ON dds.data_space_id = ds2.data_space_id
    s.name = N'dbo' -- schema name
    AND o.name = N'tblFacts' -- table name
    , SchemaName
    , TableName
    , IndexName
    , PartitionNumber

So, here comes the sliding window part. Let’s go ahead and remove the data from the back (oldest partition) of the table and then add a new partition to the front (newest data side) of the table to permit more information to be added. In the process we’ll adjust our check constraint to allow that new data to be added, as well as prevent older data from being inserted after we’ve removed the partition that was intended to store it.

-- Confirm data row count in both tables before changes are made
    N'CountOfTblAuditStaging' = (SELECT COUNT(*) FROM dbo.tblFactsStaging)
    , N'CountOfTblAudit' = (SELECT COUNT(*) FROM dbo.tblFacts)

-- Switch the oldest data out of main table into the staging table
ALTER TABLE dbo.tblFacts
SWITCH PARTITION 1 TO dbo.tblFactsStaging

-- Remove the now-empty oldest partition from the main table
MERGE RANGE ('2012-07-01')

-- Make a new partition filegroup available to the partition scheme

-- Add a new partition (which will use the filegroup identified as "next used")
SPLIT RANGE ('2012-07-03')

-- Alter the check constraint to account for our new range
ALTER TABLE dbo.tblFacts
ALTER TABLE dbo.tblFacts
    DateTimeStamp >= '2012-07-01'
    AND DateTimeStamp < '2012-07-04')

-- Confirm data row count in both tables after making changes
    N'CountOfTblAuditStaging' = (SELECT COUNT(*) FROM dbo.tblFactsStaging)
    , N'CountOfTblAudit' = (SELECT COUNT(*) FROM dbo.tblFacts)

Re-execute the partition information query and you should see that the partition boundaries have changed, as well as the count of the rows of data in each partition. The new partitions one (1) and two (2) are the old partitions two (2) and three (3). And the new partition we added is now partition three (3) and contains no data rows.

The data in the staging table is now free to be manipulated however is needed. While the benefits to partitioning on this small example table are negligible, if you ever need to work with millions of rows at a time, partitioning is a significant improvement. Why? Because working with partitions of data can be a metadata-only operation. We did not have to read records from the main table and write them into the staging table. Doing so could have taken a long time (had we been working with millions of records) and could have locked the main table during the process. By being able to accomplish the same thing working only with the metadata, the operation completes virtually instantly.