Descriptive Statistics – Median

Yesterday we lamented the absence of two of the basic descriptive statistics aggregate functions: mode and median. We then demonstrated how to build your own mode function. Today we’ll show how to create a median function.

As with our mode function, we’ll pass our record set of values to the function as XML with a node named “value”. The function will return a decimal value; however, the return is easily modified to meet your needs.

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnMedian')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnMedian;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnMedian (
    @xml XML -- XML containing a single node named "value"
    )
RETURNS DECIMAL(16,4) -- decimal return; redefine as needed
AS
BEGIN
    -- Declare variables
    DECLARE
        @decReturn DECIMAL(16,4)
        , @n INT = (SELECT @xml.value('count(/row/value)','int'))
        , @x INT
    ;
    -- Check to see if there is an even or odd number of records
    IF @n % 2 = 0 -- Even; average the two middle values
    BEGIN
        -- Find the index of the lower middle value
        SET @x = @n / 2.0;
        -- Select the average of the lower and next highest values
        SELECT @decReturn = AVG(value)
        FROM (
                SELECT
                    'value' = tbl.col.value('value[1]','float')
                    , 'n' = ROW_NUMBER()
                        OVER(ORDER BY tbl.col.value('value[1]','float'))
                FROM @xml.nodes('row') tbl(col)
            ) AS dTbl
        WHERE n = @x OR n = @x + 1;
    END
    ELSE -- Odd; return the middle value
    BEGIN
        -- Find the index of the middle value
        SET @x = CEILING(@n / 2.0 );
        -- Select the middle value
        SELECT @decReturn = value
        FROM (
                SELECT
                    'value' = tbl.col.value('value[1]','float')
                    , 'n' = ROW_NUMBER()
                        OVER(ORDER BY tbl.col.value('value[1]','float'))
                FROM @xml.nodes('row') tbl(col)
            ) AS dTbl
        WHERE n = @x;
    END
    -- Return the result of the function
    RETURN @decReturn;
END
GO

To call our median function, simply convert your values to XML and then pass it to the function. For example:

-- Pass the XML to the function
SELECT [master].dbo.ufnMedian(dTbl.x)
FROM (
        SELECT value
        FROM tbl
        FOR XML PATH
    ) AS dTbl(x)
;

Note that the function assumes that the XML node is named “value”; thus you should alias your input values as such when you convert to XML.

Can anyone suggest a more elegant way to calculate median in TSQL? Please send in your comments!

Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: