Descriptive Statistics – Mode

For those who have attempted rudimentary analysis through TSQL, I am confident that you found a conspicuous absence in the built-in aggregate functions. Specifically, where are the mode and median functions? The mean, summation, and count functions exist, and even standard deviation and variance are there, in both sample and population flavors. But where are median and mode?!

It would be great to be able to define the two missing statistical aggregates as TSQL aggregate functions; unfortunately, that is not permitted. Only Common Language Runtime (CLR) assemblies are allowed to be a function of the aggregate type. However, you can create a scalar value TSQL function that will perform the calculation with roughly the same usage.

Carrying on the XML theme from recent posts, our mode function will accept an array of values as XML.  We’ll then shred it and aggregate it, returning a string result. Why a string? Because mode can return multiple values when there is a tie. After all, in a uniform distribution, each value is equally probable!

Behold, the mode function in TSQL:

-- Set the database context
USE [master];
GO

-- If the function already exists, drop it
IF EXISTS (
    SELECT *
    FROM sys.objects
    WHERE
        [object_id] = OBJECT_ID(N'ufnMode')
        AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnMode;
GO

-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

-- Create the function
CREATE FUNCTION ufnMode (
    @xml XML -- XML containing a single node named "value"
    )
RETURNS VARCHAR(200) -- string return because mode could be multiple values
AS
BEGIN
    -- Declare variables
    DECLARE @strReturn VARCHAR(200);
    -- Calculate the mode
    SELECT @strReturn = (
            -- Concatenate the value(s) to a string
            SELECT CONVERT(NVARCHAR(20), dTbl2.value) + '; ' AS [text()]
            FROM (
                    -- Find the most common value(s)
                    SELECT TOP 1 WITH TIES dTbl.value
                    FROM (
                            -- Shred the XML to a table
                            SELECT 'value' =
                                tbl.col.value('value[1]','float')
                            FROM @xml.nodes('row') tbl(col)
                        ) AS dTbl
                    GROUP BY dTbl.value
                    ORDER BY COUNT(*) DESC
                ) AS dTbl2
            ORDER BY dTbl2.value
            FOR XML PATH('')
        )
    ;
    -- Remove the trailing value delimiter
    SET @strReturn = LEFT(@strReturn, LEN(@strReturn) - 1);
    -- Return the result of the function
    RETURN @strReturn;
END
GO

To call the function, simply convert your values to XML and then pass it to the function. For example:

-- Pass the XML to the function
SELECT [master].dbo.ufnMode(dTbl.x)
FROM (
        SELECT value
        FROM tbl
        FOR XML PATH
    ) AS dTbl(x)
;

Note that the function assumes that the XML node is named “value”; thus you should alias your input values as such when you convert to XML. Alternately, you could extend the function to accept any name for the node.

Can anyone suggest a more elegant way to calculate mode in TSQL? Please send in your comments!

And what about the median function? Stay tuned for the next post! 😉

Advertisements
Post a comment or leave a trackback: Trackback URL.

Trackbacks

  • […] two of the basic descriptive statistics aggregate functions: mode and median. We then demonstrated how to build your own mode function. Today we’ll show how to create a median […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: