Yesterday we lamented the absence of two of the basic descriptive statistics aggregate functions: mode and median. We then demonstrated how to build your own mode function. Today we’ll show how to create a median function.
As with our mode function, we’ll pass our record set of values to the function as XML with a node named “value”. The function will return a decimal value; however, the return is easily modified to meet your needs.
-- Set the database context
USE [master];
GO
-- If the function already exists, drop it
IF EXISTS (
SELECT *
FROM sys.objects
WHERE
[object_id] = OBJECT_ID(N'ufnMedian')
AND [type] IN ('FN', 'IF', 'TF', 'FS', 'FT'))
DROP FUNCTION ufnMedian;
GO
-- Configure environmental settings
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
-- Create the function
CREATE FUNCTION ufnMedian (
@xml XML -- XML containing a single node named "value"
)
RETURNS DECIMAL(16,4) -- decimal return; redefine as needed
AS
BEGIN
-- Declare variables
DECLARE
@decReturn DECIMAL(16,4)
, @n INT = (SELECT @xml.value('count(/row/value)','int'))
, @x INT
;
-- Check to see if there is an even or odd number of records
IF @n % 2 = 0 -- Even; average the two middle values
BEGIN
-- Find the index of the lower middle value
SET @x = @n / 2.0;
-- Select the average of the lower and next highest values
SELECT @decReturn = AVG(value)
FROM (
SELECT
'value' = tbl.col.value('value[1]','float')
, 'n' = ROW_NUMBER()
OVER(ORDER BY tbl.col.value('value[1]','float'))
FROM @xml.nodes('row') tbl(col)
) AS dTbl
WHERE n = @x OR n = @x + 1;
END
ELSE -- Odd; return the middle value
BEGIN
-- Find the index of the middle value
SET @x = CEILING(@n / 2.0 );
-- Select the middle value
SELECT @decReturn = value
FROM (
SELECT
'value' = tbl.col.value('value[1]','float')
, 'n' = ROW_NUMBER()
OVER(ORDER BY tbl.col.value('value[1]','float'))
FROM @xml.nodes('row') tbl(col)
) AS dTbl
WHERE n = @x;
END
-- Return the result of the function
RETURN @decReturn;
END
GO
To call our median function, simply convert your values to XML and then pass it to the function. For example:
-- Pass the XML to the function
SELECT [master].dbo.ufnMedian(dTbl.x)
FROM (
SELECT value
FROM tbl
FOR XML PATH
) AS dTbl(x)
;
Note that the function assumes that the XML node is named “value”; thus you should alias your input values as such when you convert to XML.
Can anyone suggest a more elegant way to calculate median in TSQL? Please send in your comments!