Postgres allows you to define custom aggregates. In most cases the stock aggregate functions (AVG, MIN, MAX, SUM etc.) will meet our needs. But lets say we have a table of teachers joined to the classes they teach and we want to group by teacher and see what subject area the teacher teaches. So what results will you get if the teacher has 5 English classes and 1 Science class? If you use MAX() she’ll show up as a Science teacher. What we want is the subject area that shows up most often.
Enter the most() aggregate. We’ll use it to find the value that occurs most in our group. We need a few helper functions for this.
The first we need to gather every item of the group into an array using array_append().
Then we need to process that array and return the element that occurs most often.
CREATE OR REPLACE FUNCTION _final_most(anyarray) RETURNS anyelement AS $BODY$ SELECT a FROM unnest($1) a GROUP BY 1 ORDER BY COUNT(1) DESC LIMIT 1; $BODY$ LANGUAGE 'sql' IMMUTABLE;
Note that this makes use of the unnest() function added in 8.4. If to run in on an earlier version you’ll need to add it.
CREATE OR REPLACE FUNCTION unnest(anyarray)
RETURNS SETOF anyelement AS
$BODY$
SELECT $1[i] FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$
LANGUAGE 'sql' IMMUTABLE STRICT
Now lets glue these together for our custom aggregate.
CREATE AGGREGATE most(anyelement) (
SFUNC=array_append,
STYPE=anyarray,
FINALFUNC=_final_most,
INITCOND='{}'
);
Use it just like you would min() or max().
** Update **
Dave Fetter suggested that most() should be called mode() since that is the corresponding concept in mathematics and statistics. I was focusing on text when I wrote it, so it never occurred to me. So here are the final functions to define mode, median and range.
CREATE OR REPLACE FUNCTION _final_mode(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a
FROM unnest($1) a
GROUP BY 1 ORDER BY COUNT(1) DESC
LIMIT 1;
$BODY$
LANGUAGE 'sql' IMMUTABLE;
CREATE OR REPLACE FUNCTION _final_median(_numeric)
RETURNS numeric AS
$$
SELECT AVG(val)
FROM (
SELECT val
FROM unnest($1) val
ORDER BY 1
LIMIT 2 - MOD(array_upper($1, 1), 2)
OFFSET CEIL(array_upper($1, 1) / 2.0) - 1
) sub;
$$
LANGUAGE 'sql' IMMUTABLE;
CREATE OR REPLACE FUNCTION _final_range(_numeric)
RETURNS numeric AS
$$
SELECT MAX(val) - MIN(val)
FROM unnest($1) val;
$$
LANGUAGE 'sql' IMMUTABLE;
[...] feature in 8.4 (which should have been there all along). If you need to add it to 8.3 or below, see this [...]
Hi Scott,
This DOESN’T work if given one or more NULL values.
Any ideas?
-Evan
Example code (v8.2):
>Import your 3 functions
>create table NULLZ as (select 7 as value);
>insert into NULLZ values(NULL);
>insert into NULLZ values(NULL);
>insert into NULLZ values(NULL);
>select mode(value) from NULLZ;
ERROR: null array element where not supported (arrayfuncs.c:872)