Internal Data Representation

3 10 2009

Have you ever wondered how your database represents the timestamp you just entered or how many bytes it takes to store a boolean or interval?

Oracle has a handy dump function that lets us peer into the internal representation of our data. Dump takes any type of input and returns a varchar2 describing the data type, length and the internal representation. Output might look like this:
‘Typ=96 Len=4: 84,101,99,104’

Here we will create a similar function in Postgres.

CREATE OR REPLACE FUNCTION pg_dump(
  value       IN  ANYELEMENT,
  p_type      OUT TEXT,
  p_oid       OUT INT,
  p_length    OUT INT,
  p_data      OUT TEXT,
  p_raw_data  OUT BYTEA
) RETURNS record AS
$$
DECLARE
  v_send_fn    TEXT;
BEGIN
  SELECT t.typname, t.oid, t.typsend::text
  INTO p_type, p_oid, v_send_fn
  FROM pg_type t
  WHERE t.oid = pg_typeof($1);

  IF v_send_fn IS NULL OR v_send_fn = '-' THEN
    RAISE EXCEPTION 'Found no send function for %', $1;
  ELSE
    EXECUTE 'SELECT ' || v_send_fn || '(' ||
      quote_literal($1) || '::' || p_type || ')'
    INTO p_raw_data;
  END IF;

  SELECT array_to_string(bytes, ','), length(p_raw_data)
  INTO p_data, p_length
  FROM (
    SELECT array(
      SELECT to_char(get_byte(p_raw_data, i), 'FM000')
      FROM generate_series(0, length(p_raw_data) - 1) i
    ) AS bytes
  ) sub;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE STRICT; 

There is a lot going on here, so lets step through the function. The function takes one input parameter of type anyelement. The first step is to determine what data type we are dealing with. This is done with the pg_typeof($1). We use that to look up the type name, oid and the send function from the pg_type table.

Now that we’ve got the send function, we call it on our in parameter to get the internal representation as a byte array. And the last step converts the bytea into a comma separated string of dec values for each byte.

The pg_dump function returns a record, which in my opinion is more useful than Oracle’s single varchar. But for completeness, lets add a function that returns a more Oracle like value.

CREATE OR REPLACE FUNCTION dump(anyelement)
RETURNS TEXT AS
$$
  SELECT 'Type: ' || p_type ||
    ', OID: ' || p_oid ||
    ', Length: ' || p_length ||
    ', Data: (' || p_data || ')'
  FROM pg_dump($1) d;
$$ LANGUAGE sql IMMUTABLE STRICT;
SELECT (pg_dump(current_date)).*;
p_type | p_oid | p_length |     p_data      |    p_raw_data    
--------+-------+----------+-----------------+------------------
 date   |  1082 |        4 | 000,000,013,235 | 000015\353

SELECT (pg_dump('Artacus'::text)).*;
 p_type | p_oid | p_length |           p_data            | p_raw_data 
--------+-------+----------+-----------------------------+------------
 text   |    25 |        7 | 065,114,116,097,099,117,115 | Artacus

SELECT dump(current_date);
                           dump                            
-----------------------------------------------------------
 Type: date, OID: 1082, Length: 4, Data: (000,000,013,235)

And finally, if it is easier for you to comprehend an integer value instead of a bytea, we’ll add a function to convert a short bytea to an int8. It seems like there should be a function that does this already. So if it already exists and I just missed it, please leave a comment.

CREATE OR REPLACE FUNCTION bytea_to_int(bytea)
RETURNS INT8 AS
$$ 
  SELECT sum(get_byte($1, i) * pow(256, length($1) - i -1))::int8
  FROM generate_series(0, length($1) - 1) i
  WHERE length($1) <= 8;
$$ LANGUAGE sql IMMUTABLE;
SELECT ts, bytea_to_int((pg_dump(ts)).p_raw_data) intval
FROM (
    SELECT '2009-01-01'::timestamp + interval '1 ms' * i ts
    FROM generate_series(1,10) i
) sub;
           ts            |     intval      
-------------------------+-----------------
 2009-01-01 00:00:00.001 | 284083200001000
 2009-01-01 00:00:00.002 | 284083200002000
 2009-01-01 00:00:00.003 | 284083200003000
 2009-01-01 00:00:00.004 | 284083200004000
 2009-01-01 00:00:00.005 | 284083200005000
 2009-01-01 00:00:00.006 | 284083200006000
 2009-01-01 00:00:00.007 | 284083200007000
 2009-01-01 00:00:00.008 | 284083200008000
 2009-01-01 00:00:00.009 | 284083200009000
 2009-01-01 00:00:00.01  | 284083200010000
Advertisements

Actions

Information

2 responses

5 01 2010
Pavel Stehule

It is hard core :).

Using this functionality in plpgsql – you have a courage – nice trick

Pavel

6 01 2010
Scott Bailey

Thanks Pavel. Actually you helped me out with part of it (pg_typeof) on pg_general 🙂

I know you wrote equivalent functionality in orafce. But this was more of a learning experience.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




%d bloggers like this: