ETL With Postgres Arrays

20 05 2009

One area where Postgres stands head and shoulders above Oracle is arrays. Oracle arrays/collections are quite cumbersome to use. But Postgres provides us with a simple to use and incredibly power tool in arrays.

Recently I had to do some ETL work were the source was a real mess. (There’s a real surprise.) Its always fun when the database designer graduated from the MS Excel school of normalization.

In this particular table, each row contains up to 8 test scores for 3 terms. So 1 row from the import table will need to be transformed into up to 24 rows which need mapped to the appropriate test and term, while eliminating null entries.

The solution is to use multi-dimentional arrays to map each entry. One dimension will map the term and the second dimension will map the test.

SELECT studentID, array[
array[test1_term1, test2_term1, test3_term1, test4_term1],
array[test1_term2, test2_term2, test3_term2, test4_term2],
array[test1_term3, test2_term3, test3_term3, test4_term3]
] as scores,
array['reading','writing','math','science'] AS name_map
FROM stupid_import_table;

You don’t need to save the above as a view, but its a little easier to understand than having it in a subselect. The scores field should look something like so:
Now that its organized by test and by term, we need to map them to our test codes. To do this, you use the idx() function I described earlier. This will give the index that corresponds to each test code.

SELECT studentID, t.test_code, term,
scores[term][idx(name_map, t.test_code)] AS score
FROM etl_test_import AS ti
JOIN test_definitions AS t ON 1=1
JOIN generate_series(1,3) AS term
WHERE scores[term][idx(name_map, t.test_code)] IS NOT NULL




Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: