One area where Postgres stands head and shoulders above Oracle is arrays. Oracle arrays/collections are quite cumbersome to use. But Postgres provides us with a simple to use and incredibly power tool in arrays.
Recently I had to do some ETL work were the source was a real mess. (There’s a real surprise.) Its always fun when the database designer graduated from the MS Excel school of normalization.
In this particular table, each row contains up to 8 test scores for 3 terms. So 1 row from the import table will need to be transformed into up to 24 rows which need mapped to the appropriate test and term, while eliminating null entries.
The solution is to use multi-dimentional arrays to map each entry. One dimension will map the term and the second dimension will map the test.
CREATE OR REPLACE VIEW etl_test_import AS
SELECT studentID, array[
array[test1_term1, test2_term1, test3_term1, test4_term1],
array[test1_term2, test2_term2, test3_term2, test4_term2],
array[test1_term3, test2_term3, test3_term3, test4_term3]
] as scores,
array['reading','writing','math','science'] AS name_map
You don’t need to save the above as a view, but its a little easier to understand than having it in a subselect. The scores field should look something like so:
Now that its organized by test and by term, we need to map them to our test codes. To do this, you use the idx() function I described earlier. This will give the index that corresponds to each test code.
SELECT studentID, t.test_code, term,
scores[term][idx(name_map, t.test_code)] AS score
FROM etl_test_import AS ti
JOIN test_definitions AS t ON 1=1
JOIN generate_series(1,3) AS term
WHERE scores[term][idx(name_map, t.test_code)] IS NOT NULL