Working with Multiple Databases from Python

6 08 2013

As a database developer, I work with many different databases in the course of a week. Our ERP database is on Oracle and has over 4,000 tables and I spend the bulk of my time working with Oracle. But we also have databases on Postgres, MySQL, MS SQL Server and maybe even some MS Access lurking around. And when I have to script against the databases, or load a one-off spreadsheet the tool I typically reach for is Python.

But even though Python has a nice specification in DB API 2, there is enough variability between the different adapters that you can’t switch from an database adapter that you use all the time to one used infrequently without consulting Google. Is the connection parameter named ‘pass’ or ‘password’, ‘passwd’ or ‘pwd’? What was the MySQL adapter’s name and how many upper case letters were there? And how do you make a DSN for Oracle again?

Additionally, the extra features of each adapter vary quite a bit. PostgreSQL’s psycopg2 adapter has very nice dictionary and namedtuple cursor support. Oracle’s cx_Oracle adapter gives you exactly what the spec mandates, result sets returned as a list of tuples and nothing more.

And because I spend much more time working with Oracle than the others, I set out to level the playing field. So I wrote the Python DBMS (DataBases Made Simpler) module. I wanted memory efficient dictionary-style and namedtuple cursors but I needed to do it in a way that would work with any DB API 2.0 compatible adapter, not just cx_Oracle. And I wanted a uniform connection method regardless of the underlying database adapter. And while I’m at it, I might as well add database inspection so I can see what databases, schemas, tables, views and columns are available.

Because we don’t know what type of connection or cursor object we will be using, I couldn’t just subclass one of the adapter’s connection and cursor classes. But Python makes this very easy to handle using delegation. The connection and cursor objects store a reference to the actual adapter’s objects in _connection and _cursor respectively.

Lets see it in action:

>>> import dbms
>>> db = dbms.OraConnect('myUser', 'myPass', 'myInstance')
>>> cur = db.cursor()
>>> dir(db)
['action', 'autocommit', 'begin', 'cancel', 'changepassword', 'client_identifier',
'clientinfo', 'close', 'commit', 'current_schema', 'cursor', 'dsn', 'encoding',
'inputtypehandler', 'interface', 'maxBytesPerCharacter', 'module',
'nencoding', 'outputtypehandler', 'parmHelp', 'password', 'ping', 'placeholder',
'prepare', 'register', 'rollback', 'server', 'showColumns', 'showDatabases', 'showSchemas',
'showServerVersion', 'showTables', 'showViews', 'shutdown', 'startup', 'stmtcachesize',
'subscribe', 'tnsentry', 'unregister', 'username', 'version']
>>> db.interface.__name__

The call to dbms.OraConnect imports the adapter cx_Oracle, sets the environmental variable “NSL_LANG” (something that is easy to forget and causes lots of problems if not set) and returns a Connection object. For the most part it looks and behaves like a typical DB API 2 Connection object.  The first difference is that the Connection object keeps a reference to the adapter in the interface attribute.  This is important when debugging queries and writing dynamic queries.  The other significant difference is that there are several functions to help us find our way around the database.  Lets try them out on a Postgres database

>>> import dbms
>>> pg = dbms.PgConnect('myUser', 'myPass', 'myDatabase')
>>> pg.showServerVersion()
Record1(version='PostgreSQL 9.2.3, compiled by Visual C++ build 1600, 32-bit')
>>> pg.showSchemas()
>>> pg.showTables('information_schema')
Record3(schema='information_schema', table_name='sql_features')
Record3(schema='information_schema', table_name='sql_sizing_profiles')
>>> pg.showViews('information_schema')
Record4(schema='information_schema', name='_pg_foreign_data_wrappers')
Record4(schema='information_schema', name='views')
>>> pg.showColumns('information_schema.tables')
['table_catalog', 'table_schema', 'table_name', 'table_type', 'self_referencing_column_name', 'reference_generation', 'user_defined_type_catalog', 'user_defined_type_schema', 'user_defined_type_name', 'is_insertable_into', 'is_typed', 'commit_action']
>>> pg.showParmHelp()
Your adapter's parameter style is "pyformat"
'SELECT * FROM people WHERE last_name = %(name)s AND age > %(age)s', {'name': 'Smith', 'age': 30}
'SELECT * FROM people WHERE last_name = %s AND age > %s', ('Smith', 30)

Easy enough. You can see from the examples above that the results are being returned as namedtuples. Lets take a closer look at the Cursor object. The dbms module defines a memory efficient DictRow type that is a cross between a dict and a namedtuple. It is the default return type for dbms cursors and is subclassed for each recordset.

>>> cur = pg.cursor()
>>> cur.execute('SELECT * FROM information_schema.tables')
>>> row = cur.fetchone()
>>> print row
Record12{'table_catalog': 'pods', 'table_schema': 'pg_catalog', 'table_name': 'pg_type', 'table_type': 'BASE TABLE', 'self_referencing_column_name': None, 'reference_generation': None, 'user_defined_type_catalog': None, 'user_defined_type_schema': None, 'user_defined_type_name': None, 'is_insertable_into': 'YES', 'is_typed': 'NO', 'commit_action': None}
>>> row.table_name
>>> row['table_name']
>>> row[2]
>>> row[0:3]
['pods', 'pg_catalog', 'pg_type']

Overriding default constructors

11 05 2010

When you create a custom data type in Oracle, it creates a default constructor for you. But you don’t have access to modify this constructor. This can be a problem if you need some error checking or business logic. The documentation only mentions that it is possible to override the default constructor. However, every time I tried to do it I would get an error message complaining that two functions signatures matched this call. And because it didn’t know which function to use, it wouldn’t let me use either.

After reading and rereading the documentation, googling, and asking questions on help forums, I finally discovered the solution by accident. What they don’t tell you is that in order to override the default constructor, you not only have to match the position and type of attributes but also their NAMES. Once I changed the names to match that of the actual attributes, it worked like a charm.

Scheduling with Date Mod Part 2

20 10 2009

So as the title suggests, this is the second part of a series on scheduling. In the first part we did some basic setup. Now we are ready to dig in. The goal here is to be able to model any recurring pattern no matter the cycle.

The linchpin in this solution is to define an anchor date and the cycle length for every schedule. Given those two pieces of information, we can calculate the day of cycle for any given date using date mod. So lets create a couple of tables.
Read the rest of this entry »

Scheduling with Date Mod Part 1

19 10 2009

One question that seems to come up a lot is how to model recurring schedules. That’s understandable, because schedules can get out of hand pretty quickly.

We need a template so that we can define a schedule and then some how project it out on a repeating basis. The single week schedules are pretty easy. We just record the day of week and for each schedule and day we look up that day of week to get the shift for that day.
Read the rest of this entry »

Thinking in Sets of Sets

6 10 2009

Over the summer, I built the Chronos temporal toolkit for Oracle. And for the last while I’ve been porting it to Postgres and working on the documentation.

The SQL standard defines several data types to model instances of times. But rarely does anything happen instantaneously. Often what we need to model is a period of time; for instance an airline flight, a hotel stay, a bus route, an employee’s shift, a doctor’s appointment, a leave request. And yet there is no standard defined to handle this. So instead we use some combination of start and end time, an anchor time and interval, or an anchor time and a number representing seconds, minutes or days. Read the rest of this entry »

Internal Data Representation

3 10 2009

Have you ever wondered how your database represents the timestamp you just entered or how many bytes it takes to store a boolean or interval?

Oracle has a handy dump function that lets us peer into the internal representation of our data. Dump takes any type of input and returns a varchar2 describing the data type, length and the internal representation. Output might look like this:
‘Typ=96 Len=4: 84,101,99,104’
Read the rest of this entry »

Porting Postgres favorites to Oracle

30 07 2009

For the last month or so, I have been creating temporal extensions for both Postgres and Oracle. Of the two, the Postgres version was easier to implement because the array functionality in Postgres is the best of any dbms out there. So for other developers who split their time between Postgres and Oracle, here is how to create the equivalent of array_agg(), unnest() and generate_series() in Oracle.
Read the rest of this entry »