Working with Multiple Databases from Python

6 08 2013

As a database developer, I work with many different databases in the course of a week. Our ERP database is on Oracle and has over 4,000 tables, so I spend the bulk of my time working with Oracle. But we also have databases on Postgres, MySQL, MS SQL Server and maybe even some MS Access lurking around. And when I have to script against the databases, or load a one-off spreadsheet, the tool I typically reach for is Python.

But even though Python has a nice specification in DB API 2, there is enough variability between the different adapters that you can’t switch from a database adapter that you use all the time to one you use infrequently without consulting Google. Is the connection parameter named ‘pass’ or ‘password’, ‘passwd’ or ‘pwd’? What was the MySQL adapter’s name and how many upper case letters were there? And how do you make a DSN for Oracle again?

Additionally, the extra features of each adapter vary quite a bit. PostgreSQL’s psycopg2 adapter has very nice dictionary and namedtuple cursor support. Oracle’s cx_Oracle adapter gives you exactly what the spec mandates: result sets returned as a list of tuples and nothing more.

And because I spend much more time working with Oracle than the others, I set out to level the playing field. So I wrote the Python DBMS (DataBases Made Simpler) module. I wanted memory-efficient dictionary-style and namedtuple cursors, but I needed to do it in a way that would work with any DB API 2.0 compatible adapter, not just cx_Oracle. And I wanted a uniform connection method regardless of the underlying database adapter. And while I’m at it, I might as well add database inspection so I can see what databases, schemas, tables, views and columns are available.
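To give a feel for how namedtuple rows can be layered on top of any DB API 2.0 adapter, here is a minimal sketch. It is not the dbms implementation: the function name iter_namedtuples and the batch size are made up for illustration, and sqlite3 stands in for a real adapter only because it ships with Python. The only things it relies on are cursor.description and cursor.fetchmany, both of which the spec defines.

```python
import sqlite3
from collections import namedtuple


def iter_namedtuples(cursor, batch_size=100):
    """Yield rows from an already executed DB API 2 cursor as namedtuples.

    Hypothetical helper for illustration; not part of the dbms module.
    """
    # Each entry of cursor.description is a 7-item sequence; item 0 is the column name.
    names = [col[0] for col in cursor.description]
    # rename=True swaps in safe names for columns that are not valid identifiers.
    Row = namedtuple('Row', names, rename=True)
    # Fetch in batches so only one batch of raw tuples is held at a time.
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for values in batch:
            yield Row(*values)


# sqlite3 is used here only as a stand-in adapter.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE people (last_name TEXT, age INTEGER)')
cur.executemany('INSERT INTO people VALUES (?, ?)', [('Smith', 30), ('Jones', 41)])
cur.execute('SELECT last_name, age FROM people ORDER BY age')
rows = list(iter_namedtuples(cur))
```

Because the row type is built from cursor.description rather than from anything adapter specific, the same helper should work unchanged with cx_Oracle, psycopg2 or any other compliant adapter.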

Because I don’t know in advance what type of connection or cursor object will be in use, I couldn’t just subclass one adapter’s connection and cursor classes. But Python makes this very easy to handle using delegation. The connection and cursor objects store a reference to the actual adapter’s objects in _connection and _cursor respectively.
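The delegation pattern can be sketched roughly as follows. This is an illustration under assumptions, not the dbms source: only the attribute names _connection, _cursor and interface come from the description here, the class bodies are my own guess at the simplest thing that works, and sqlite3 is again just a stand-in adapter.

```python
import sqlite3


class Cursor(object):
    """Wraps any DB API 2 cursor and forwards unknown attributes to it."""

    def __init__(self, cursor):
        self._cursor = cursor

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so anything
        # defined on the wrapper itself takes priority over the adapter.
        if name.startswith('_'):
            raise AttributeError(name)  # avoid recursion on missing private names
        return getattr(self._cursor, name)


class Connection(object):
    """Wraps any DB API 2 connection and remembers which adapter made it."""

    def __init__(self, interface, connection):
        self.interface = interface      # the adapter module, e.g. cx_Oracle
        self._connection = connection   # the adapter's own connection object

    def cursor(self):
        # Hand back the wrapper, not the adapter's raw cursor.
        return Cursor(self._connection.cursor())

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._connection, name)


db = Connection(sqlite3, sqlite3.connect(':memory:'))
cur = db.cursor()
cur.execute('SELECT 1 + 1')   # forwarded to the sqlite3 cursor
result = cur.fetchone()
db.commit()                   # forwarded to the sqlite3 connection
```

The wrapper never needs to know which adapter it is holding, which is the whole point of choosing delegation over subclassing.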

Let’s see it in action:

>>> import dbms
>>> db = dbms.OraConnect('myUser', 'myPass', 'myInstance')
>>> cur = db.cursor()
>>> dir(db)
['action', 'autocommit', 'begin', 'cancel', 'changepassword', 'client_identifier',
'clientinfo', 'close', 'commit', 'current_schema', 'cursor', 'dsn', 'encoding',
'inputtypehandler', 'interface', 'maxBytesPerCharacter', 'module',
'nencoding', 'outputtypehandler', 'parmHelp', 'password', 'ping', 'placeholder',
'prepare', 'register', 'rollback', 'server', 'showColumns', 'showDatabases', 'showSchemas',
'showServerVersion', 'showTables', 'showViews', 'shutdown', 'startup', 'stmtcachesize',
'subscribe', 'tnsentry', 'unregister', 'username', 'version']
>>> db.interface.__name__
'cx_Oracle'

The call to dbms.OraConnect imports the adapter cx_Oracle, sets the environment variable “NLS_LANG” (something that is easy to forget and causes lots of problems if not set) and returns a Connection object. For the most part it looks and behaves like a typical DB API 2 Connection object. The first difference is that the Connection object keeps a reference to the adapter in the interface attribute. This is important when debugging queries and writing dynamic queries. The other significant difference is that there are several functions to help us find our way around the database. Let’s try them out on a Postgres database:

>>> import dbms
>>> pg = dbms.PgConnect('myUser', 'myPass', 'myDatabase')
>>> pg.showServerVersion()
Record1(version='PostgreSQL 9.2.3, compiled by Visual C++ build 1600, 32-bit')
>>> pg.showSchemas()
>>> pg.showTables('information_schema')
Record3(schema='information_schema', table_name='sql_features')
Record3(schema='information_schema', table_name='sql_sizing_profiles')
>>> pg.showViews('information_schema')
Record4(schema='information_schema', name='_pg_foreign_data_wrappers')
Record4(schema='information_schema', name='views')
>>> pg.showColumns('information_schema.tables')
['table_catalog', 'table_schema', 'table_name', 'table_type', 'self_referencing_column_name', 'reference_generation', 'user_defined_type_catalog', 'user_defined_type_schema', 'user_defined_type_name', 'is_insertable_into', 'is_typed', 'commit_action']
>>> pg.showParmHelp()
Your adapter's parameter style is "pyformat"
'SELECT * FROM people WHERE last_name = %(name)s AND age > %(age)s', {'name': 'Smith', 'age': 30}
'SELECT * FROM people WHERE last_name = %s AND age > %s', ('Smith', 30)
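Keeping the adapter in the interface attribute matters for dynamic queries because every DB API 2 module advertises its placeholder syntax in a module-level paramstyle attribute. The sketch below shows one way that could be used. The helper names placeholder and bind_params are hypothetical and not part of dbms, and sqlite3 (whose paramstyle is "qmark") stands in for the real adapter.

```python
import sqlite3


def placeholder(style, name, position):
    """Return the placeholder text for one parameter in a DB API 2 paramstyle."""
    if style == 'qmark':
        return '?'
    if style == 'numeric':
        return ':%d' % position
    if style == 'named':
        return ':%s' % name
    if style == 'format':
        return '%s'
    if style == 'pyformat':
        return '%%(%s)s' % name
    raise ValueError('unknown paramstyle: %r' % style)


def bind_params(interface, params):
    """Turn ordered (name, value) pairs into placeholders plus execute() args.

    Hypothetical helper: interface is the adapter module itself.
    """
    style = interface.paramstyle
    marks = [placeholder(style, name, pos)
             for pos, (name, _) in enumerate(params, 1)]
    if style in ('named', 'pyformat'):
        args = dict(params)                         # name-based styles take a mapping
    else:
        args = tuple(value for _, value in params)  # positional styles take a sequence
    return marks, args


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (last_name TEXT, age INTEGER)')
conn.execute("INSERT INTO people VALUES ('Smith', 35)")
conn.execute("INSERT INTO people VALUES ('Smith', 20)")

marks, args = bind_params(sqlite3, [('name', 'Smith'), ('age', 30)])
sql = ('SELECT last_name, age FROM people '
       'WHERE last_name = {0} AND age > {1}').format(*marks)
rows = conn.execute(sql, args).fetchall()
```

Only the placeholders are generated from strings; the values themselves still travel through the adapter's parameter binding, so this does not reopen the door to SQL injection.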

Easy enough. You can see from the examples above that the results are being returned as namedtuples. Let’s take a closer look at the Cursor object. The dbms module defines a memory-efficient DictRow type that is a cross between a dict and a namedtuple. It is the default return type for dbms cursors and is subclassed for each recordset.

>>> cur = pg.cursor()
>>> cur.execute('SELECT * FROM information_schema.tables')
>>> row = cur.fetchone()
>>> print(row)
Record12{'table_catalog': 'pods', 'table_schema': 'pg_catalog', 'table_name': 'pg_type', 'table_type': 'BASE TABLE', 'self_referencing_column_name': None, 'reference_generation': None, 'user_defined_type_catalog': None, 'user_defined_type_schema': None, 'user_defined_type_name': None, 'is_insertable_into': 'YES', 'is_typed': 'NO', 'commit_action': None}
>>> row.table_name
'pg_type'
>>> row['table_name']
'pg_type'
>>> row[2]
'pg_type'
>>> row[0:3]
['pods', 'pg_catalog', 'pg_type']
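For readers curious how a row type like this might be put together, here is a rough sketch. It is an assumption about one possible design, not the actual DictRow code: the column names live once on a per-recordset subclass, and each row stores only its values in a tuple with empty __slots__, which is where the memory saving would come from. The factory name make_row_type is invented for the example.

```python
import sqlite3


class DictRow(tuple):
    """A tuple that also answers to column names, by key or by attribute."""

    __slots__ = ()   # no per-row __dict__, so a row costs about as much as a tuple
    _fields = ()     # filled in on each per-recordset subclass
    _index = {}

    def __getitem__(self, key):
        if isinstance(key, str):
            return tuple.__getitem__(self, self._index[key])
        if isinstance(key, slice):
            # Return a list for slices, matching the session output above.
            return list(tuple.__getitem__(self, key))
        return tuple.__getitem__(self, key)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return tuple.__getitem__(self, self._index[name])
        except KeyError:
            raise AttributeError(name)

    def keys(self):
        return list(self._fields)

    def __repr__(self):
        return '%s%r' % (type(self).__name__, dict(zip(self._fields, self)))


def make_row_type(description, number):
    """Build a DictRow subclass for one recordset (hypothetical factory)."""
    fields = tuple(col[0] for col in description)
    index = dict((name, pos) for pos, name in enumerate(fields))
    return type('Record%d' % number, (DictRow,),
                {'__slots__': (), '_fields': fields, '_index': index})


# sqlite3 stands in for the real adapter; the values mimic the session above.
cur = sqlite3.connect(':memory:').execute(
    "SELECT 'pods' AS table_catalog, 'pg_catalog' AS table_schema, "
    "'pg_type' AS table_name")
Record3 = make_row_type(cur.description, 3)
row = Record3(cur.fetchone())
```

With this in place, row.table_name, row['table_name'] and row[2] all resolve to the same stored value, and the name-to-position mapping is shared by every row in the recordset instead of being copied into each one.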