SQLAlchemy: Connecting to pre-existing databases & table joins!

SqlAlchemy is an object-relational mapper (ORM), which means that it takes SQL constructs and makes them Python objects which can be manipulated natively. Most of the time the same code can be recycled to use multiple database back-ends (unless you use back-end specific features).

I will be using Microsoft SQL Server for this particular example and show you how to use a technique called database reflection to get table information from a database schema when the database and tables already exist. For this you will need SQLAlchemy and pymssql installed.

#!C:\python27\python.exe
# -*- coding: utf-8 -*-

__author__ = "Jamie Ivanov"

from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine, MetaData, Table

def main():
engine = create_engine('mssql+pymssql://server/Database', echo=False)

table1meta = MetaData(engine)
table1 = Table('Customers', table1meta, autoload=True)

table2meta = MetaData(engine)
table2 = Table('Orders', table2meta, autoload=True)

table3meta = MetaData(engine)
table3 = Table('Invoices', table3meta, autoload=True)

Session = sessionmaker(bind=engine)
session = Session()

results = session.query(table1, table2, table3).filter(table1.columns.CustomerId == 13002).join(table2, table1.columns.CustomerId == table2.columns.CustomerId).join(table3, table1.columns.CustomerId == table3.columns.CustomerId)

cols = [c["name"] for c in results.column_descriptions]

for r in results.all():
print(dict(zip(cols,r)))

session.close()

if __name__ == '__main__':
main()

Which returns the results:

{'OrderId': 1, 'CustomerId': 13002L, 'DateCreated': datetime.datetime(2015, 8, 4, 9, 1, 9, 667000), 'Paid': True, 'PrintedInvoice': u'invoice_1.pdf', 'Invoice': 1, 'Name': u'Jamie Ivanov'}
{'OrderId': 2, 'CustomerId': 13002L, 'DateCreated': datetime.datetime(2015, 8, 4, 9, 2, 9, 807000), 'Paid': False, 'PrintedInvoice': u'invoice_2.pdf', 'Invoice': 2, 'Name': u'Jamie Ivanov'}

So what’s happening here?

First, we have to create the engine which will be used to create the actual database connection. Here, I’m using the “mssql+pymssql” driver but you can refer to the SQLAlchemy Engine configuration page to see examples on how to connect to other database back-ends. The “echo” parameter will echo all of the SQL queries to STDOUT which can be useful for debugging.

engine = create_engine('mssql+pymssql://server/Database', echo=False)

Next I setup a Table() object for each table I will be using using with the reflection technique. Typically a metadata object will be created that contains the table information (columns, datatypes, etc) when creating a new Table() object but this information is going to be pulled from the table so I need an empty object to use with the table. When creating the metadata object, I pass in the engine that was just created so it knows where it should be pulling the information from. The new metadata object will be passed into the Table() object as the second argument and setting “autoload” to True will populate the metadata for the table. After this you could query the table object for the columns and get a list of dictionaries containing column name, datatype, and etc.

table1meta = MetaData(engine)
table1 = Table('Customers', table1meta, autoload=True)

Then I configure the session which will do the dirty work when communicating with the database. This will handle all of the queries, commits, flushes, etc. First use sessionmaker() to create the session class that will be used throughout the code then the actual session that will used for the particular transaction. Of course we need to configure our session class by passing in the engine to sessionmaker().

Session = sessionmaker(bind=engine)
session = Session()

Now that a session has been setup, I can begin working on the query with joins. The session object supports .query() and .filter() but beyond that I will use a series of .join()s to do an inner-joins. There are several ways that this can be done and I’m not saying that this is the correct way or the best way or the only way. The first part of the query will be rather simple where I query the 3 tables and filter based on a specific customer ID. Then I setup the the joins with the table object created above (first argument) and the join criteria (second argument).

results = session.query(table1, table2, table3).filter(table1.columns.CustomerId == 13002).join(table2, table1.columns.CustomerId == table2.columns.CustomerId).join(table3, table1.columns.CustomerId == table3.columns.CustomerId)

Please note that there is no .all() or .one() at the end of the query because I don’t want to return the results just yet but rather I use list comprehension to generate a list of columns by iterating over the “column_descriptions” attribute. This attribute is a list of dictionaries containing each column name, datatype, etc.

cols = [c["name"] for c in results.column_descriptions]

Now for the actual results, I will use the list of column names from the list above and build a dictionary of the results. Each row is a set of results and Python’s zip() functionality will build tuples of each position in the column list and associate them with the corresponding position in the result set. From there, each tuple will be turned into a key/value pair in a dictionary.

for r in results.all():
print(dict(zip(cols,r)))

And finally, close the session. It’s not necessary but should be a good habit to gracefully shut down database connection. No content was changed within the database so there is no need to commit() or flush() but that should be food for thought.

session.close()

This is by no means comprehensive but just a journey through my adventures of tinkering with SQLAlchemy and I hope this helps get you started.

Leave a reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>