The Engine is the starting point for any SQLAlchemy application. It’s “home base” for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
The general structure is this:
                                     +-----------+                        __________
                                 /---|   Pool    |---\                   (__________)
             +-------------+    /    +-----------+    \     +--------+   |          |
connect() <--|   Engine    |---x                       x----| DBAPI  |---| database |
             +-------------+    \    +-----------+    /     +--------+   |          |
                                 \---|  Dialect  |---/                   |__________|
                                     +-----------+                       (__________)
Above, an Engine references both a Dialect and a Pool, which together interpret the DBAPI’s module functions as well as the behavior of the database.
Creating an engine is just a matter of issuing a single call, create_engine():
from sqlalchemy import create_engine
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
The above engine invokes the postgresql dialect and a connection pool which references localhost:5432.
Note that the appropriate usage of create_engine() is once per particular configuration, held globally for the lifetime of a single application process. This does not extend to child processes created via fork(), which require a new engine of their own. A single Engine manages connections on behalf of the process and is intended to be called upon concurrently. Creating an engine for each individual operation is not the intended usage.
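One way to follow the once-per-process rule is to cache engines keyed on the process id. A forked child then builds its own engine instead of inheriting the parent’s pooled connections. This is a minimal sketch. get_engine is a hypothetical helper, and its factory argument stands in for create_engine() so that the example runs with only the standard library.

```python
import os
import threading

_lock = threading.Lock()
_engines = {}  # (pid, url) -> engine object


def get_engine(url, factory):
    """Return one engine per URL per process (hypothetical helper).

    `factory` stands in for sqlalchemy.create_engine. The process id is
    part of the cache key, so a child created by fork() builds a fresh
    engine rather than reusing the parent's pooled connections.
    """
    key = (os.getpid(), url)
    with _lock:
        engine = _engines.get(key)
        if engine is None:
            engine = factory(url)
            _engines[key] = engine
        return engine


calls = []


def fake_create_engine(url):
    calls.append(url)
    return object()  # placeholder for a real Engine


e1 = get_engine("sqlite:///app.db", fake_create_engine)
e2 = get_engine("sqlite:///app.db", fake_create_engine)
print(e1 is e2, len(calls))  # True 1
```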
The engine can be used directly to issue SQL to the database. The most generic way is to use connections, which you get via the connect() method:
connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print("username: %s" % row['username'])
connection.close()
The connection is an instance of Connection, which is a proxy object for an actual DBAPI connection. The returned result is an instance of ResultProxy, which acts very much like a DBAPI cursor.
When you say engine.connect(), a new Connection object is created, and a DBAPI connection is retrieved from the connection pool. Later, when you call connection.close(), the DBAPI connection is returned to the pool; nothing is actually “closed” from the perspective of the database.
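The checkout/checkin cycle described above can be modeled in a few lines of standard-library Python. This is a toy sketch, not SQLAlchemy’s implementation. ToyPool and ToyConnection are hypothetical names, and sqlite3 stands in for the DBAPI. The sketch shows that close() hands the DBAPI connection back for reuse instead of closing it.

```python
import queue
import sqlite3


class ToyPool:
    """Hands out DBAPI connections made by `creator` and takes them back."""

    def __init__(self, creator):
        self._creator = creator
        self._idle = queue.LifoQueue()

    def checkout(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._creator()  # none idle: open a new one

    def checkin(self, dbapi_conn):
        self._idle.put(dbapi_conn)  # keep it open for the next checkout


class ToyConnection:
    """Proxy around a checked-out DBAPI connection."""

    def __init__(self, pool):
        self._pool = pool
        self.dbapi_connection = pool.checkout()

    def execute(self, sql, params=()):
        return self.dbapi_connection.execute(sql, params)

    def close(self):
        # "Closing" only returns the DBAPI connection to the pool.
        self._pool.checkin(self.dbapi_connection)
        self.dbapi_connection = None


pool = ToyPool(lambda: sqlite3.connect(":memory:"))
c1 = ToyConnection(pool)
raw = c1.dbapi_connection
c1.close()
c2 = ToyConnection(pool)
print(c2.dbapi_connection is raw)  # True: the same DBAPI connection was reused
c2.close()
```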
To execute some SQL more quickly, you can skip the Connection part and just say:
result = engine.execute("select username from users")
for row in result:
    print("username: %s" % row['username'])
result.close()
Where above, the execute() method on the Engine does the connect() part for you, and returns the ResultProxy directly. The actual Connection is inside the ResultProxy, waiting for you to finish reading the result. In this case, when you close() the ResultProxy, the underlying Connection is closed, which returns the DBAPI connection to the pool.
To summarize the above two examples, when you use a Connection object, it’s known as explicit execution. When you don’t see the Connection object, but you still use the execute() method on the Engine, it’s called explicit, connectionless execution. A third variant of execution also exists called implicit execution; this will be described later.
The Engine and Connection can do a lot more than what we illustrated above; executing SQL strings is only their most rudimentary function. Later chapters will describe how “constructed SQL” expressions can be used with engines. In many cases, you don’t have to deal with the Engine at all after it’s created. The Object Relational Mapper (ORM), an optional feature of SQLAlchemy, also uses the Engine in order to get at connections, and that is another case where you can often create the engine once and then forget about it.
SQLAlchemy includes many Dialect implementations for various backends; each is described as its own package in the sqlalchemy.dialects package. A SQLAlchemy dialect always requires that an appropriate DBAPI driver is installed.
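The table below summarizes the state of DBAPI support in SQLAlchemy 0.6. The values translate as follows. "yes" and "no" indicate whether the driver is usable on that platform. "partial" means it is usable but has known issues. "development" means support is still being developed. "thirdparty" means the dialect is maintained outside of SQLAlchemy. "unknown" means the combination has not been verified. An asterisk (*) after a connect string marks the default DBAPI for that database, which is the one used when the URL gives only the dialect name.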
Database | Driver | Connect string | Py2K | Py3K | Jython | Unix | Windows |
---|---|---|---|---|---|---|---|
DB2/Informix IDS | ibm-db | thirdparty | thirdparty | thirdparty | thirdparty | thirdparty | thirdparty |
Firebird | kinterbasdb | firebird+kinterbasdb* | yes | development | no | yes | yes |
Informix | informixdb | informix+informixdb* | development | development | no | unknown | unknown |
MaxDB | sapdb | maxdb+sapdb* | development | development | no | yes | unknown |
Microsoft Access | pyodbc | access+pyodbc* | development | development | no | unknown | yes |
Microsoft SQL Server | adodbapi | mssql+adodbapi | development | development | no | no | yes |
Microsoft SQL Server | jTDS JDBC Driver | mssql+zxjdbc | no | no | development | yes | yes |
Microsoft SQL Server | mxodbc | mssql+mxodbc | yes | development | no | yes with FreeTDS | yes |
Microsoft SQL Server | pyodbc | mssql+pyodbc* | yes | development | no | yes with FreeTDS | yes |
Microsoft SQL Server | pymssql | mssql+pymssql | yes | development | no | yes | yes |
MySQL | MySQL Connector/J | mysql+zxjdbc | no | no | yes | yes | yes |
MySQL | MySQL Connector/Python | mysql+mysqlconnector | yes | partial | no | yes | yes |
MySQL | mysql-python | mysql+mysqldb* | yes | development | no | yes | yes |
MySQL | OurSQL | mysql+oursql | yes | partial | no | yes | yes |
Oracle | cx_oracle | oracle+cx_oracle* | yes | development | no | yes | yes |
Oracle | Oracle JDBC Driver | oracle+zxjdbc | no | no | yes | yes | yes |
PostgreSQL | pg8000 | postgresql+pg8000 | yes | yes | no | yes | yes |
PostgreSQL | PostgreSQL JDBC Driver | postgresql+zxjdbc | no | no | yes | yes | yes |
PostgreSQL | psycopg2 | postgresql+psycopg2* | yes | development | no | yes | yes |
PostgreSQL | pypostgresql | postgresql+pypostgresql | no | partial | no | yes | yes |
SQLite | pysqlite | sqlite+pysqlite* | yes | yes | no | yes | yes |
SQLite | sqlite3 | sqlite+pysqlite* | yes | yes | no | yes | yes |
Sybase ASE | mxodbc | sybase+mxodbc | development | development | no | yes | yes |
Sybase ASE | pyodbc | sybase+pyodbc* | partial | development | no | unknown | unknown |
Sybase ASE | python-sybase | sybase+pysybase | partial | development | no | yes | yes |
Further detail on dialects is available in the sqlalchemy.dialects documentation, and additional notes are on the wiki under “Database Notes”.
SQLAlchemy indicates the source of an Engine strictly via RFC-1738 style URLs, combined with optional keyword arguments to specify options for the Engine. The form of the URL is:
dialect+driver://username:password@host:port/database
Dialect names are the identifying names of the SQLAlchemy dialects, such as sqlite, mysql, postgresql, oracle, mssql, and firebird. The driver name is the name of the DBAPI used to connect to the database, written in all lowercase letters. If no driver is specified, a “default” DBAPI is imported if available. This default is typically the most widely known driver for that backend (e.g. cx_oracle, pysqlite/sqlite3, psycopg2, mysqldb). For Jython connections, specify the zxjdbc driver, which is the JDBC-DBAPI bridge included with Jython.
from sqlalchemy import create_engine
# postgresql - psycopg2 is the default driver.
pg_db = create_engine('postgresql://scott:tiger@localhost/mydatabase')
pg_db = create_engine('postgresql+psycopg2://scott:tiger@localhost/mydatabase')
pg_db = create_engine('postgresql+pg8000://scott:tiger@localhost/mydatabase')
# postgresql on Jython
pg_db = create_engine('postgresql+zxjdbc://scott:tiger@localhost/mydatabase')
# mysql - MySQLdb (mysql-python) is the default driver
mysql_db = create_engine('mysql://scott:tiger@localhost/foo')
mysql_db = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
# mysql on Jython
mysql_db = create_engine('mysql+zxjdbc://localhost/foo')
# mysql with pyodbc (buggy)
mysql_db = create_engine('mysql+pyodbc://scott:tiger@some_dsn')
# oracle - cx_oracle is the default driver
oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')
# oracle via TNS name
oracle_db = create_engine('oracle+cx_oracle://scott:tiger@tnsname')
# mssql using ODBC datasource names. PyODBC is the default driver.
mssql_db = create_engine('mssql://mydsn')
mssql_db = create_engine('mssql+pyodbc://mydsn')
mssql_db = create_engine('mssql+adodbapi://mydsn')
mssql_db = create_engine('mssql+pyodbc://username:password@mydsn')
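The following sketch makes the URL anatomy concrete. It splits a URL into its parts with urllib.parse and falls back to the default drivers named above when no +driver is given. parse_db_url and DEFAULT_DRIVERS are hypothetical. SQLAlchemy has its own URL parser, and this sketch handles neither SQLite file paths nor query arguments.

```python
from urllib.parse import urlsplit

# Default DBAPI per dialect, as listed in the text above.
DEFAULT_DRIVERS = {
    "postgresql": "psycopg2",
    "mysql": "mysqldb",
    "oracle": "cx_oracle",
    "sqlite": "pysqlite",
    "mssql": "pyodbc",
}


def parse_db_url(url):
    """Split dialect+driver://username:password@host:port/database."""
    parts = urlsplit(url)
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or DEFAULT_DRIVERS.get(dialect),
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # int, or None when the URL has no port
        "database": parts.path.lstrip("/") or None,
    }


print(parse_db_url("postgresql://scott:tiger@localhost:5432/mydatabase"))
```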
SQLite connects to file-based databases. The same URL format is used, omitting the hostname and using the “file” portion as the filename of the database. As a result, an absolute file path has four slashes in total:
# sqlite://<nohostname>/<path>
# where <path> is relative:
sqlite_db = create_engine('sqlite:///foo.db')
# or absolute, starting with a slash:
sqlite_db = create_engine('sqlite:////absolute/path/to/foo.db')
To use a SQLite :memory: database, specify an empty URL:
sqlite_memory_db = create_engine('sqlite://')
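The slash counting above follows from ordinary URL parsing. The host is empty, and the single slash that separates it from the path is dropped. The following is a minimal sketch that assumes a hypothetical sqlite_filename() helper built on urllib.parse.

```python
from urllib.parse import urlsplit


def sqlite_filename(url):
    """Map a sqlite:// URL onto the filename handed to the DBAPI (sketch)."""
    parts = urlsplit(url)
    if parts.scheme.partition("+")[0] != "sqlite":
        raise ValueError("not a sqlite URL: %r" % url)
    # Drop the one slash that separates the (empty) host from the path.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    return path or ":memory:"  # an empty URL means an in-memory database


for u in ("sqlite:///foo.db", "sqlite:////absolute/path/to/foo.db", "sqlite://"):
    print(u, "->", sqlite_filename(u))
```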
The Engine will ask the connection pool for a connection when the connect() or execute() methods are called. The default connection pool, QueuePool, as well as the default connection pool used with SQLite, SingletonThreadPool, will open connections to the database on an as-needed basis. As concurrent statements are executed, QueuePool will grow its pool of connections to a default size of five, and will allow a default “overflow” of ten. Since the Engine is essentially “home base” for the connection pool, it follows that you should keep a single Engine per database established within an application, rather than creating a new one for each connection.
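The sizing rules above, a pool size of five plus an overflow of ten, can be sketched as simple bookkeeping. BoundedPool is hypothetical and deliberately simplified. When the limit is reached it raises an error immediately, whereas a production pool would typically wait for a connection to be checked in.

```python
import queue
import sqlite3
import threading


class BoundedPool:
    """Sketch of pool_size/max_overflow bookkeeping (not SQLAlchemy's QueuePool)."""

    def __init__(self, creator, pool_size=5, max_overflow=10):
        self._creator = creator
        self._idle = queue.Queue(maxsize=pool_size)  # at most pool_size kept while idle
        self._limit = pool_size + max_overflow  # cap on simultaneously open connections
        self._open = 0
        self._lock = threading.Lock()

    def checkout(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection first
        except queue.Empty:
            pass
        with self._lock:
            if self._open >= self._limit:
                # a production pool would typically wait for a checkin here
                raise RuntimeError("pool exhausted")
            self._open += 1
        try:
            return self._creator()  # connections are opened lazily, on demand
        except Exception:
            with self._lock:
                self._open -= 1
            raise

    def checkin(self, conn):
        try:
            self._idle.put_nowait(conn)  # keep it for reuse
        except queue.Full:
            conn.close()  # an overflow connection: really close it
            with self._lock:
                self._open -= 1


pool = BoundedPool(lambda: sqlite3.connect(":memory:"))
held = [pool.checkout() for _ in range(15)]  # 5 pooled + 10 overflow
for c in held:
    pool.checkin(c)
print(pool._open)  # 5: the overflow connections were closed on checkin
```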
Custom arguments used when issuing the connect() call to the underlying DBAPI can be passed in three distinct ways. String-based arguments can be passed directly from the URL string as query arguments:
db = create_engine('postgresql://scott:tiger@localhost/test?argument1=foo&argument2=bar')
If SQLAlchemy’s database connector is aware of a particular query argument, it may convert its type from string to its proper type.
create_engine() also takes an argument connect_args which is an additional dictionary that will be passed to connect(). This can be used when arguments of a type other than string are required, and SQLAlchemy’s database connector has no type conversion logic present for that parameter:
db = create_engine('postgresql://scott:tiger@localhost/test', connect_args={'argument1': 17, 'argument2': 'bar'})
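The following sketch shows how the first two mechanisms can combine. Query-string values arrive as strings and are converted only if a converter is known, while connect_args values pass through unchanged. build_connect_kwargs and its converters parameter are hypothetical. Letting connect_args override same-named URL arguments is an assumption of this sketch.

```python
from urllib.parse import parse_qsl, urlsplit


def build_connect_kwargs(url, connect_args=None, converters=None):
    """Merge URL query arguments with connect_args (hypothetical helper).

    `converters` maps an argument name to a conversion function and stands
    in for the type knowledge a dialect may have about that parameter.
    """
    converters = converters or {}
    kwargs = {}
    for name, value in parse_qsl(urlsplit(url).query):
        kwargs[name] = converters.get(name, str)(value)  # strings unless known
    kwargs.update(connect_args or {})  # non-string values pass through as-is
    return kwargs


print(build_connect_kwargs(
    "postgresql://scott:tiger@localhost/test?argument1=foo&argument2=bar"))
```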
The most customizable connection method of all is to pass a creator argument, which specifies a callable that returns a DBAPI connection:
import psycopg2
def connect():
    # return a new DBAPI connection whenever the pool needs one
    return psycopg2.connect(user='scott', host='localhost')
db = create_engine('postgresql://', creator=connect)
Keyword options can also be specified to create_engine(), following the string URL as follows:
db = create_engine('postgresql://...', encoding='latin1', echo=True)
Options common to all database dialects are described at create_engine().
Recall from the beginning of this section that the Engine provides a connect() method which returns a Connection object. Connection is a proxy object which maintains a reference to a DBAPI connection instance. The close() method on Connection does not actually close the DBAPI connection. Instead, it returns the DBAPI connection to the connection pool referenced by the Engine. Connection will also automatically return its resources to the connection pool when the object is garbage collected, i.e. when its __del__() method is called. With the standard C implementation of Python (CPython), this method is usually called as soon as the object is dereferenced. With other Python implementations such as Jython, this is not guaranteed.
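The garbage-collection fallback can be illustrated with __del__. In this hypothetical sketch, ToyPool and ToyConnection are not SQLAlchemy classes. Deleting the last reference to a connection returns its DBAPI connection to the pool. The immediate return relies on CPython’s reference counting, as noted above.

```python
import sqlite3


class ToyPool:
    """Hypothetical minimal pool holding idle sqlite3 connections."""

    def __init__(self):
        self.idle = []

    def checkout(self):
        return self.idle.pop() if self.idle else sqlite3.connect(":memory:")

    def checkin(self, dbapi_conn):
        self.idle.append(dbapi_conn)


class ToyConnection:
    """Returns its DBAPI connection on close(), or on __del__ as a fallback."""

    def __init__(self, pool):
        self._pool = pool
        self._dbapi_conn = pool.checkout()

    def close(self):
        if self._dbapi_conn is not None:
            self._pool.checkin(self._dbapi_conn)
            self._dbapi_conn = None  # a second close() does nothing

    def __del__(self):
        self.close()


pool = ToyPool()
conn = ToyConnection(pool)
del conn  # on CPython the refcount drops to zero and __del__ runs immediately
print(len(pool.idle))  # 1 on CPython
```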
The execute() methods on both Engine and Connection can also receive SQL clause constructs as well:
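from sqlalchemy import select
# assumes table1 is a Table with columns col1 and col2
connection = engine.connect()
result = connection.execute(select([table1], table1.c.col1 == 5))
for row in result:
    print("%s %s" % (row['col1'], row['col2']))
connection.close()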
The above SQL construct is known as a select(). The full range of available SQL constructs is described in SQL Expression Language Tutorial.
Both Connection and Engine fulfill an interface known as Connectable which specifies common functionality between the two objects, namely being able to call connect() to return a Connection object (Connection just returns itself), and being able to call execute() to get a result set. Following this, most SQLAlchemy functions and objects which accept an Engine as a parameter or attribute with which to execute SQL will also accept a Connection. This argument is named bind:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer
engine = create_engine('sqlite:///:memory:')
# specify some Table metadata
metadata = MetaData()
table = Table('sometable', metadata, Column('col1', Integer))
# create the table with the Engine
table.create(bind=engine)
# drop the table with a Connection off the Engine
connection = engine.connect()
table.drop(bind=connection)
connection.close()
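The Connectable idea is essentially duck typing. Code that needs to execute SQL only calls connect() or execute(), so an engine and a connection are interchangeable as bind. The following is a hypothetical sketch that uses sqlite3 in place of the DBAPI. ToyEngine, ToyConnection and create_table are illustrative names only.

```python
import sqlite3


class ToyConnection:
    """Hypothetical stand-in for Connection: connect() returns itself."""

    def __init__(self, dbapi_conn):
        self._dbapi_conn = dbapi_conn

    def connect(self):
        return self

    def execute(self, sql, params=()):
        return self._dbapi_conn.execute(sql, params)


class ToyEngine:
    """Hypothetical stand-in for Engine: connect() opens a new connection."""

    def __init__(self, path):
        self._path = path

    def connect(self):
        return ToyConnection(sqlite3.connect(self._path))

    def execute(self, sql, params=()):
        return self.connect().execute(sql, params)


def create_table(name, bind):
    """Accepts either object: it relies only on the shared execute() method."""
    # `name` is trusted here; never interpolate untrusted input into SQL.
    bind.execute("create table %s (col1 integer)" % name)


engine = ToyEngine(":memory:")
connection = engine.connect()
create_table("sometable", bind=connection)
print(connection.connect() is connection)  # True
```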
The Connection object provides a begin() method which returns a Transaction object. This object is usually used within a try/except clause so that either commit() or rollback() is guaranteed to be called:
trans = connection.begin()
try:
    r1 = connection.execute(table1.select())
    connection.execute(table1.insert(), col1=7, col2='this is some data')
    trans.commit()
except:
    trans.rollback()
    raise
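The same commit-or-rollback discipline applies at the DBAPI level. This sketch calls the standard sqlite3 module directly instead of going through SQLAlchemy, and uses a hypothetical insert_rows() helper. It shows that when a batch fails partway through, none of that batch’s rows remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (col1 integer, col2 text)")


def insert_rows(conn, rows):
    """Insert all rows in one transaction, or none of them."""
    try:
        for row in rows:
            conn.execute("insert into table1 values (?, ?)", row)
        conn.commit()
    except Exception:
        conn.rollback()  # discard the rows inserted so far
        raise  # let the caller see the original error


insert_rows(conn, [(7, "this is some data")])
try:
    # the second row is missing a value, so the whole batch fails
    insert_rows(conn, [(8, "more data"), (9,)])
except sqlite3.Error:
    pass
print(conn.execute("select count(*) from table1").fetchone()[0])  # 1
```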
The Transaction object also handles “nested” behavior by keeping track of the outermost begin/commit pair. In this example, two functions both begin a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.
# method_a starts a transaction and calls method_b
def method_a(connection):
    trans = connection.begin()  # open a transaction
    try:
        method_b(connection)
        trans.commit()  # transaction is committed here
    except:
        trans.rollback()  # this rolls back the transaction unconditionally
        raise
# method_b also starts a transaction
def method_b(connection):
    trans = connection.begin()  # open a transaction - this runs in the context of method_a's transaction
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        connection.execute(mytable.insert(), col1='bat', col2='lala')
        trans.commit()  # transaction is not committed yet
    except:
        trans.rollback()  # this rolls back the transaction unconditionally
        raise
# open a Connection and call method_a
conn = engine.connect()
method_a(conn)
conn.close()
Above, method_a is called first, which calls connection.begin(). Then it calls method_b. When method_b calls connection.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method. This “nesting” behavior allows the creation of functions which “guarantee” that a transaction will be used if one was not already available, but will automatically participate in an enclosing transaction if one exists.
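The counter-based nesting described above can be sketched in a few lines. NestingConnection is hypothetical and simplified. Its begin() returns the connection itself instead of a separate Transaction object. It also relies on the DBAPI (here sqlite3) opening a transaction implicitly on the first data-changing statement.

```python
import sqlite3


class NestingConnection:
    """Hypothetical sketch of begin/commit nesting via a depth counter."""

    def __init__(self, dbapi_conn):
        self.dbapi_conn = dbapi_conn
        self._depth = 0

    def execute(self, sql, params=()):
        return self.dbapi_conn.execute(sql, params)

    def begin(self):
        # The DBAPI opens the real transaction implicitly, so begin()
        # only records one more level of nesting.
        self._depth += 1
        return self  # acts as the "Transaction" object in this sketch

    def commit(self):
        if self._depth == 0:
            raise RuntimeError("no transaction in progress")
        self._depth -= 1
        if self._depth == 0:
            self.dbapi_conn.commit()  # only the outermost commit is real

    def rollback(self):
        if self._depth:
            self._depth = 0
            self.dbapi_conn.rollback()  # any rollback discards everything


def method_b(connection):
    trans = connection.begin()
    try:
        connection.execute("insert into mytable values ('bat', 'lala')")
        trans.commit()  # depth 2 -> 1: not committed yet
    except Exception:
        trans.rollback()
        raise


def method_a(connection):
    trans = connection.begin()
    try:
        method_b(connection)
        trans.commit()  # depth 1 -> 0: committed here
    except Exception:
        trans.rollback()
        raise


conn = NestingConnection(sqlite3.connect(":memory:"))
conn.execute("create table mytable (col1 text, col2 text)")
method_a(conn)
print(conn.execute("select count(*) from mytable").fetchone()[0])  # 1
```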
Note that SQLAlchemy’s Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in Managing Transactions.
The above transaction example illustrates how to use Transaction so that several executions can take part in the same transaction. What happens when we issue an INSERT, UPDATE or DELETE call without using Transaction? The answer is autocommit. While many DBAPIs implement a flag called autocommit, SQLAlchemy currently implements its own autocommit behavior. It detects statements which represent data-changing operations (e.g. INSERT, UPDATE, DELETE) and issues a COMMIT automatically if no transaction is in progress. The detection is based on compiled statement attributes or, for a text-only statement, on regular expressions.
conn = engine.connect()
conn.execute("INSERT INTO users VALUES (1, 'john')") # autocommits
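A rough sketch of statement-based autocommit detection follows. The regular expression and the AutocommitConnection class are hypothetical. They illustrate the approach described above, which matches data-changing keywords at the start of a text statement and then issues a COMMIT if no explicit transaction is open. This is not SQLAlchemy’s exact pattern.

```python
import re
import sqlite3

# Hypothetical pattern; SQLAlchemy's own expression may differ.
AUTOCOMMIT_RE = re.compile(
    r"\s*(?:UPDATE|INSERT|CREATE|DELETE|DROP|ALTER)\b", re.IGNORECASE
)


class AutocommitConnection:
    """Commits after data-changing text statements when no explicit
    transaction is in progress (sketch)."""

    def __init__(self, dbapi_conn):
        self.dbapi_conn = dbapi_conn
        self.in_explicit_transaction = False

    def execute(self, sql, params=()):
        cursor = self.dbapi_conn.execute(sql, params)
        if not self.in_explicit_transaction and AUTOCOMMIT_RE.match(sql):
            self.dbapi_conn.commit()
        return cursor


conn = AutocommitConnection(sqlite3.connect(":memory:"))
conn.execute("create table users (id integer, name text)")
conn.execute("INSERT INTO users VALUES (1, 'john')")  # autocommits
print(conn.dbapi_conn.in_transaction)  # False
```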
Recall from the first section that we mentioned executing with and without a Connection. Connectionless execution refers to calling the execute() method on an object which is not a Connection, which could be the Engine itself or a constructed SQL object. When we say “implicit”, we mean that we are calling the execute() method on an object which is neither a Connection nor an Engine object. This can only be used with constructed SQL objects which have their own execute() method and can be “bound” to an Engine. A description of “constructed SQL objects” may be found in SQL Expression Language Tutorial.
A summary of all three methods follows below. First, assume the usage of the following MetaData and Table objects; while we haven’t yet introduced these concepts, for now you only need to know that we are representing a database table, and are creating an “executable” SQL construct which issues a statement to the database. These objects are described in Database Meta Data.
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
meta = MetaData()
users_table = Table('users', meta,
    Column('id', Integer, primary_key=True),
    Column('name', String(50))
)
Explicit execution delivers the SQL text or constructed SQL expression to the execute() method of Connection:
engine = create_engine('sqlite:///file.db')
connection = engine.connect()
result = connection.execute(users_table.select())
for row in result:
    print(row['name'])  # process each row here
connection.close()
Explicit, connectionless execution delivers the expression to the execute() method of Engine:
engine = create_engine('sqlite:///file.db')
result = engine.execute(users_table.select())
for row in result:
    print(row['name'])  # process each row here
result.close()
Implicit execution is also connectionless, and calls the execute() method on the expression itself, utilizing the fact that either an Engine or Connection has been bound to the expression object (binding is discussed further in the next section, Database Meta Data):
engine = create_engine('sqlite:///file.db')
meta.bind = engine
result = users_table.select().execute()
for row in result:
    print(row['name'])  # process each row here
result.close()
In both “connectionless” examples, the Connection is created behind the scenes; the ResultProxy returned by the execute() call references the Connection used to issue the SQL statement. When we issue close() on the ResultProxy, or if the result set object falls out of scope and is garbage collected, the underlying Connection is closed for us, resulting in the DBAPI connection being returned to the pool.
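The following sketch shows how a result can own its connection. ToyPool, ToyResult and connectionless_execute are hypothetical names. The sketch releases the connection either on close() or once all rows have been iterated, matching the behavior described for ResultProxy. It does not model cleanup through garbage collection.

```python
import sqlite3


class ToyPool:
    def __init__(self, creator):
        self._creator = creator
        self.idle = []

    def checkout(self):
        return self.idle.pop() if self.idle else self._creator()

    def checkin(self, dbapi_conn):
        self.idle.append(dbapi_conn)


class ToyResult:
    """Holds the checked-out connection until rows are consumed or close() is called."""

    def __init__(self, pool, dbapi_conn, cursor):
        self._pool = pool
        self._dbapi_conn = dbapi_conn
        self._cursor = cursor

    def __iter__(self):
        for row in self._cursor:
            yield row
        self.close()  # exhausted: release the connection

    def close(self):
        if self._dbapi_conn is not None:
            self._cursor.close()
            self._pool.checkin(self._dbapi_conn)
            self._dbapi_conn = None


def connectionless_execute(pool, sql, params=()):
    """Engine.execute()-style call: check out, execute, hand back a result."""
    dbapi_conn = pool.checkout()
    return ToyResult(pool, dbapi_conn, dbapi_conn.execute(sql, params))


pool = ToyPool(lambda: sqlite3.connect(":memory:"))
result = connectionless_execute(pool, "select 1")
print(len(pool.idle))  # 0: the result still holds the connection
rows = list(result)
print(len(pool.idle))  # 1: exhausting the result returned it
```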
The “threadlocal” engine strategy is used by non-ORM applications which wish to bind a transaction to the current thread, so that all parts of the application can participate in that transaction implicitly without explicitly referencing a Connection. “threadlocal” is designed for a very specific pattern of use and is not appropriate unless that pattern, described below, is what’s desired. It has no impact on the “thread safety” of SQLAlchemy components or of one’s application. It also should not be used together with an ORM Session object, because the Session itself represents an ongoing transaction and handles the job of maintaining connection and transactional resources.
Enabling threadlocal is achieved as follows:
db = create_engine('mysql://localhost/test', strategy='threadlocal')
When the engine above is used in a “connectionless” style, meaning engine.execute() is called, a DBAPI connection is retrieved from the connection pool and then associated with the current thread. Subsequent operations on the Engine while the DBAPI connection remains checked out will make use of the same DBAPI connection object. The connection stays allocated until all returned ResultProxy objects are closed, which occurs for a particular ResultProxy after all pending results are fetched, or immediately for an operation which returns no rows (such as an INSERT).
# execute one statement and receive results. r1 now references a DBAPI connection resource.
r1 = db.execute("select * from table1")
# execute a second statement and receive results. r2 now references the *same* resource as r1
r2 = db.execute("select * from table2")
# fetch a row on r1 (assume more results are pending)
row1 = r1.fetchone()
# fetch a row on r2 (same)
row2 = r2.fetchone()
# close r1. the connection is still held by r2.
r1.close()
# close r2. with no more references to the underlying connection resources, they
# are returned to the pool.
r2.close()
The above example does not illustrate a particularly useful pattern, since it is not a frequent occurrence for two execute/result fetching operations to “leapfrog” one another. There is a slight savings of connection pool checkout overhead between the two operations, and an implicit sharing of the same transactional context, but since no transaction is explicitly declared, this association is short-lived.
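The thread association and resource counting described for “threadlocal” can be sketched with threading.local. ThreadLocalCheckout is hypothetical. Each acquire() stands in for an execute() that returns a result, and each release() stands in for closing that result, mirroring the r1/r2 example above.

```python
import sqlite3
import threading


class ThreadLocalCheckout:
    """Sketch: one DBAPI connection per thread, shared by all open results.

    A per-thread reference count decides when the connection goes back to
    the pool: only after the last holder releases it.
    """

    def __init__(self, creator):
        self._creator = creator
        self._local = threading.local()
        self._lock = threading.Lock()
        self.idle = []

    def acquire(self):
        if getattr(self._local, "conn", None) is None:
            with self._lock:
                self._local.conn = self.idle.pop() if self.idle else self._creator()
            self._local.refs = 0
        self._local.refs += 1
        return self._local.conn

    def release(self):
        self._local.refs -= 1
        if self._local.refs == 0:
            with self._lock:
                self.idle.append(self._local.conn)
            self._local.conn = None


pool = ThreadLocalCheckout(lambda: sqlite3.connect(":memory:", check_same_thread=False))
r1_conn = pool.acquire()  # like r1 = db.execute(...)
r2_conn = pool.acquire()  # like r2 = db.execute(...): same connection
print(r1_conn is r2_conn)  # True
pool.release()  # r1.close(): r2 still holds the connection
pool.release()  # r2.close(): returned to the pool
print(len(pool.idle))  # 1
```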
The real usage of “threadlocal” comes when we want several operations to occur within the scope of a shared transaction. The Engine now has begin(), commit() and rollback() methods which will retrieve a connection resource from the pool and establish a new transaction, maintaining the connection against the current thread until the transaction is committed or rolled back:
db.begin()
try:
    call_operation1()
    call_operation2()
    db.commit()
except:
    db.rollback()
    raise
call_operation1() and call_operation2() can make use of the Engine as a global variable, using the “connectionless” execution style, and their operations will participate in the same transaction:
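def call_operation1():
    # MySQLdb uses the "format" paramstyle, so plain SQL strings use %s placeholders
    db.execute("insert into users values (%s, %s)", 1, "john")
def call_operation2():
    # implicit execution; assumes the users Table's MetaData is bound to db
    users.update(users.c.user_id == 5).execute(name='ed')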
When using threadlocal, operations that do call upon the engine.connect() method will receive a Connection that is outside the scope of the transaction. This can be used for operations such as logging the status of an operation regardless of transaction success:
db.begin()
conn = db.connect()
try:
    conn.execute(log_table.insert(), message="Operation started")
    call_operation1()
    call_operation2()
    db.commit()
    conn.execute(log_table.insert(), message="Operation succeeded")
except:
    db.rollback()
    conn.execute(log_table.insert(), message="Operation failed")
    raise
finally:
    conn.close()
Functions which are written to use an explicit Connection object, but wish to participate in the threadlocal transaction, can receive their Connection object from the contextual_connect() method, which returns a Connection that is inside the scope of the transaction:
conn = db.contextual_connect()
call_operation3(conn)
conn.close()
Calling close() on the “contextual” connection does not release the connection resources to the pool if other resources are making use of it. A resource-counting mechanism is employed so that the connection is released back to the pool only when all users of that connection, including the transaction established by engine.begin(), have been completed.
So remember: if you’re not sure whether you need strategy="threadlocal", the answer is no! It’s driven by a specific programming pattern that is generally not the norm.
Python’s standard logging module is used to implement informational and debug log output with SQLAlchemy. This allows SQLAlchemy’s logging to integrate in a standard way with other applications and libraries. The echo and echo_pool flags that are present on create_engine(), as well as the echo_uow flag used on Session, all interact with regular loggers.
This section assumes familiarity with Python’s logging module. All logging performed by SQLAlchemy exists underneath the sqlalchemy namespace, as used by logging.getLogger('sqlalchemy'). When logging has been configured (for example, via logging.basicConfig()), the general namespace of SQLAlchemy loggers that can be turned on is as follows:
sqlalchemy.engine - controls SQL echoing. Set to logging.INFO for SQL query output, or logging.DEBUG for query + result set output.
sqlalchemy.dialects - controls custom logging for SQL dialects. See the documentation of individual dialects for details.
sqlalchemy.pool - controls connection pool logging. Set to logging.INFO or lower to log connection pool checkouts/checkins.
For example, to log SQL queries as well as unit of work debugging:
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
logging.getLogger('sqlalchemy.orm.unitofwork').setLevel(logging.DEBUG)
By default, the log level is set to logging.ERROR within the entire sqlalchemy namespace so that no log operations occur, even within an application that has logging enabled otherwise.
The echo flags present as keyword arguments to create_engine() and elsewhere, as well as the echo property on Engine, will first attempt to ensure that logging is enabled when set to True. Unfortunately, the logging module provides no way of determining whether output has already been configured (that is, whether a logging configuration has been set up, not just whether the logging level is set). For this reason, any echo=True flag results in a call to logging.basicConfig() using sys.stdout as the destination. It also sets up a default format using the level name, timestamp, and logger name. Note that this configuration is applied in addition to any existing logger configuration. Therefore, when using Python logging, keep all echo flags set to False at all times to avoid duplicate log lines.
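Following the advice above, you can get SQLAlchemy’s log output without any echo flags by attaching a single handler to the sqlalchemy logger yourself. This sketch uses only the standard logging module. The handler, the format, and the choice to turn off propagation (so that root handlers do not write the same records again) are assumptions. The log calls at the end are made by hand only to show where records go.

```python
import io
import logging

# One explicitly configured handler for the whole sqlalchemy namespace.
stream = io.StringIO()  # stands in for a file or sys.stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

sa_logger = logging.getLogger("sqlalchemy")
sa_logger.addHandler(handler)
sa_logger.propagate = False  # don't also emit through root handlers

logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)  # SQL statements
logging.getLogger("sqlalchemy.pool").setLevel(logging.INFO)  # checkouts/checkins

# SQLAlchemy would emit on these loggers itself; here we log by hand.
logging.getLogger("sqlalchemy.engine").info("select username from users")
print(stream.getvalue(), end="")  # INFO sqlalchemy.engine select username from users
```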
The logger name of an instance such as an Engine or Pool defaults to a truncated hex identifier string. To set this to a specific name, use the logging_name and pool_logging_name keyword arguments with sqlalchemy.create_engine().