PyZine
 


Article Finder
People
Issue 2 - Revision 1  /   Published in 2002 


 
  Py Links:
Latest Issue
Issue 08
Issue 07
Issue 06
Issue 05
Issue 04
Issue 02
Issue 01
 
 
Downloads
     
  Articles:
Throughout the quarter we cover topics of interest to Python developers.

  Configuration Files Made Easy

  Jython & zxJDBC Database Programming

 
Extending Python with C: Part 2


  POOPy: Introduction to Objects in Python

  Scientific: Array Broadcasting in Numeric

 
 
 
     

Illustration by Lia Avant
Py Archive Article
Jython & zxJDBC Database Programming

Jython & zxJDBC Database Programming
- with Python
- - - - - - - - - - - -

By Brian Zimmer | Originally published in Py Issue 2

print

Jython is the Java implementation of the Python programming language. It is a powerful tool for parsing data, accessing databases and other day-to-day development tasks. One of the “batteries included” is an implementation of the Python DB API, zxJDBC. In this article we will parse some hypothetical triathlon results, insert them into MySQL and perform a couple data mining operations.

Let's break the task into two parts. First, figure out the general layout of the data file and determine the best approach for parsing the values. Secondly, establish a connection to MySQL (we assume you already have this database running) through zxJDBC and learn how to insert the data.

PARSING DATA

Below is a sample of the data:

NAME            BIKE   SWIM RUN  TOTAL
ARAGORN         12.23  0.84 2.75 15.82
GANDALF         11.67  1.32 2.73 15.72
MERRY            9.96  0.88 1.77 12.61
ARWEN            9.28  1.12 1.72 12.12
GALADRIEL        7.97  1.24 1.88 11.09
SAM              7.20  0.80 2.13 10.13
LEGOLAS          7.89  0.44 1.21  9.54

Analyzing the data, it's pretty clear it's fixed width columns and should be easily parsed but we need to decide on an approach. A table of the column indices and a function to handle the data should be more than sufficient for the data. Other solutions are definitely possible but this approach is potentially the simplest.

We need a function that can turn arbitrary data into a string and remove the extraneous whitespace fixed width columns exhibit:

def string(s):
return str(s).strip()

The second thing to do is build the table of column indices and functions to handle the data type:

# our parsing rules 
# (start, end, data converting function)
columns = {
     'name'  :(0, 16, string),
     'bike'  :(17, 22, float),
     'swim'  :(23, 27, float),
     'run'   :(28, 32, float),
     'total' :(33, 38, float)
     }

The structure we have chosen is pretty straightforward. It is a Python dictionary with the column name as the key and a tuple as the value. The tuple contains the starting index, the ending index and the function to convert the data string to the data type we're interested in storing in the database. If you're familiar with Python you'll note the use of the builtin function float as one handler. The only other handler we need is our whitespace-stripping string function.

Now that we have the parsing rules built, we can focus on the parsing. One of the most straightforward approaches is to read all the lines from the file, apply the rules to each and build a dictionary of columns and values from the result of calling the function on the string data. We can accomplish this with the function below:

def parse(fn):
'''
Parse the file and return 
a list of dictionaries containing
	the column/value pairs for the row.
	'''

	# open the file
	fp = open(fn, "r")

	# first read all the lines, 
# then strip any whitespace
	#  finally, filter any rows that have no length
	data = filter(lambda x: x, map(lambda x: x.strip(), fp.readlines()))

	# we don't need the header columns, 
# we already have that
	# information captured in the columns dictionary
	data = data[2:]

	# it's important to close what we open
	fp.close()

	# a list of all the processed rows
	results = []

	# iterate the data
	while data:
		# build our datastore
		ds = {}

		# get the row to be processed
		row = data[0]

		# iterate the columns
		for name, rules in columns.items():

			# unfold our rules
			start, end, func = rules

			# store the results of applying our rule
			ds[name] = func(row[start:end])

		# add the processed row to the rest
		results.append(ds)

		# since we just worked on this row 
# we can get rid of it
		del data[0]

	# return the results of parsing the file
	return results


The parse function works rather methodically through all the rows of the file and returns the parsed results. For each row it creates a new instance of a dictionary to store the value for each column. It then proceeds to slice the line into the fixed widths and apply the function from our simple rules table. The result of this evaluation is stored in the dictionary by column name. The function then returns the list of dictionaries representing database rows.

DATABASES

Now that we have the ability to generate the parsed data, let's start looking at how to create a database connection. The DB API details two primary classes, connections and cursors. We will first look at connections and how to open a connection and then look at cursors. There is no absolute method for establishing a connection through the API; there is a general convention but not all databases expect the same information. Since we're focusing on zxJDBC we'll look at how to establish a connection through it's API.

zxJDBC is a wrapper around JDBC and therefore needs a valid java.sql.Driver in order to operate. We will be using the Open Source MM.MySQL driver to connect to MySQL. In order to establish our connection we need the URL to the database, a username, password and the fully qualified name of the driver. A typical function for opening a connection is below:

def get_connection():
	"""Open and return a database connection."""

	# the parameters for establishing a connection
	url = "jdbc:mysql://192.168.1.102/racing"
	username = "bzimmer"
	password = None
	jdbcdriver = "org.gjt.mm.mysql.Driver"

	# establish the connection and return it
	from com.ziclix.python.sql import zxJDBC
	db = zxJDBC.connect(url, 
username, 
password, 
jdbcdriver)
	return db

The URL specifies it's for JDBC and tells the driver the hostname and database, in this case 192.168.1.102 and “racing” respectively. There is a username (bzimmer) but the database is not password protected. The final piece of data provided is the driver name. After making local variables for all the information we proceed to open a connection through the zxJDBC factory method connect. An open connection to the MySQL database is returned.

Once we've established the connection we can begin inserting the data we parsed earlier. The procedure for inserting the data into the database is, like the parsing, also straightforward. Since we have all the data we need we'll iterate the list, generate the SQL and execute it.

def store(data):
	"""Store the data in the database."""

	# get a new connection
	db = get_connection()

	# open a cursor so we can perform our work
	cursor = db.cursor()

	try:

		try:
			# since we have all new results, 
# drop the existing table
			cursor.execute("drop table triathlon")
		except:
			# it might not be there
			pass

		# create the table
		cursor.execute(ddl)

		for row in data:
			# build our sql dynamically
			items = row.items()

			keys = map(lambda x: x[0], items)
			values = map(lambda x: x[1], items)

			# this sql is for a prepared statement
			sql = "insert into triathlon (%s) values (%s)" \
				% (",".join(keys), ",".join("?" * len(values)))

			# execute the sql
			cursor.execute(sql, values)

	finally:
		# close what we open
		cursor.close()
		db.close()

The cursor object of the Python DB API was mentioned earlier and it is probably obvious from the store function that it is the workhorse in the API. To obtain a cursor invoke the appropriately named cursor method on an open connection instance. Any number of cursors can be acquired from an open connection and a cursor is reusable for any number of invocations until it, or the underlying connection, is closed.

After getting a cursor we drop the existing table if it exists to minimize any data problems. We wrap the statement in a try/catch block since we want to continue even if it fails, which it certainly will the first time we execute the function. Once the table is created from the DDL we are ready to start populating our new table.

Recall from earlier that data is a list of dictionaries, each dictionary containing the values for all the columns for one row in the database table “triathlon”. We set up a for loop to handle iterating the list. The first line of the for loop gets the column/data tuples for the row. The next two lines then pull the keys and values apart into separate lists for use in generating the SQL statement.

SQL can be in one of two forms, either parameterized or not. If parameterized, in place of having the string representation for each data value a ‘?’ is inserted in its place. This is useful because escaping SQL strings is sometimes quite difficult and parameterizing a statement sometimes allows for optimization at both the driver and database. Dissecting the line we find the string constant:

insert into triathlon (%s) values (%s)

requires two substitutions. The first substitution is the list of column names to be inserted separated by commas. The second substitution is a list of ‘?’ of length equal to the number of values to be inserted (which must equal the number of columns). The latter is accomplished through:

",".join("?" * len(values))

which would produce ‘?,?,?’ for an insertion of three values.

We finally arrive at the point where we can insert the data. We invoke the execute method on the cursor with the SQL statement we just created and the values we parsed out earlier. Had we created a fully qualified SQL string rather than chosen to use a prepared statement we would not need to pass the values to execute since the data would be embedded in the SQL string.

After handling all the data a finally block closes the cursor and connection. We're now done inserting the data but we still have to answer our question: which athlete is strongest on the bike relative to the other events?

PERFORMING QUERIES

To answer the question we need to perform a query. This is very similar to inserting data, the primary difference being we're interested in obtaining a result set from the database. The function below demonstrates how to perform the query and fetch the results. Once execute has been invoked the cursor stores the results of the query internally (depending on the implementation of the driver and/or cursor this might be on the server or the entire result set might have been copied to the client) and are available through the fetchXXX methods.

The three fetchXXX methods are: fetchone, fetchmany and fetchall. The methods fetchmany and fetchall return a list of tuples whereas fetchone returns a single tuple. The method fetchmany has an optional argument to specify how many rows to return in the list; fetchall returns all remaining rows. All three methods return None to indicate no more rows are available in the result set.


	# get a new connection
	db = get_connection()

	# open a cursor so we can answer our question
	cursor = db.cursor()

	try:
		# find the answer
		cursor.execute("select name, 
...	bike/total from triathlon order by 2 desc")
		for row in cursor.fetchall():
			print row

	finally:
		# close what we open
		cursor.close()
		db.close()

It is also possible to run queries in the interactive interpreter:

>>> import pyzine
>>> db = pyzine.get_connection()
>>> c = db.cursor()
>>> c.execute("select name, 
... run/total from triathlon order by 2 desc")
>>> while 1:
...  row = c.fetchone()
...  if row is None: break
...  print row
...
('SAM', 0.2102999985218048)
('ARAGORN', 0.1738000065088272)
('GANDALF', 0.1737000048160553)
('GALADRIEL', 0.16949999332427979)
('ARWEN', 0.14190000295639038)
('MERRY', 0.1404000073671341)
('LEGOLAS', 0.12680000066757202)
>>> c.close(); db.close()
>>>


Since we inserted the data into a database we can use the database's tools to run queries:

$ mysql -h 192.168.1.102 racing
	Welcome to the MySQL monitor.  Commands end with ; or \g.
	Your MySQL connection id is 17595 to server version: 3.23.41

	Type 'help;' or '\h' for help. Type '\c' to clear the buffer

	mysql> select name, bike/total from triathlon order by 2 desc;
	+-----------+------------+
	| name      | bike/total |
	+-----------+------------+
	| LEGOLAS   |     0.8270 |
	| MERRY     |     0.7898 |
	| ARAGORN   |     0.7731 |
	| ARWEN     |     0.7657 |
	| GANDALF   |     0.7424 |
	| GALADRIEL |     0.7187 |
	| SAM       |     0.7108 |
	+-----------+------------+
	
7 rows in set (0.00 sec)

	mysql>
PUTTING IT ALL TOGETHER
	def go(fn):
		# parse
		r = parse(fn)

		# store
		store(r)

		# query
		strongest_bike()

	$ jython
	Jython 2.1 on java1.3.0 (JIT: null)
	Type "copyright", "credits" or "license" for more information.
	>>> import pyzine
	>>> pyzine.go("pyzine.txt")
	('LEGOLAS', 0.8270000219345093)
	('MERRY', 0.7897999882698059)
	('ARAGORN', 0.7731000185012817)
	('ARWEN', 0.7656999826431274)
	('GANDALF', 0.7423999905586243)
	('GALADRIEL', 0.7186999917030334)
	('SAM', 0.7107999920845032)
	>>>

So it looks like all of our work paid off since we are now able to answer our question and the strongest relative cyclist is Legolas.

CONCLUSION

A number of topics were skipped in building these scripts, in particular: transactions, DataHandlers, more complex types like BLOBs and CLOBs, zxJDBC extensions, stored procedures, threads, dbexts, isql.

This particular article is Copyright © 2002 Brian Zimmer. All Rights Reserved.
Brian Zimmer


shim
shim

 Py is committed to bringing you great Python Articles.

shim
shim


Home   Subscribe   Migration FAQ   Contact PyZine   Write for PyZine   ZopeMag   opensourcexperts.com  

Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT