PyZine
 


Article Finder
People
Issue 5 - Revision 7  /   April 20, 2004 


 
  Py Links:
Latest Issue
Issue 08
Issue 07
Issue 06
Issue 05
Issue 04
Issue 02
Issue 01
 
 
Downloads
     
  Articles:
Throughout the quarter we cover topics of interest to Python developers.

  RSOAP

  Simple Code Generation

  4ss

  Pyro

  PyCon 2004

  Kaa & Firedrop

  XML-RPC for Python

  Applied XML-RPC

 
 
 
     

Illustration by Lia Avant
article
Simple Code Generation

Simple Code Generation
- with Python
- - - - - - - - - - - -

By Ollie Rutherfurd | February 3, 2003

print
Introduction

When programmers discover Python, they often fall in love. They find that they can accomplish more with less effort and less time. "How can I convince my boss that we should be using Python for our next big project", is a frequently asked question on the newsgroup comp.lang.python. Sadly, this article won't answer that question, but it will demonstrate using Python scripts for generating code in other languages. None of the projects I've worked on during the past year have been developed in Python. However, I've used Python scripts to generate code, both Java and Visual Basic, for many of them. This article walks the reader through versions of these scripts to illustrate how Python can be used to automate the generation of simple and repetitive code.

What Code Are You Generating?

In one standard approach to developing data-centric applications, one uses classes to represent data entities and a relational database to persist data. For example, in a sales contact system, you might have a Contact class, with attributes for id, name, address, phone number, e-mail, etc... which are stored in a database table named Contacts. Each Contact would be stored in a separate row in the Contacts table, and there would be a corresponding column for each attribute.

On the whole, the look and feel of data object classes is very similar, regardless of the language. Class attributes, or properties, correspond to table columns. The main difference is how attributes are accessed and modified in different languages. For example, Python development practices do not mandate that access to class attributes should be via “get” or “set” methods, but both Java and Visual Basic encourage this practice. As as result, data classes written in languages such as Java and Visual Basic typically contain a lot of repetitive code for accessing class attributes. In many cases this code is so repetitive, it's easier to write a script to write it for you. Here's where Python fits in. With excellent text manipulation capabilities, triple-quoted strings for templates, high-level datatypes, and concise syntax, Python is an ideal candidate for the task. In this article, we'll examine two code generation scripts. One generates Java classes by reading field names and types from a database table, and the other generates Visual Basic code from a text file describing both the table and the class.

Generating Java Beans

In the first example, we'll generate a Java Bean by reading column names and type from an SQL Server database, using adodbapi

The example scripts accompanying this article are modified versions of scripts I use on a regular basis, with some extended option handling and similar features removed. The spirit is the same, the code's just a little shorter.

What is a Java Bean?

First, some terminology. A Java Bean is Java class containing attributes which follow a naming convention. Usually, a Bean doesn't actually do anything. Bean properties, or attributes, are accessed via getters and setters or, if you're feeling highfalutin, accessors and mutators. Here's an example Java Bean class:

public class Point{

        public Point(){
                this.x = 0;
                this.y = 0;
        }

        public Point(int x, int y){
                this.x = x;
                this.y = y;
        }

        public int getX(){
                return this.x;
        }

        public void setX(int x){
                this.x = x;
        }

        public int getY(){
                return this.y;
        }

        public void setY(int y){
                this.y = y;
        }

        private int x;
        private int y;
}

For comparison, the Python equivlent might be:

class Point:
    def __init__(self, x=0 ,y=0):
        self.x = x
        self.y = y

The Java class contains a bunch of repetitive code the generation of which should be amenable to automatization. This is our first example.

The Steps

Before we actually look at the first script, a quick plan of attack. Here are the steps we'll take to generate Java Beans corresponding to database tables.

  • Connect to database and get a list of column names and types.
  • For each column:
    • Determine the appropriate Java type
    • Declare a private member variable to hold the column value
    • Create a pair of accessor methods.
  • Insert code into template for class
  • Write to a file.
Step 0

Before we actually do anything, we define the code templates used later on in the script. Triple-quoted strings are great for embedding code templates directly into a script, as strings can span multiple lines, are easily readable, and can contain variable placeholders. Variable declaration:

DECLARATION = """\
        private %(type)s %(attribute)s;"""
Getter and Setter pair:
GETTER_AND_SETTER = """\
        public %(type)s get%(name)s(){
                return %(attribute)s;
        }

        public void set%(name)s(%(type)s %(attribute)s){
                this.%(attribute)s = %(attribute)s;
        }"""
Class template:
CLASS = """\
public class %(class)s{

%(accessors)s

%(declarations)s

}"""

About now, you might be wondering about the indentation of DECLARATION and GETTER_AND_SETTER. They're indented to the desired level for the generated code.

Step 1

Retrieving column names and types from the database is going to be specific to each DBMS. I've done most of this work on Windows 2000, and have used both ADO, via win32com, and adodbapi. If you're not using an SQL Server on Windows, this code would have to be adapted as the Python DB-API does not specify how this information should be made available.

# See http://www.connectionstrings.com/
# for a good reference on connection strings.
CONN_STR = """\
Driver={SQL Server};\
Server=%(server)s;\
Database=%(database)s;\
Uid=%(username)s;\
Pwd=%(password)s"""

def get_cols(server,database,username,password,table):
        conn = adodbapi.connect(CONN_STR % locals())
        cursor = conn.cursor()
        # returns a table with detailed column information
        cursor.execute("sp_columns @table_name='%(table)s'" % locals())
        cols = []
        for row in cursor.fetchall():
                colname,coltype = row[3],row[5].upper()
                cols.append((colname,coltype))
        return cols
Step 2

Step 2 is the heart of the script. This is where we generate all the code we didn't want to write by hand.

For each attribute, we'll need to determine the appropriate variable type from the column type. This is easily done using a dict:

TYPES = {
        'CHAR':     'String',
        'DATETIME': 'java.sql.Timestamp',
        'INT':      'int',
        'NCHAR':    'String',
        'NVARCHAR': 'String',
        'VARCHAR':  'String',
        # and any others...
}

For each column, we'll determine the type and the variable name:

type = TYPES[coltype]
attribute = col2attr(colname)

With type and attribute names we can create declarations by replacing placeholders in the templates defined earlier with values for each attribute:

declarations.append(DECLARATION % {
        'type': type,
        'attribute': attribute})
accessors.append(GETTER_AND_SETTER % {
        'type': type,
        'name': attribute[0].upper() + attribute[1:],
        'attribute': attribute})
Step 3

Now that we have 2 lists which contain all variable declarations and accessor methods, we'll generate the class by inserting the declarations, accessors, and class name into the class template:

klass = CLASS % {
        'accessors':    '\n'.join(accessors),
        'declarations': '\n'.join(declarations),
        'class':        name
}

# this script uses spaces, expand to tabs
klass = klass.replace(' ' * 4, '\t')
Step 4

Finally, we write the generated code to a file:

f = open(class_name + '.java', 'w')
f.write(klass)
f.close()
Screenshot 1: Customer

You can view the script in it's entirety here.

Generating Vlsual Basic Classes

In the second script, we generate Visual Basic code, but use a different approach for obtaining column names and types. The approach taken in the first script works well if you've got column names which are well-named and easy to understand. However, on older DMBS's, in which column name lengths were limited, this may not be the case. A little over a year ago, I began work on a workflow management system written in Visual Basic. We were replacing a green-screen application which ran on an AS/400 using DB2 as the database. There were oodles of tables with piles of fields, many of which looked like randomly generated sequences of characters. As is sometimes the case, the system wasn't terribly well documented and the best source of information was those who had worked on it or used it. For each table we needed to access or modify, I'd create a tab-delimited text file with the following 4 columns:

ColumnName

Name of column in DB table. Example: LTYPE

ColumnType

SQL data type of column. Example: CHAR

PropertyName

Name to use for VB class property. Example: LocationType

Description

Description of column. Example: 4=Middle School, H=High School, etc...

The first two columns were populated from the database table itself, and the last two by asking the project manager, the definitive source of information, what the column name was and what it was for, how it was used, etc... The result was a text file that looked like this:

#column   type     property       description
LID       VARCHAR  LocationId     Location Id (Primary Key)
LCTY      VARCHAR  County         County Name
LREGN     VARCHAR  RegionCode     Region Code
LTYPE     CHAR     LocationType   H=High School,4=Middle School, etc...
LSECTR    CHAR     Sector         JR=Junior, T=Technical, etc...

Armed with that file, I could use a script to generate a Visual Basic class, with meaningful names for the data attributes, a description for each, and a mapping of property names to column names.

Generating the Vlsual Basic Class

This example is slightly more involved than the previous one, as there's more boiler-plate Visual Basic code.

To generate the class, for each attribute we'll declare a private variable, property accessors, and code for getting and setting the property value from an ADODB Recordset.

Here's the class template:

CLASS = """\
%(header)s

%(variables)s

%(accessors)s

%(rsfuncs)s
"""

Here's what the template values represent:

header

Visual Basic class metadata. Includes class name and properties. See the source file for more details.

variables

Private member variables.

accessors

Getters and Setters for member variables. “Accessors” is used in lieu of "Getters and Setters" since it requires less typing but is just as descriptive.

rsfuncs

Two methods, one for reading the member variables from an ADODB Recordset and one for transfering member variable values to an ADODB Recordset.

In Visual Basic one usually names a variable with a 3 letter prefix indicating the type of the variable. Private member variables are often further prefixed with m_. Here's an example variable declaration:

Private m_strCounty As String

To determine the type to use for a variable we'll use a dict, just as in the first script:

TYPES = {
        'INT':          'Long',
        'CHAR':         'String',
        'DATETIME':     'Date',
        'MONEY':        'Currency',
        'VARCHAR':      'String',
}

Most 3-letter prefixes are the first 3 letters of the variable type, but that's not always the case. For the exceptions, we'll use a dict as well:

PREFIXES = {
        'Long':     'lng',
}

Here's an example property declaration:

' County Name
Public Property Get County() As String
        Count = m_strCounty
End Property

Public Property Let County(strCounty As String)
        m_strCounty = strCounty
End Property

and here's the example template:

VARIABLE = """\
Private m_%(prefix)s%(name)s As %(type)s"""

ACCESSOR = """\
' %(description)s
Public Property Get %(name)s() As %(type)s
Attribute %(name)s.VB_Description = "%(description)s"
        %(name)s = m_%(prefix)s%(name)s
End Property

Public Property Let %(name)s(%(prefix)s%(name)s As %(type)s)
        m_%(prefix)s%(name)s = %(prefix)s%(name)s
End Property
"""

Get name of file containing table and class information, and the name to use for the class:

try:
        table_file,class_name = sys.argv[1:]
except (IndexError,ValueError),e:
        usage()

For each line in the file (ignoring comment lines), create a variable declaration, property get and let methods, and code to read and write the property value to and from an ADODB.Recordset.:

accessors = []
from_rs = []
to_rs = []
variables = []

for line in file(table_file):
        # allow for comments in file
        if line.startswith('#'):
                continue

        # for tab separated fields for each column
        colname,coltype,propname,description = line[:-1].split('\t')

        type = TYPES[coltype]
        prefix = PREFIX.get(type,type[:3].lower())

        accessors.append(ACCESSOR % {
                'name': propname, 
                'type': type, 
                'prefix': prefix, 
                'description': description})
        variables.append(VARIABLE % {
                'prefix': prefix, 
                'name': propname, 
                'type': type})
        from_rs.append(COPY_FROM_FIELD % {
                'name': propname, 
                'colname': colname})
        to_rs.append(COPY_TO_FIELD % {
                'name': propname, 
                'colname': colname})

rsfuncs = RS_FUNCS % {
        'read_from_fields': '\n'.join(from_rs), 
        'put_into_fields': '\n'.join(to_rs)
}

props = {
        'header': HEADER % {'class': class_name},
        'variables': '\n'.join(variables),
        'accessors': '\n'.join(accessors),
        'rsfuncs': rsfuncs,
}

f = file(class_name + '.cls', 'w')
print >> f, CLASS % props
f.close()

Screenshot 2: Location

The templates for the recordset functions are not included here, as they don't demonstrate anything new, but the entire script can be viewed here.

Testing Scripts

It's important to test your generated code to ensure both that it works and that it's actually what you wanted. I once whipped out a script that generated multiple output files. One of these files was an XML file containing metadata for the generated Java code, and though it was well-formed an attribute name had an extra "s" on it. I'd like to say I found the bug in no time, but it took me the better part of an hour to locate the problem. It was a perfect example of a case where it would have taken less time overall to have tested the script and verified it generated correct output than it did to find and fix the problem after the fact.

The easiest way I've found to test a script is to create a file containing your desired output. As you develop the script, run it and compare the output against the file you created. When they match, you're done.

Template Styles

Though Python's triple-quoted strings often work well for templates, “%(name)s” interpolation can be a little hard to read at times; furthermore, it's all too easy to forget the trailing s. As an alternative, one could easily write a class which could use shell-style $variable or ${variable} substitution. Here's how's the template for the Java Bean “get” and “set” methods would look:

GETTER_AND_SETTER = """\
        public $type get${name}(){
                return $attribute;
        }

        public void set${name}($type $attribute){
                this.$attribute = $attribute;
        }"""

Here's an example implementation, based on Barry Warsaw's reference implementation for PEP 292:

import sys
import re

class Template(str):
        def sub(self, mapping=None):
                # Default mapping is locals/globals of caller
                if mapping is None:
                        frame = sys._getframe(1)
                        mapping = frame.f_globals.copy()
                        mapping.update(frame.f_locals)
                def repl(m):
                        return mapping[m.group(m.lastindex)]
                return re.sub(r'\$(?:([_A-z]\w*)|\{([_A-z]\w*)\})', repl, self)

Example usage:

>>> t = Template("""\
...     private $type $name;""")
>>> print t.sub({'type': 'String', 'name': 'name'})
                private String name;
>>>

With this class, using shell-style substitution would be easy to do, and is left as an exercise to the reader.

Conclusion

Python is an ideal language for code generation and using scripts such as those presented in this article can be a big time saver. They're easy to write, read, and test and you can add functionality or features as requirements change. Another benefit of using Python, which might not initially be apparent, is that you aren't tied to any specific IDE or platform.

Code generation does have it drawbacks, however. It's so easy to do, one should be aware of what code actually needs to be repeated and what could be refactored into common functions or classes. You should also know what it is you want to generate before you begin.

Lastly, have fun -- and enjoy coding in Python so Python can do a little of your coding for you!

For further Reference:


http://www.connectionstrings.com/


http://adodbapi.sf.net/


PEP 292


Ollie Rutherfurd

shim
shim

 Py is committed to bringing you great Python Articles.

shim
shim


Home   Subscribe   Migration FAQ   Contact PyZine   Write for PyZine   ZopeMag   opensourcexperts.com  

Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT