|
|
||||||||||||||||||
|
|
||||||||||||||||||
![]() |
![]() |
Issue 8 - Revision 4 / October 20, 2005
|
|||
|
Extending Popular Software Applications with Python - - - - - - - - - - - - By Chris Downey | (None) The power and versatility of Python can be used to make popular, shrink-wrapped applications work together in new and exciting ways. For example, I've been using Intuit's Quicken and Microsoft's Excel for more than ten years, but the two applications have never worked cooperatively the way that I'd like. In particular, Quicken has good canned reports, but I often need the Quicken data rearranged and analyzed in ways best handled by a spreadsheet. Manually extracting the data from Quicken and importing it into Excel is tedious. However, with a little bit of Python, XML, Microsoft's Component Object Model (COM), and the named cells feature of Excel, I can effortlessly map and transfer Quicken data to my portfolio analysis spreadsheet. In this article, I'll show you the "glue" that connects Quicken to Excel. It's useful software, but moreover, it demonstrates how a surprisingly small amount of Python code can extend the inherent abilities of popular software applications . The sample code shown here is based on Python 2.3, the Python Win32 extensions, PyXML-0.8.3, and Microsoft Excel 2000. (Later versions of the d software should work just fine, too.)
Parsing the Quicken Data
The only reliable way to get data out of Quicken is to save a report as a plain text file. A report text file looks (something) like this: Security Shares Quote/Price est Gain/Loss Balance Berkshire-Hathaway 9.000 2,965.000 4,890.00 26,685.00 CASH TRANSACTIONS 8,484.220 .000 * -8,484.22 0.00 PETROCHINA CO LTD SPONS ADR 168.000 54.030 -157.18 9,077.04 TRowe Emerging markets bonds 7.722 13.250 2.10 102.32 TRowe Intl bond 59.683 10.630 21.68* 634.43 TRP STABLE VALUE FUND SCH E 4,109.020 1.000 * 0.00 4,109.02 Vanguard Aggressive 380.695 14.660 * 394.19 5,580.99 Vanguard Treasury Fund 18.370 1.000 0.00 18.37 VG Emerging Markets Index 1,100.792 14.520 8,256.70 15,983.50 Values in the report are separated by tabs. So, given a nice tabular format, how do you extract a specific datum? For example, assuming you've just purchased more shares of TRowe Price's International Bond fund (which is shown in the report as "TRowe Intl bond"), recorded the transaction in Quicken, and have produced a new report, how do you extract the new number of shares owned? While the new number of shares (59.683) is found in the second column (the "Shares" column) of the fifth row in this report, you cannot predict its position in future reports. The goal then is to programatically extract the number of TRowe Intl bond shares and then plug that number into the correct cell in a spreadsheet — without any hard-coding! If you index the report by using security names when searching for the correct row and security attributes when searching for the correct columns, and if you give each spreadsheet cell a unique name, such as "RPIBX_shares," you have a mapping that can be expressed in XML: <security quickenName="TRowe Intl bond" symbol="RPIBX"> <column name="Shares" excelName="RPIBX_shares"/> </security> The mapping states: find the TRowe Intl bond row, move to the Shares column, extract that value, and place it into the spreadsheet cell named RPIBX_shares. The next step is to create a Quicken report, or QReport, text parser class, that given the security quickenName and column name (see XML above), it returns the desired value. Python can handle this task with an astoundingly small amount of code.
01 class QReport:
02
03 def __init__(self):
04 file = open("report.txt","r",1) # 1 = line buffer
05 self.fLines = file.readlines()
06 file.close
07
08 def getReportItem(self,security,securityAttribute):
09 secAttrIdx = 0
10 for line in self.fLines:
11 idx = 0
12 tokens = split(line,"\t")
13 for token in tokens:
14 idx += 1
15 if token == securityAttribute:
16 secAttrIdx = idx
17 idx = 0
18 if idx == secAttrIdx:
19 if tokens[1] == security:
20 return token
21
22 if __name__ == "__main__":
23 qr = QReport()
24 print qr.getReportItem("TRowe Intl bond","Shares")
The loop that begins on line 10 proceeds line-by-line through the text report. Line 12 splits each line into a list of tokens, using tabs as separators. Line 13 begins inspecting each token. The first thing to look for is the specific report column header passed in via the securityAttribute parameter. As the main method requests, the code should looks for "Shares", which is the second column. Thus, secAttrIdx is set to 2. The code continues scanning line-by-line, and token-by-token. Each time the code captures a number of shares value, it checks (lines 18-19) for the correct security. When both the security and securityAttribute match, it returns the isolated value. If you download the source into I<c:q2x,> you can verify the results using Python's command line interpreter.
C:\q2x>c:\python\python
>>> import QReport
>>> qr = QReport.QReport()
>>> print qr.getReportItem("TRowe Intl bond","Shares")
59.683
>>> print qr.getReportItem("TRowe Intl bond","Quote/Price")
10.630
>>> print qr.getReportItem("TRowe Intl bond","Balance")
634.43
>>> print qr.getReportItem("VG Mid-Cap Index", "Shares")
1,055.740
Building the Map
The next step is to build the XML file that maps every investment in the Quicken report to a spreadsheet cell. As shown above in the Python interpreter, you can map any report value you wish: Shares, Quote/Price, or Balance. For instance, if you wanted to add a Quote/Price mapping to TRowe Intl bonds, it would look like this:
<security quickenName="TRowe Intl bond" symbol="RPIBX">
<column name="Shares" excelName="RPIBX_shares"/>
<column name="Quote/Price" excelName="RPIBX_price"/>
</security>
Again, the excelName attribute is the target cell in Excel. If you haven't used named cells in Excel before, open the sample spreadsheet, select a cell in the "Shares" column and look in the upper left hand corner above the "A" column. The name shown is the name of the cell. Named cells are a very powerful mechanism. Rather than refer to cells as O16 or P16, which is incredibly brittle, naming cells separates purpose from position. For the purposes here, named cells provides a powerful abstraction layer between the XML mapping and physical spreadsheet layout.
Putting It All Together
So, with a QReport class and a complete XML configuration file named quicken_import.xml that maps Quicken report data to spreadsheet cells, the next step is to extract the mappings from the XML and apply the gleaned values to the spreadsheet. Python is well-equipped to process XML. All of the heavy-lifting of parsing the XML has been done for us and packaged into XML modules such as xml.sax and xml.sax.handler. At this point, we can introduce the main q2x.py (Quicken to Excel) program that drives the entire process:
import xml.sax
import pvh
import QReport
qr = QReport.QReport()
parser = xml.sax.make_parser()
handler = pvh.PortfolioValueHandler(qr)
parser.setContentHandler(handler)
parser.parse("quicken_import.xml")
First, a QReport is instantiated, followed by an XML parser. The parser is a black box that generates SAX events, which are captured and processed with a custom handler, PortfolioValueHandler(). SAX events are triggered as XML elements and attributes are parsed. Notice that a reference to QReport is passed to the event handler. The reference enters the newly created object via the __init__() constructor and is stored in the instance variable self.qr. (Instance variables, referenced by self, exist for the lifetime of the instantiated object.) (You could create a QReport within the handler instead. However, a single "global" instance of QReport can be used by different objects that may require its services. A single QReport could keep track of report lines processed as a way to find omissions, for example). Here's the code for the SAX handler:
01 import xml.sax.handler
02 import excel
03
04 class PortfolioValueHandler(xml.sax.handler.ContentHandler):
05 def __init__(self,qrRef):
06 self.qr = qrRef
07 self.inTitle = 0
08 self.mapping = {}
09 self.excel = excel.excel()
10
11 def startElement(self, name, attributes):
12 if name == "security":
13 self.buffer = ""
14 self.quickenName = attributes["quickenName"]
15 self.symbol = attributes["symbol"]
16 if name == "column": # one security has X columns
17 self.colName = attributes["name"]
18 self.excelName = attributes["excelName"]
19 self.excel.updateNameValue(self.quickenName, self.excelName,
20 self.qr.getReportItem(self.quickenName,self.colName))
21 elif name == "title":
22 self.inTitle = 1
23
24 def endElement(self, name):
25 if name == "security":
26 self.inTitle = 0
27 self.mapping[self.symbol] = self.buffer
Due to the structure of the XML, with the column element embedded within the security element, startElement() is triggered at least twice for each of our securities.
The excel class is where the real layered magic happens: Python to win32com to COM to Excel. The excel class uses the win32com module to seamlessly leverage Microsoft's Component Object Model standard to provide the features needed to update our spreadsheet. COM provides a mechanism for objects to expose interfaces, which are really pointers to functions in a different process space. In fact, you can think of Microsoft Excel as a large set of COM objects for Python programs to consume.
from win32com.client import Dispatch
import sys
import pywintypes
from string import replace
01 class excel:
02 def __init__(self):
03 self.xlApp = Dispatch("Excel.Application")
04 self.xlApp.Visible = 1
05 self.xlApp.Workbooks.Open(Filename='C:\\q2x\\portfolio models2.xls')
06
07 def updateNameValue(self,quickenName,name,val):
08 try:
09 currentExcelVal = self.xlApp.Range(name)
10 currentExcelVal = str(currentExcelVal)
11 currentExcelVal = replace(currentExcelVal,',','')
12 val = replace(val,',','')
13 currentExcelVal = float(currentExcelVal)
14 val = float(val)
15 if cmp(currentExcelVal,val) <> 0:
16 print quickenName + " ("+name+") will be updated from " +
str(currentExcelVal) + " to " + str(val) + "\n"
17 self.xlApp.Range(name).Value = val
18 except pywintypes.com_error:
19 print "Spreadsheet is missing: "+quickenName+" (" + name + ")"
20 except ValueError:
21 print sys.exc_info()[1]
22 print "Value error generated for " + quickenName + " (" + name +
")" + " val=" + str(val)
23 except:
24 print sys.exc_info()[1]
25 print "Unexpected error generated for "+quickenName+" (" + name
+ ")" + " val=" + str(val)
The code is straightforward and easy to understand, but it's important to comprehend the power and complexity of the underlying Excel object model. Anything you can do manually within Excel can also be done programmatically. And anything that can be done programmatically within Excel can be done via COM outside of Excel via Python. This makes for an incredibly powerful toolset. Automating Excel can be intimidating at first due to its complex object model. According to the Excel object reference guide: The Range object is the object you will use most within your Excel applications. Before you can manipulate any region within Excel, you must express it as a Range object and work with methods and properties of that Range. A Range object represents a cell, a row, a column, a selection of cells containing one or more blocks of cells (which might or might not be contiguous), or even a group of cells on multiple sheets. Wow! That does not sound easy to work with. Yet, Python's flexible and rich data types make this much easier to handle than you might think. If q2x is currently processing TRowe International bonds, the parameters sent to updateNameValue() are:
Line 09 obtains the current value in the named cell. currentExcelVal = self.xlApp.Range(name) This line of Python code requests the named Range "RPIBX_shares" from the Excel application instance pointed to by the reference self.xlApp. Lets say currentExcelVal is set to 55.345. Lines 10-14 change currentExcelVal and val from strings to floats to do a strict numerical comparison. If the values are different (line 15), the code informs the user of the change (line 16) and plugs the new value back into the spreadsheet (line 17). That's all there is to it.
Summary
With a small amount of code, the process of updating a spreadsheet with Quicken transaction data has been automated. Python is a superb tool to tie together the data and functionality of separate programs that were not designed to work together. Python is designed to be extended in a modular fashion. It is easy to obtain access to important, complex technologies such as SAX and COM via the familiar, elegant Python idiom. Python is weakly-typed with a rich set of flexible data types, which makes it possible to largely ignore data type mapping issues, even when working with complex underlying technologies such as COM and Excel Range objects. Chris Downey is a senior software engineer, architect, and consultant with 15 years of experience. A self-taught programmer who started with mainframes, SQL, and OS/2 at IBM, Chris served as software architect at 2s2i where he used a mix of Java, Python, and SQL to create software that analyzed online discussion trends. An ex- PowerBuilder (PB) consultant with Wall Street credentials, Chris is currently responsible for keeping legacy PB apps in shape and building new functionality into complex J2EE systems. Chris is based in the metro DC area and can be reached via cdowney@gmail.com. Chris Downey |
|||||||||||||||||||||||||||||||||||||||||||||||
|
Py is committed to bringing you great Python Articles. | ||||||||||||||||||||||||||||||||||||||||||||||||
![]() |
Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine |
|