PyZine
 


Issue 5 - Revision 7  /   April 20, 2004 


 
Illustration by Lia Avant
4ss - Web Applications with XML

By Uche Ogbuji  | February 3, 2003

Introduction

4Suite is a platform for XML and RDF applications. A part of 4Suite, the repository, is an XML and RDF database management system. RDF is the W3C's Resource Description Framework, an extensible system for managing metadata. Among the features the 4Suite repository offers are

  • Storage of XML documents and RDF statements
  • XSLT, RELAX NG, XUpdate, RDF extraction and synchronization, querying and other built-in processing
  • Access through multiple network protocols: HTTP, FTP, WebDAV, and a proprietary protocol, FtRPC
  • Command line tools, XSLT, and Python APIs
  • Back end storage on the file system, relational DBMS, or Metakit

4Suite is particularly suited for building web applications with XML technologies. It allows you to store, index, transform, and render XML documents. Of the several APIs listed above, this article takes a closer look at the Python API. 4Suite is implemented in Python and C, and most of the features of the repository are available to Python code. This is handy for scripted processing in 4Suite apps, integration into other Python tools, extension of 4Suite's capabilities, and even for rapid access to and maintenance of the repository using Python's interactive prompt.

Because 4Suite is written mostly in Python, you can pretty much customize it to your heart's content using Python code, as long as you have the right permissions to access the application source code. But in this article, I shall stick to the official Python API, also known as Client Core (CCore).

If you want to try the code examples yourself, you should have 4Suite installed and a repository instance initialized as described in the UNIX or Windows install guides. You should also have at least skimmed the repository quick start guide and set up a non-super user with a home folder as it recommends.

Repository Objects

All objects in 4Suite are available through CCore as Python proxy objects. These are arranged into a hierarchy with a proxy object representing the repository itself at the root, all top level folders (also known as containers) below the root, all subfolders below the top-level folders, and so on. This arrangement is similar to the way many file systems are organized. The first step in working with CCore is to get the repository proxy object, which you do by logging in. The following interactive Python session illustrates this:

>>> import sha
>>> from Ft.Server.Client import Core
>>> pw_hash = sha.new("uo").hexdigest()
>>> repo = Core.GetRepository("uo", pw_hash, "localhost", 8803)  

This code works when the user name is "uo" and the password is "uo", accessing the local machine on the standard port. Adjust it to fit your circumstances. Most of what you need to access the repository will be imported from the Ft.Server module. The GetRepository function takes the authentication and network information and creates a repository proxy object. You pass the user name, the SHA password hash (which I compute using the standard sha module), the host name, and the port. You get back a repository proxy object.
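The sha module used in the session above was deprecated in Python 2.5 and removed in Python 3; hashlib provides the same SHA-1 digest. The following is a minimal sketch of the hash computation only. The helper name password_hash is hypothetical, and encoding the password as UTF-8 before hashing is an assumption (the session above passes a plain byte string).

```python
import hashlib

def password_hash(password):
    """Return the SHA-1 hex digest of a password string.

    Assumption: the password is encoded as UTF-8 before hashing.
    For ASCII passwords this matches sha.new(password).hexdigest().
    """
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

pw_hash = password_hash("uo")
print(len(pw_hash))  # a SHA-1 hex digest is always 40 characters long
```

The resulting string plays the same role as pw_hash in the GetRepository call.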

There are a lot of methods you can invoke on the repo object, as you can see by running dir(repo). But you can conveniently access this object as a dictionary where the keys are the names of the child resources and the values are the proxy objects for each child resource. A resource is any object that is managed in the repository, including containers, XML files, raw (non-XML) files, and other things.

>>> repo.keys()
[u'web', u'ftss', u'home']  

This shows that I have three top-level resources in the repository. Let's examine the ftss resource a bit more closely:

>>> obj = repo['ftss']
>>> obj
<Ft.Server.Client.Core.ContainerClient.ContainerClient instance at 0x81c90c4>
>>> obj.keys()
[u'servers', u'docs', u'dashboard', u'commands', u'demos', u'data', u'docdefs',
u'groups', u'users'] 

The object's literal representation tells us that the ftss object is a container object or, more accurately, a container client proxy object. You can go further and look at the contents of obj['data'], which is itself a container; the contents of any container can be accessed using the same dictionary idiom. There you will find many more resources, most of which are actual files rather than containers. The following code displays the contents of the XML resource identified by the repository path /ftss/data/null.

>>> obj = repo['ftss']['data']['null']
>>> obj.getContent()
'<null/>\n'  

The getContent() method retrieves the contents of the resource, which is XML in this case because the resource is an XML document. It can also be the contents of a raw file (anything from HTML to a JPEG to a ZIP file). All resources in the repository have a standard content view. If you invoke getContent() on a container, you'll see an XML-ized view of its entries.
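The dictionary idiom described in this section lends itself to a recursive listing of a container's contents. The following is only a sketch: it runs against nested Python dictionaries standing in for containers, not against a live repository, and it assumes that only containers expose keys(). The names list_tree and fake_repo are hypothetical.

```python
def list_tree(container, prefix=""):
    """Return the paths of everything below a dict-like container.

    Assumption: only containers expose keys(); anything else is a leaf.
    """
    paths = []
    for name in sorted(container.keys()):
        child = container[name]
        path = prefix + "/" + name
        paths.append(path)
        if hasattr(child, "keys"):
            # Descend into sub-containers, extending the path prefix.
            paths.extend(list_tree(child, path))
    return paths

# Hypothetical stand-in for a small repository: nested dicts play the
# role of containers, strings play the role of document content.
fake_repo = {
    "ftss": {"data": {"null": "<null/>\n"}},
    "home": {"uo": {}},
}

for path in list_tree(fake_repo):
    print(path)
```

Whether non-container proxy objects in 4Suite lack a keys() method is not something this sketch establishes; check with dir() on a real proxy object before relying on that test.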

Paths and updates

You needn't always use dictionary access to navigate the repository. It also supports navigating by paths in the resource hierarchy. The following code fetches the null resource in a way generally equivalent to repo['ftss']['data']['null']:

>>> obj = repo.fetchResource('ftss/data/null')
>>> obj.getContent()
'<null/>\n'  

You can also invoke fetchResource() on other objects to navigate relative to those objects. Any path that starts with "/" is absolute and is effectively fetched relative to the repository itself. You can also get the absolute path of any resource.

>>> c1 = repo.fetchResource('ftss/data')
>>> c1.getAbsolutePath()
u'/ftss/data'
>>> c2 = c1.fetchResource('..')
>>> c2.getAbsolutePath()
u'/ftss'  
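The path rules just described (relative paths, "..", and absolute paths starting with "/") match ordinary slash-separated path semantics, so they can be modelled with the standard posixpath module. This is an illustration of the rules only, not 4Suite's own resolution code; the helper name resolve is hypothetical.

```python
import posixpath

def resolve(base, path):
    """Resolve a path against a base path using slash-separated rules."""
    # join() discards the base when path is absolute (starts with "/");
    # normpath() then collapses ".." and "." segments.
    return posixpath.normpath(posixpath.join(base, path))

print(resolve("/ftss/data", ".."))        # /ftss
print(resolve("/", "ftss/data/null"))     # /ftss/data/null
print(resolve("/ftss/data", "/home/uo"))  # /home/uo
```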

So far all these operations are read-only, but you can also update the repository. For example, I create a simple XML file in my home directory as follows:

>>> DOC = u"""<?xml version="1.0" encoding="UTF-8"?>
... <verse>
...   <attribution>Wole Soyinka</attribution>
...   <line>Traveller, you must set out</line>
...   <line>At dawn.  And wipe your feet upon</line>
...   <line>The dog-nose wetness of the earth</line>
... </verse>
... """
>>> home_folder = repo.fetchResource('home/uo')
>>> new_doc = home_folder.createDocument('dawn.xml', DOC, imt='text/xml') 

The createDocument() method on container objects creates an XML document by default. You can create specialized XML documents such as XSLT stylesheets optimized for transforms using additional options. The first parameter is the name of the document to be created; the second is the content of the document, a Python Unicode object. I explicitly specify the Internet Media Type (IMT) of the resource. The repository keeps careful track of the IMT of resources because, among other reasons, they are needed on the Web. The return value from createDocument() is a proxy object for the newly created resource.
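Because createDocument() expects XML content, it can be handy to confirm locally that a string is well-formed before handing it to the repository. The following sketch uses the standard xml.etree.ElementTree parser rather than anything from 4Suite; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = u"""<?xml version="1.0" encoding="UTF-8"?>
<verse>
  <attribution>Wole Soyinka</attribution>
  <line>Traveller, you must set out</line>
  <line>At dawn.  And wipe your feet upon</line>
  <line>The dog-nose wetness of the earth</line>
</verse>
"""

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        # Encode first so the bytes agree with the UTF-8 declaration.
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False

print(is_well_formed(DOC))                               # True
print(is_well_formed(u"<verse><line>unclosed</verse>"))  # False
```

This checks well-formedness only; it says nothing about how the repository itself validates or indexes the document.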

You can perform all sorts of XML processing operations on the new document. The following example applies one of the XSLT stylesheets that come with 4Suite.

>>> xslt_obj = repo.fetchResource('ftss/data/decorated-xml.xslt')
>>> transform_result = new_doc.applyXslt([xslt_obj])  

First I get a proxy object for the XSLT stylesheet I want to apply. Then I invoke the applyXslt() method on the source document. This method takes a list of stylesheets; even though I have only one, I put it into a list. The result is a tuple of which the first item is a string buffer with the transform output. The second item is the IMT of the result. I do not show the result because of its length, but do try it yourself and see. The transform result is a pretty HTML view of the XML document similar to the well-known Internet Explorer 5 view of an XML document.

To create a non-XML resource in the repository, you must use a special method since createDocument() tries to parse the given contents as XML. The following adds to the repository an image file from a remote web site.

>>> import urllib
>>> url = urllib.urlopen('http://4suite.org/include/4Suite-org.png')
>>> image_data = url.read()
>>> home_folder.createRawFile('4Suite-org.png', 'image/png', image_data)
<Ft.Server.Client.Core.RawFileClient.RawFileClient instance at 0x85bcbec>
 

createRawFile() takes the new resource name, an IMT, and the data for the resource, the PNG file body in this case. The creation methods are only available on container objects and the repository itself.
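Since createRawFile() needs an IMT for each resource, one option is to guess it from the file name with the standard mimetypes module instead of typing it by hand. This is a sketch of that idea, not part of the 4Suite API; the helper name guess_imt and the fallback value are assumptions.

```python
import mimetypes

def guess_imt(filename, default="application/octet-stream"):
    """Guess an Internet Media Type from a file name's extension."""
    imt, _encoding = mimetypes.guess_type(filename)
    # guess_type() returns None for unknown extensions, so fall back.
    return imt or default

print(guess_imt("4Suite-org.png"))  # image/png
print(guess_imt("README"))          # application/octet-stream
```

The guessed value could then be passed as the second argument to createRawFile().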

The repository is fully transactional. If I were to end this Python session right now, all my changes would be lost. In order to save my changes, I have to commit the transaction:

>>> repo.txCommit()  

You can also use repo.txRollback() to discard the changes. Once you have ended a transaction either way, you must not use the repo object again or you'll get an error.
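A common way to use such a commit/rollback pair is to commit only when all the work succeeds and to roll back if anything raises. The sketch below shows that pattern. The helper in_transaction is hypothetical, and FakeRepo is a stand-in that merely records which method was called, so that the example runs without a repository; only txCommit() and txRollback() come from the API described here.

```python
def in_transaction(repo, work):
    """Run work(repo), then commit; roll back if work raises."""
    try:
        result = work(repo)
    except Exception:
        repo.txRollback()
        raise
    repo.txCommit()
    return result

class FakeRepo(object):
    """Hypothetical stand-in that records how the transaction ended."""
    def __init__(self):
        self.ended = None
    def txCommit(self):
        self.ended = "commit"
    def txRollback(self):
        self.ended = "rollback"

ok_repo = FakeRepo()
in_transaction(ok_repo, lambda repo: "done")
print(ok_repo.ended)  # commit

def failing_work(repo):
    raise ValueError("simulated failure")

bad_repo = FakeRepo()
try:
    in_transaction(bad_repo, failing_work)
except ValueError:
    pass
print(bad_repo.ended)  # rollback
```

Because the repo object is unusable once the transaction ends, the helper is meant to be the last thing done with a given repo.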

Conclusion

I hope this walkthrough of the Python API to the 4Suite repository is enough to get you started. There are many other details and capabilities I did not cover, but many of them build on these basics in a straightforward way. As you can see, accessing the 4Suite repository from Python is very easy. 4Suite already provides the basic tools for building web applications with XML technologies. The Python API adds a rich dimension of additional capabilities.


Uche Ogbuji


Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT