Illustration by Lia Avant
4ss
- Web Applications with XML
By Uche Ogbuji
February 3, 2003
Introduction
4Suite is a platform for XML and RDF applications. A part of 4Suite,
the repository, is an XML and RDF database management system. RDF is
the W3C's Resource Description Framework, an extensible system for
managing metadata. Among the features the 4Suite repository offers are
- Storage of XML documents and RDF statements
- XSLT, RELAX NG, XUpdate, RDF extraction and synchronization, querying and other built-in processing
- Access through multiple network protocols: HTTP, FTP, WebDAV, and a proprietary protocol, FtRPC
- Command line tools, XSLT, and Python APIs
- Back end storage on the file system, relational DBMS, or Metakit
4Suite is particularly suited for building web applications with XML
technologies. It allows you to store, index, transform, and render XML
documents. Of the various APIs available for 4Suite, this
article takes a closer look at the Python API. 4Suite is implemented
in Python and C, and most of the features of the repository are
available to Python code. This is handy for scripted processing in
4Suite apps, integration into other Python tools, extension of 4Suite's
capabilities, and even for rapid access to and maintenance of the
repository using Python's interactive prompt.
Because 4Suite is written mostly in Python, you can pretty much
customize it to your heart's content using Python code, as long as you
have the right permissions to access the application source code. But
in this article, I shall stick to the official Python API, also known as
Client Core (CCore).
If you want to try the code examples yourself, you should have 4Suite
installed and a repository instance initialized as described in the UNIX
or Windows install guides. You should also have at least skimmed the
repository quick start guide and set up a non-super user with a home
folder as it recommends.
Repository Objects
All objects in 4Suite are available through CCore as
Python proxy objects. These are arranged into a hierarchy with a proxy
object representing the repository itself at the root, all top level
folders (also known as containers) below the root, all subfolders below
the top-level folders, and so on. This arrangement is similar to the
way many file systems are organized. The first step in working with
CCore is to get the repository proxy object, which you do
by logging in. The following interactive Python session illustrates
this:
>>> import sha
>>> from Ft.Server.Client import Core
>>> pw_hash = sha.new("uo").hexdigest()
>>> repo = Core.GetRepository("uo", pw_hash, "localhost", 8803)
This code works for the case where both the user name and the password
are "uo", accessing the local machine on the standard port. Adjust
accordingly to fit your circumstances. Most of what you need to access
the repository will be imported from the Ft.Server module.
The GetRepository function takes the authentication and
network information and returns a repository proxy object. You pass
the user name and the SHA password hash, which I compute using the
standard sha module, followed by the host name and port.
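The sha module used above was removed in Python 3. As a minimal sketch, assuming you want to compute the same hash on a current interpreter, hashlib produces an identical SHA-1 hex digest. The helper name password_hash and the choice of UTF-8 for encoding the password are my assumptions, not part of 4Suite.

```python
import hashlib

def password_hash(password):
    """Return the SHA-1 hex digest of a password, as sha.new(pw).hexdigest() did."""
    # hashlib works on bytes, so encode the text first (UTF-8 is an assumption).
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

pw_hash = password_hash("uo")
print(len(pw_hash))  # a SHA-1 hex digest is always 40 characters long
```

The resulting string plays the role of pw_hash in the GetRepository call shown earlier.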
There are a lot of methods you can invoke on the repo
object, as you can see by running dir(repo). You can also
conveniently access this object as a dictionary where the keys are the
names of the child resources and the values are the proxy objects for
each child resource. A resource is any object that is managed in the
repository, including containers, XML files, raw (non-XML) files, and
other things.
>>> repo.keys()
[u'web', u'ftss', u'home']
This shows that I have three top-level resources in the repository.
Let's examine the ftss resource a bit more closely:
>>> obj = repo['ftss']
>>> obj
<Ft.Server.Client.Core.ContainerClient.ContainerClient instance at 0x81c90c4>
>>> obj.keys()
[u'servers', u'docs', u'dashboard', u'commands', u'demos', u'data', u'docdefs',
u'groups', u'users']
The object's literal representation tells us that the
ftss object is a container object or, more accurately, a
container client proxy object. You can go further and look at the
contents of obj['data'], which is itself a container; the
contents of any container can be accessed using the dictionary idiom.
There you will find many more resources, most of which are actual files.
The following code displays the contents of the XML resource
identified by the repository path /ftss/data/null.
>>> obj = repo['ftss']['data']['null']
>>> obj.getContent()
'<null/>\n'
The getContent() method retrieves the contents of the
resource, which is XML in this case because the resource is an XML
document. It can also be the contents of a raw file (anything from HTML
to a JPEG to a ZIP file). All resources in the repository have a
standard content view. If you invoke getContent() on a
container, you'll see an XML-ized view of its entries.
Paths and Updates
You needn't always use dictionary access to navigate the repository.
It also supports navigating local paths in the resource hierarchy. The
following code fetches the null resource in a way generally
equivalent to repo['ftss']['data']['null']:
>>> obj = repo.fetchResource('ftss/data/null')
>>> obj.getContent()
'<null/>\n'
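To see why the two access styles are equivalent, here is a toy model that runs without a repository. It assumes containers behave like nested dictionaries and that a path is walked one child name at a time. The names toy_repo and fetch are hypothetical, and this is not how 4Suite implements fetchResource().

```python
# Toy stand-in for the repository: containers are dicts, documents are strings.
# This only mirrors the access pattern; it is not 4Suite's storage model.
toy_repo = {
    "ftss": {"data": {"null": "<null/>\n"}},
    "home": {"uo": {}},
}

def fetch(container, path):
    """Walk a slash-separated path one child name at a time."""
    node = container
    for name in path.split("/"):
        if name:  # skip empty segments from leading or doubled slashes
            node = node[name]
    return node

by_keys = toy_repo["ftss"]["data"]["null"]
by_path = fetch(toy_repo, "ftss/data/null")
print(by_keys == by_path)  # True
```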
You can also invoke fetchResource() on other objects to
navigate relative to those objects. Any path that starts with "/" is
absolute and is effectively fetched relative to the repository itself.
You can also get the absolute path of any resource.
>>> c1 = repo.fetchResource('ftss/data')
>>> c1.getAbsolutePath()
u'/ftss/data'
>>> c2 = c1.fetchResource('..')
>>> c2.getAbsolutePath()
u'/ftss'
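The path results above follow the familiar file system rules. The standard posixpath module reproduces them locally, which can be handy for predicting where a relative path will land. This is only an analogy: the helper name resolve is hypothetical, and whether fetchResource() matches these rules in every corner case is an assumption.

```python
import posixpath

def resolve(base, path):
    """Resolve path against base the way a POSIX file system would."""
    # join() discards base when path is absolute; normpath() folds '..' and '.'.
    return posixpath.normpath(posixpath.join(base, path))

print(resolve("/ftss/data", ".."))        # /ftss
print(resolve("/ftss/data", "null"))      # /ftss/data/null
print(resolve("/ftss/data", "/home/uo"))  # /home/uo
```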
So far all these operations are read-only, but you can also update
the repository. For example, I create a simple XML file in my home
directory as follows:
>>> DOC = u"""<?xml version="1.0" encoding="UTF-8"?>
... <verse>
... <attribution>Wole Soyinka</attribution>
... <line>Traveller, you must set out</line>
... <line>At dawn. And wipe your feet upon</line>
... <line>The dog-nose wetness of the earth</line>
... </verse>
... """
>>> home_folder = repo.fetchResource('home/uo')
>>> new_doc = home_folder.createDocument('dawn.xml', DOC, imt='text/xml')
The createDocument() method on container objects creates
an XML document by default. You can create specialized XML documents
such as XSLT stylesheets optimized for transforms using additional
options. The first parameter is the name of the document to be created;
the second is the content of the document, a Python Unicode object. I
explicitly specify the Internet Media Type (IMT) of the resource. The
repository keeps careful track of the IMT of resources because, among
other reasons, IMTs are needed on the Web. The return value from
createDocument() is a proxy object for the newly created
resource.
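Because createDocument() expects XML content, it can be worth checking a document locally before sending it to the repository. The following sketch uses the standard library's ElementTree parser rather than anything from 4Suite. The helper name is_well_formed is hypothetical, and the code targets a current Python 3 interpreter.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<verse>
<attribution>Wole Soyinka</attribution>
<line>Traveller, you must set out</line>
<line>At dawn. And wipe your feet upon</line>
<line>The dog-nose wetness of the earth</line>
</verse>
"""

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        # Parse bytes so the encoding declaration in the prolog is honored.
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False

print(is_well_formed(DOC))              # True
print(is_well_formed("<verse><line>"))  # False
```

This checks well-formedness only. It says nothing about validity against a schema such as RELAX NG.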
You can perform all sorts of XML processing operations on the new
document. The following example applies one of the XSLT stylesheets
that come with 4Suite.
>>> xslt_obj = repo.fetchResource('ftss/data/decorated-xml.xslt')
>>> transform_result = new_doc.applyXslt([xslt_obj])
First I get a proxy object for the XSLT stylesheet I want to apply.
Then I invoke the applyXslt() method on the source
document. This method takes a list of stylesheets; even though I have
only one, I put it into a list. The result is a tuple of which the
first item is a string buffer with the transform output. The second
item is the IMT of the result. I do not show the result because of its
length, but do try it yourself and see. The transform result is a
pretty HTML view of the XML document similar to the well-known Internet
Explorer 5 view of an XML document.
To create a non-XML resource in the repository, you must use a
special method since createDocument() tries to parse the
given contents as XML. The following adds to the repository an image
file from a remote web site.
>>> import urllib
>>> url = urllib.urlopen('http://4suite.org/include/4Suite-org.png')
>>> image_data = url.read()
>>> home_folder.createRawFile('4Suite-org.png', 'image/png', image_data)
<Ft.Server.Client.Core.RawFileClient.RawFileClient instance at 0x85bcbec>
createRawFile() takes the new resource name, an IMT, and
the data for the resource, the PNG file body in this case. The
creation methods are only available on container objects and the
repository itself.
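If you load many raw files, you may prefer to derive the IMT from the resource name instead of typing it each time. Here is a small sketch using the standard mimetypes module. The helper guess_imt and its fallback value are my assumptions and are not part of the 4Suite API.

```python
import mimetypes

def guess_imt(name, default="application/octet-stream"):
    """Guess an Internet Media Type from a resource name's extension."""
    imt, _encoding = mimetypes.guess_type(name)
    # guess_type() returns None for unknown extensions, so fall back.
    return imt or default

print(guess_imt("4Suite-org.png"))  # image/png
```

The result could then be passed as the second argument to createRawFile().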
The repository is fully transactional. If I were to end this Python
session right now, all my changes would be lost. In order to save my
changes, I have to commit the transaction:
>>> repo.txCommit()
You can also use repo.txRollback() to discard the
changes. Once you have ended a transaction either way, you must not
use the repo object again or you'll get an error.
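One way to make sure every session ends with exactly one commit or rollback is to wrap the work in a context manager. The sketch below relies only on the txCommit() and txRollback() methods named above. The transaction helper and the StubRepo class are hypothetical, and the stub exists solely so the example runs without a repository.

```python
from contextlib import contextmanager

@contextmanager
def transaction(repo):
    """Commit on success, roll back if the body raises."""
    try:
        yield repo
    except Exception:
        repo.txRollback()
        raise
    else:
        repo.txCommit()

class StubRepo:
    """Hypothetical stand-in that only records which method was called."""
    def __init__(self):
        self.calls = []
    def txCommit(self):
        self.calls.append("commit")
    def txRollback(self):
        self.calls.append("rollback")

ok = StubRepo()
with transaction(ok):
    pass  # repository updates would go here

failed = StubRepo()
try:
    with transaction(failed):
        raise ValueError("simulated failure")
except ValueError:
    pass

print(ok.calls, failed.calls)  # ['commit'] ['rollback']
```

With a real repository proxy in place of the stub, the same rule applies after the block exits: log in again to get a fresh proxy object before doing more work.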
Conclusion
I hope this walkthrough of the Python API to the 4Suite repository is
enough to get you started. There are many other details and capabilities
I did not cover. Many of them build on these basics in a
straightforward way. As you can see, accessing the 4Suite repository
from Python is very easy. 4Suite already provides the basic tools for
building web applications with XML technologies. The Python API adds a
rich dimension of additional capabilities.