PyZine
 


Issue 5 - Revision 7  /   April 20, 2004 


 

 
 
 
     

Illustration by Lia Avant

RSOAP
- Using "R" with Python
- - - - - - - - - - - -

By Gregory R. Warnes | February 3, 2004

Abstract

RSOAP provides a SOAP interface for the Open-Source statistical package R. It permits software to take advantage of the advanced statistical analysis techniques provided by R – running either locally or on a remote machine – using the simple and widely accepted SOAP protocol.

Introduction

RSOAP permits software to take advantage of the advanced statistical analysis techniques provided by R using the simple and widely accepted SOAP protocol. In addition to the benefit of encapsulating R with a simple API, the R server can be located on a physically distinct machine. This can ease the task of resource allocation and reduce administrative and maintenance effort, since R need only be installed and updated on a single machine.

History

RSOAP was originally created as a part of an effort to integrate R into the Zope Web application development system. Zope is a multi-threaded application implemented in Python. To provide seamless integration with Zope, and to achieve acceptable performance, it is necessary to have multiple R sessions executing concurrently.

Two existing Python packages (Duncan Temple Lang’s RSPython and Walter Moreira’s RPy) provided interfaces between Python and R. However, neither could provide multi-threaded access to R because R itself is neither multi-threaded nor thread-safe. For this reason, running multiple R sessions requires multiple independent processes.

In order to minimize the complexity of interacting with these independent R processes, I developed RSOAP. RSOAP provides a simple API for managing and communicating with R processes using the SOAP communications protocol. I selected SOAP as the communications mechanism to avoid directly handling binary data while preserving the underlying data structures. I also hoped that by selecting SOAP, the resulting package would be easy to use with other applications and programming languages.

RSOAP is now successfully used by the RSessionDA Zope add-in, Spotfire’s DecisionSite Advantage for R, and a number of private applications both inside and outside of Pfizer.

Implementation

As Zope is programmed in Python, I decided to implement RSOAP in Python as well. The interface between Python and R is handled by RPy by Walter Moreira. The selection of SOAP was aided by the availability of several Python libraries for transparently handling the details of the SOAP protocol in Python. From the available options, I selected the SOAPpy library, by Cayce Ullman and Brian Matthews, for its ease of use.

There are three conceptual units of the RSOAP system: a server manager (implemented by the class RSOAPManager), which starts a new R session object on request; an R session object (implemented by the classes RSOAPManager and RProcess), which actually services the SOAP requests by passing SOAP calls to the R process, and returning the results; and the client. A simple example of a Python client is provided in Example.py.

The files RSOAPConnection.py and localRSOAPConnection.py define Python classes which encapsulate the RSOAP connection so that it appears to the user to be a local Python object. The latter checks for and capitalizes on the common case where the client and server are on the same machine. In this case the localRSOAPConnection class avoids the expensive process of translating to and from XML by using direct file operations instead of SOAP calls for file manipulation.

Security

RSOAP was designed to be used internally by the RSessionDA Zope package in a trusted intranet environment. In this environment RSOAP is never visible directly to end-users and is entirely contained within the corporate firewall. Consequently, the system was not designed with any true security provisions. However, some simple security methods have been implemented. The RSOAPManager is run as a non-privileged user, each R session is executed in a separate directory, and access to files outside of the directory where an individual session is running is discouraged by prohibiting the filenames passed to the uploadFile, downloadFile, and delFile calls from including path characters.
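As a rough illustration of the filename restriction described above, the following sketch rejects any name that contains path characters before it is joined onto a session directory. It is written in modern Python and is not RSOAP's actual code: the helper names `is_plain_filename` and `session_path`, and the exact set of rejected characters, are assumptions made for this example.

```python
import os


def is_plain_filename(name):
    """Return True if `name` is a bare filename with no path components.

    Hypothetical helper: RSOAP's real check is not shown in the article.
    """
    if not name or name in (".", ".."):
        return False
    # Reject both Unix and Windows separators, whatever the host OS is.
    if "/" in name or "\\" in name:
        return False
    # NUL bytes can truncate paths in C-level file APIs.
    if "\x00" in name:
        return False
    return True


def session_path(session_dir, name):
    """Join `name` onto the session directory, refusing anything path-like."""
    if not is_plain_filename(name):
        raise ValueError("filename must not contain path characters: %r" % (name,))
    return os.path.join(session_dir, name)
```

A check of this kind only discourages access outside the session directory through the file calls; it does nothing to restrict what R code itself may open.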

Performance

Since the primary purpose of the RSOAP package is to provide a back-end for interactive Web applications, some effort was spent on ensuring good performance. To this end two optimizations have been implemented. The first optimization is integrated into the server: the RSOAPManager pre-starts an R process when it starts up. When a request for a new session is received, this process is then duplicated using the Unix clone call. This avoids the overhead involved in starting up a new R process.
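The idea of duplicating an already-initialised process can be sketched in a few lines of modern Python. This is an illustration only, not the RSOAPManager code: `os.fork` stands in for the duplication step, the function name `spawn_from_template` is hypothetical, and a plain dictionary takes the place of a started R interpreter. It runs on Unix-like systems only.

```python
import os


def spawn_from_template(preloaded, work):
    """Fork the current (already initialised) process and run `work` in the child.

    `preloaded` stands in for state that is expensive to build, such as a
    started R interpreter. The child inherits it without rebuilding it.
    Returns the bytes the child wrote back to the parent.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: use the inherited state, report back, and exit immediately.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, work(preloaded).encode("utf-8"))
            status = 0
        finally:
            os._exit(status)
    # Parent: collect the child's reply and reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


# The template state is built once; each request only pays for the fork.
template = {"libs": ["stats", "graphics"]}
reply = spawn_from_template(template, lambda state: ",".join(state["libs"]))
```

In a real manager the child would go on to listen on its own port rather than exit after one reply.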

The second optimization is integrated into the localRSOAPConnection client class, which acts as a wrapper for RSOAP. As mentioned earlier, this class performs file operations by directly accessing the file system instead of calling the corresponding RSOAP methods when the client and the server are running on the same machine. This avoids the costly process of base64 encoding, XML packaging, XML un-packaging, and base64 decoding files and results in a significant time savings when manipulating large files.
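To make the cost difference concrete, the sketch below contrasts the two routes in modern Python. It is an illustration under stated assumptions, not the localRSOAPConnection code: the function names `soap_style_transfer` and `local_transfer` are hypothetical, and the XML packaging step is left out, so the real SOAP route costs more than shown here.

```python
import base64
import os
import shutil


def soap_style_transfer(payload):
    """Encode and decode as a SOAP file call would; return (data, wire_size)."""
    wire = base64.b64encode(payload)          # text that would sit inside the XML body
    return base64.b64decode(wire), len(wire)


def local_transfer(src_path, session_dir):
    """Copy a file straight into the session directory, skipping any encoding."""
    dst_path = os.path.join(session_dir, os.path.basename(src_path))
    shutil.copyfile(src_path, dst_path)
    return dst_path
```

Base64 turns every 3 bytes into 4 characters, so the encoded form is about a third larger than the file, and it must be built and parsed in memory on both ends. The direct copy avoids both costs.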

Together, these two optimizations provide performance that is acceptable for our interactive Web applications.

API

RSOAP provides an API which has been designed to be simple and flexible while supporting all common operations. At this time, the RSOAPManager class provides a single API call, newServer(), which starts a new R Session. Once newServer has been called, the R Session object, implemented by the RProcess class, provides 4 categories of operations (methods): file access, code execution, variable manipulation, and session management.

These API methods provide all of the necessary functionality to interact with an R process. Except for two calls, the code execution and variable manipulation methods directly accept and return native data types and objects. The two exceptions, eval and script, are provided for handling R operations which are difficult or impossible to directly represent using the client’s native data types (such as model formulae) and to facilitate the execution of code blocks (a.k.a. ’scripts’). Our experience indicates that the set of defined methods allows rich interaction with the R process.

One area of common interest which is not directly represented by API calls is the creation of graphical objects. Because of the complexity of these objects, we have decided not to attempt to provide access to ’live’ graphs. Instead, the client application should open an image file device, call the graphics commands, close the image file device, and then download the created image for presentation to the user.
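The four steps for producing a graph can be sketched as a small helper. This is a hedged illustration in modern Python: `render_plot` and `RecordingSession` are hypothetical names, and the `downloadFile` signature and return value are assumptions, since the article names the call but does not show it. `jpeg` and `dev.off` are the standard R functions for opening and closing a JPEG file device. A stand-in session object is used so the sketch runs without a server.

```python
def render_plot(session, filename, plot_function, *plot_args):
    """Draw a plot into an image file on the server and fetch the file.

    `session` is any object exposing call(name, *args) and
    downloadFile(filename); the downloadFile signature is an assumption.
    """
    session.call("jpeg", filename)           # open the image file device
    try:
        session.call(plot_function, *plot_args)
    finally:
        session.call("dev.off")              # always close the device
    return session.downloadFile(filename)


class RecordingSession:
    """Stand-in for a live R session so the sketch runs without a server."""

    def __init__(self):
        self.log = []

    def call(self, name, *args):
        self.log.append((name,) + args)

    def downloadFile(self, filename):
        self.log.append(("downloadFile", filename))
        return b""                            # a real session would return image data


session = RecordingSession()
render_plot(session, "scatter.jpg", "plot", [1, 2, 3], [2, 4, 6])
```

Closing the device in a `finally` clause means a failed graphics command does not leave an open device behind in the session.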

Example

Once RSOAP has been properly compiled and installed (see the README file), the RSOAPManager can be started with

python RSOAPManager.py [options]
 
Available options include:
<mgr_port>

TCP port where RSOAPManager should listen for new session requests. The default is 9081.

<start_port>

Lowest TCP port on which an R Session should listen for commands. The default is <mgr_port>+1.

<end_port>

Highest TCP port on which an R Session should listen for commands. The default is 65535.

The sample client can then be started with

python Example.py

This client connects to the server at http://hostname:9081 and then runs through a complete session. During the session, it performs some standard operations: generating random numbers, uploading a data file, fitting a regression model, generating a JPEG plot, assigning a value to an R object, retrieving the value of an R object, saving a session, and restoring a previously saved session. The files RSOAPConnection.py and localRSOAPConnection.py execute similar test code when invoked directly, and in addition allow specification of the hostname and port number on the command line:

python RSOAPConnection.py [<host> [<port>]]

For illustration, here is a simple example of using RSOAP from Python:

import SOAPpy
 
def unwrap(obj):
    "Unwrap SOAPpy objects to get 'raw' python objects"

    # The parameter is named 'obj' so that it does not shadow the builtin 'object'
    if isinstance(obj, SOAPpy.SOAP.structType):
        return obj._asdict
    elif isinstance(obj, SOAPpy.SOAP.arrayType):
        return obj.data
    else:
        return obj
 
print "## Contact the RSOAPManager and ask for a new R Session\n"
mgr = SOAPpy.SOAPProxy("http://rstatserver.pfizer.com:9081")
server_url = mgr.newServer()
print "Result: %s " % server_url
print "\n"
 
print "## Connect to the new session\n"
server = SOAPpy.SOAPProxy( server_url )
print "Result: %s " % server
print "\n"
 
print "\n"
print "## Request 10 random values\n"
x= unwrap(server.call("rnorm",10))
print "Result: %s " % x
print "\n"
 
print "## Request 10 random values with given means\n"
y = unwrap(server.call("rnorm",10,x))
print "Result: %s " % y
print "\n"
 
print "## Fit a linear model\n"
server.putObject("x",x)
server.putObject("y",y)
lm = unwrap(server.script("reg <- lm( y ~ x ) \n summary(reg)"))
print "Result: %s" % lm
print "\n"
 
print "## Extract a the p-values\n"
p_value = unwrap(server.eval("coef(summary(reg))[,4]"))
print "Result: %s" % p_value
print "\n"
 
print "## Close session ###\n"
server.quit()
print "\n"
 

This code requests a new R session from the RSOAP server running on rstatserver.pfizer.com port 9081. It then generates two normal random vectors, x and y, and fits a linear model using the generated values. Here is the output generated by running this sample code (with some minor reformatting):

## Contact the RSOAPManager and ask for a new R Session
Result: http://gsun492:9111 
 
 
## Connect to the new session
Result:  
 
## Request 10 random values
Result: [0.53539445707, -0.40085934426999997, 0.19319099498, 
-0.23954321423899999, 1.31725950482, 0.53521712358600004, 
-0.115073438349, 0.14035568773599999, -1.1966058606300001, 
-1.2352003970400001] 
 
## Request 10 random values with given means
Result: [1.2013866076199999, -1.02216526005, 1.5269029731999999, 
0.13749410999600001, 0.93745162857300002, 0.33283158571400001, 
-0.92884702165900002, -0.32734088029699998, -2.1902689738399999, 
-0.65814896191700001] 
 
## Fit a linear model
Result: 
> reg <- lm(y ~ x)
 
> summary(reg)
 
Call:
lm(formula = y ~ x)
 
Residuals:
    Min      1Q  Median      3Q     Max 
-0.8073 -0.5172 -0.3273  0.6010  1.3583 
 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.04706    0.25135  -0.187   0.8561  
x            1.11643    0.33727   3.310   0.0107 *
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
 
Residual standard error: 0.7933 on 8 degrees of freedom
Multiple R-Squared: 0.578,      Adjusted R-squared: 0.5253 
F-statistic: 10.96 on 1 and 8 DF,  p-value: 0.01069 
 
## Extract a the p-values
Result: {u'x': 0.0106945547648, u'(Intercept)': 0.85614403999400002}
 
## Close session ###
 

A more complete example is available in the file Example.py and in the test code sections of RSOAPConnection.py and localRSOAPConnection.py, which are included in the source distribution.

Discussion

The RSOAP package provides a simple and stable SOAP interface for R. This has provided two major benefits. First, SOAP libraries exist for most languages, making it easy to write code which interacts with R. Locally, we have fully or partially implemented clients for Python, Zope, COM, Java, and Spotfire. We expect to implement a Perl client in the near future.

Second, since SOAP is a remote object protocol, the client and the server need not be running on the same machine. This permits us to deploy applications which run on the client’s machine, taking advantage of the potential for interactivity which this provides, while avoiding the need for R to be installed on multiple machines. This in turn makes it easy to ensure that the latest version of R is available, that the latest versions of custom R scripts are used, and that sufficient computational resources are deployed.

The current version of RSOAP has a number of limitations. It does not provide a mechanism for detecting or removing “orphaned” R processes: processes which have been started by a client but were not terminated when no longer needed. So far I have attempted to minimize this problem by appropriate safeguards in the client code, but a future version of RSOAP should provide a mechanism for killing off R sessions which have not been accessed for a certain period of time. For this purpose I have considered adding a separate control port. This would provide a natural way to control the RSOAPManager in other ways, such as requesting that the RSOAPManager shut itself down.
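One possible shape for the idle-session bookkeeping described above is sketched below in modern Python. It is a proposal-level illustration, not part of RSOAP: the class name `SessionRegistry`, its methods, and the idea of keying sessions by a string id are all assumptions, and actually terminating the R process is left out.

```python
import time


class SessionRegistry:
    """Track last-access times so idle sessions can be found and stopped."""

    def __init__(self, idle_limit_seconds):
        self.idle_limit = idle_limit_seconds
        self.last_access = {}                 # session id -> timestamp

    def touch(self, session_id, now=None):
        """Record that a session was just used."""
        self.last_access[session_id] = time.time() if now is None else now

    def reap(self, now=None):
        """Forget and return the ids of sessions idle longer than the limit."""
        now = time.time() if now is None else now
        idle = [sid for sid, seen in self.last_access.items()
                if now - seen > self.idle_limit]
        for sid in idle:
            del self.last_access[sid]         # a real manager would also kill the R process
        return sorted(idle)
```

The manager would call `touch` on every request to a session and run `reap` periodically, for example from a timer or from the control port mentioned above.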

Security is another area for future work. For the standard Internet security issues such as user validation and data encryption, relatively straightforward solutions already exist (e.g. SSL). A potentially more difficult problem, however, is providing a “safe” version of R. The current measures (running R as a non-privileged user and using separate session directories) provide some protection, which could be augmented by running R in a “chroot jail”, but the user must also be prevented from performing unsafe operations, such as opening sockets. In addition, denial of service attacks would have to be prevented, which would require some method of restricting resource usage. In all, this is a daunting task.

Conclusion

RSOAP provides a simple, reliable mechanism for interacting with R from a variety of languages. It has already proven useful for a number of projects here at Pfizer, and I anticipate that it will be useful to others.

For further reference:

RSOAP API: (PDF)

RSOAP 1.0.0 Source Code (zip, tgz)

RSOAP Home Page


Gregory R. Warnes is a research statistician for Pfizer Global Research and Development and a Research Affiliate for the Yale Department of Computer Science.



Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT