PyZine
 


Article Finder
People
Issue 1 - Revision 1  /   Published in 2002 


 
  Py Links:
Latest Issue
Issue 08
Issue 07
Issue 06
Issue 05
Issue 04
Issue 02
Issue 01
 
 
Downloads
     
  Articles:
Throughout the quarter we cover topics of interest to Python developers.

  Scientific Python: Introducing Numeric

  Simple CGI Template Processing

  Extending Python with C: Part 1

  Image Viewing with TKinter

  Threading and the Global Interpreter Lock

 
 
 
     

Illustration by Lia Avant
Py Archive Article
Scientific Python

Scientific Python
- Py in Science
- - - - - - - - - - - -

By Eric Jones | Originally published in Py Issue 1

print

The Numeric module adds multi-dimensional array types to Python and forms the heart of its scientific computing capabilities. Numeric is the platform upon which higher-level libraries such as Scientific and SciPy are built. Its usefulness extends beyond the scientific community. Anyone crunching numbers with Python can benefit from its simplicity and speed.

Numeric was originally developed by Jim Hugunin, also of Jython fame and is currently maintained by Paul Dubois of Lawrence Livermore National Labs along with a loose team of developers. Numeric is hosted on Sourceforge and distributed in multiple formats including source tar balls, click-install MS Windows installers, and RPM’s for both Python 2.1 and Python 2.2.

The fundamental idea behind Numeric is that of array arithmetic. To illustrate, consider two lists of numbers, a and b. In standard Python, the individual elements in the lists might be added together using the following loop:

>>> a = [1,2,3]
>>> b = [3,4,5]
>>> c = [0,0,0]
>>> for i in range(len(a)):
...    c[i] = a[i] + b[i]
>>> print c
[4, 6, 8]

Using Numeric arrays, the same operation reduces to adding a and b together.

>>> from Numeric import *
>>> a = array([1,2,3])
>>> b = array([3,4,5])
>>> c = a + b
>>> print c
[4 6 8]

Other standard numeric operators are supported, as well, and are always done on an element-by-element basis. Numeric supplies functions for operations such as matrix-vector and matrix-matrix multiplies.

The benefits of array operations are more than aesthetic. There is also a significant speed benefit. In the first example, the looping is done in Python code and, even though all the elements of a and b are integers, the interpreter doesn’t have this information and spends time discovering element types to issue the proper addition operation. In contrast, the loop executed under the covers of c = a + b is done in C and has the benefit of knowing that all the operands are integers. In most cases, the speed improvement is one or two orders of magnitude. For this simple example, my windows laptop provides a factor of 50 improvement for 10000 element sequences.

Note that using Numeric arrays as a straight replacements for lists in the first example will not produce any speed improvements. The entire overhead is in the interpreter, not the data structure. You must use array arithmetic to see speed improvements.

Numeric also provides array equivalents for most of the functions in the standard math library so that expressions with trigonometric functions (as well as others) are possible:

>>> from Numeric import *
>>> a = array([0,pi/4,pi/2])
>>> c = sin(a)**2 + cos(a)**2
>>> print c
array([ 1.,  1.,  1.])


Multi-dimensional Arrays

The array constructor function takes a sequence as its first argument. If it is a sequence of numbers, as in the previous examples, a one-dimensional array is created. If it is a sequence of sequences, a multi-dimensional array is created. Here, a two dimensional array is created:

>>> from Numeric import *
>>> a = array( [[ 0,  1,  2],
...		[10, 20, 30]])
>>> a
array([[ 0,  1,  2], [10, 20, 30]])

Array arithmetic works identically on multi-dimensional arrays as it does on one-dimensional arrays.

Numeric Types in Arrays

Usually all the values of an array have the same numerical type. During array creation, the list of numbers is scanned to determine the appropriate numerical type, and the typecode for the array is set:

>>> from Numeric import *
>>> # Integer (Long) Arrays
>>> a = array([1,2,3])
>>> print a, '--> typecode:', a.typecode()
[1 2 3] --> typecode: l

>>> #Floating point (double precision) array
>>> a = array([1,2,3.0])
>>> print a, '--> typecode:', a.typecode()
[ 1.  2.  3.] --> typecode: d
>>> # Complex ( double precision) array
>>> a = array([1,2,3+2j])
>>> print a, '--> typecode:', a.typecode()
[ 1.+0.j  2.+0.j  3.+2.j] --> typecode: D

By default, Numeric uses the double precision versions of floating point and complex numbers because these are the types that Python itself supports. However, many other numeric types are also supported. To specify explicitly that an array is single precision, the array constructor uses keyword arguments:

>>> from Numeric import *
>>> a = array([1,2,3],typecode=Float32)
>>> print a, '--> typecode:', a.typecode()
[ 1.  2.  3.] --> typecode: f

The available types are as follows:

Figure 1: Diagrams and Reports

Note that there are more types than a machine can actually handle. For example, it is impossible to represent a floating point number with 0 bits which is exactly what Float0 asks Numeric to do. Float0 is still a useful identifier to have, however, because it tells Numeric to use the smallest floating point representation available on the host machine for the given array.

Indexing and Slicing Arrays

Array indexing follows the conventions used in other Python sequences, allowing both positive and negative values.

>>> from Numeric import *
>>> a = array([1,2,3])
>>> a[0]
1
>>> a[-1]
3

Array slicing is also similar to normal sequence slicing:

>>> from Numeric import *
>>> a = array([1,2,3,4,5,6])
>>> a[2:-2]
array([3, 4])
>>> a[:3]
array([1, 2, 3])
>>> a[3:]
array([4, 5, 6])

Slicing syntax for arrays also has some additions. Numeric array slicing has a third field to specify the “stride” between elements in a slice. The following extracts the even elements in an array:

>>> from Numeric import *
>>> a = array([0,1,2,3,4,5])
>>> a[::2]
array([0, 2, 4])

Here the ‘start’ and ‘stop’ fields have been omitted and default to the beginning and end of the array respectively. Multi-dimensional arrays are indexed by specifying multiple slice arguments separated by commas:

>>> from Numeric import *
>>> a = array([[ 0, 1, 2, 3, 4],
...            [10,11,12,13,14],
...            [20,21,22,23,24]])
>>> a[:2,1:3]
array([[ 1,  2],
       [11, 12]])

The first slice argument specifies the rows and the second specifies the columns of the resulting slice. Array slices also vary from other sequence slices in an important way. Slicing standard sequences always produces a copy of the original slice. Changes to the newly created sequence do not affect values in the original sequence. In contrast, array slices are references or views into the original array. Slicing a Numeric array and assigning it to a variable will result in two variables referencing the same memory. The following example illustrates this. The first set of commands demonstrates slicing a list:

 >>> a = [0,1,2,3,4,5]
>>> b = a[1:3]
>>> b
[1, 2]

# Now assign a new value to an element in b.
>>> b[0] = 10
>>> b
[10, 2]

# The original list, a, remains unchanged.
>>> a
[0, 1, 2, 3, 4, 5]

Now try the same commands using Numeric arrays:

>>> a = array([0,1,2,3,4,5])
>>> b = a[1:3]
>>> b
array([1, 2])

# Now assign a new value to an element in b.
>>> b[0] = 10
>>> b
array([10,  2])

# The original array has also changed!
>>> a
array([ 0, 10,  2,  3,  4,  5])

Numeric’s “slices as references” behavior is sometimes questioned, and it does require some programming care to prevent unexpectedly overwriting data. But the feature is also very important for efficient numeric computations. Arrays often contain many millions of elements. Making a copy every time a slice is made would be computationally expensive and could lead to “out of memory” errors in cases where arrays fill most of the available memory. Consider a simple finite difference calculation on a two dimensional grid:

>>> b = (a[1:,1:-1] – a[:-1:1:-1]) / dx

If slices created copies, this calculation would require approximately three times the memory used by the original a matrix.

In cases where a copy is preferable, the copy method will do the trick:

>>> a = array([0,1,2,3,4,5])
>>> b = a[1:3].copy()
>>> b[0] = 10
>>> a
array([0, 1, 2, 3, 4, 5])

It’s worth noting that arrays can have typecode = PyObject and hold arbitrary collections of Python objects. The augmented slicing or referencing behavior can be beneficial outside of Numeric computations. Also, if the objects in the arrays have Numeric operators defined for them, array arithmetic will produce expected results:

>>> a = 
… array(('abc',[1,2,3,4]),typecode=PyObject)
>>> a
array([abc , [1, 2, 3, 4] ],'O')
>>> a+a
array([abcabc , [1, 2, 3, 4, 1, 2, 3, 4] ],'O')
Conclusion

This short introduction has covered array construction, typecodes, indexing, and slicing of Numeric arrays. Numeric has a number of other features, methods, and modules for working with arrays. In the next column, I will discuss array broadcasting and several of the methods for selecting array elements based on some criteria or condition. Future columns will discuss optimization and efficiency issues as well as tools for accelerating computations such as the weave module.

For further Reference:

Numeric http://sourceforge.net/projects/numpy/
The home page has links to the download site as well as more comprehensive documentation of Numeric’s capabilities.

SciPy http://www.scipy.org
A large package of scientific/engineering libraries for Python.

Scientific http://starship.python.net/crew/hinsen/scientific.html
Scientific is another package of scientific libraries with many features targeted at computational chemistry.

Weave http://www.scipy.org/site_content/weave
A package for inlining C/C++ within Python. It has support for compiling Numeric array expressions for fast execution.

This particular article is Copyright © 2002 Eric Jones. All Rights Reserved.
Eric Jones

is the President and CTO of Enthought, Inc. in Austin, Texas. Focused on scientific and business computing, Enthought leads the development of SciPy, an Open Source library of scientific tools for Python. Eric’s research interests include computational electromagnetics as well as genetic/evolutionary optimization techniques. He received his Ph.D. in Electrical Engineering from Duke University and his BS in Mechanical Engineering from Baylor University. Eric has taught classes for a number of conferences and companies and is available for both one and two day workshops covering Python for Scientific Computing.


shim
shim

 Py is committed to bringing you great Python Articles.

shim
shim


Home   Subscribe   Migration FAQ   Contact PyZine   Write for PyZine   ZopeMag   opensourcexperts.com  

Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT