PyZine
Issue 4 - Revision 2  /   January 28, 2003

Illustration by Mark Pratt
What's new in Python 2.3

By Andrew M. Kuchling | published November 17, 2003

Abstract

Every Python release has had a different complexion, ranging from radical to conservative. For example, Python 2.0 was a radical release: it added Unicode, string methods, cyclic garbage collection, and new syntax (print >> and f(*args, **keyworddict)). Python 2.1 was middle-of-the-road: it had one radical change, the introduction of nested scopes, and a number of less noticeable changes and new modules. Python 2.2 was another radical release, adding new-style classes alongside the existing object model and new language features, most notably generators and iterators.

Python 2.3 should be released some time during the summer of 2003; the second alpha is already available. So what's Python 2.3 like? It leans more toward the conservative side, emphasizing useful new modules more than changes to the Python language. There certainly are some changes to the language itself, but there aren't many of them, and none of them are major.

Generators

Generators are now fully part of Python. They were first introduced in 2.2, where a from __future__ import generators directive was required to enable the yield keyword needed to use them. In 2.3 yield is always a keyword; no __future__ directive is required.

In Python, C, and most other languages, a stack frame is created to hold the local variables whenever a function is entered. This stack frame is destroyed when the function is exited, whether it's by hitting a return statement or falling off the end of the function. Generators change this model by not destroying the local stack frame, instead keeping it around so it can be reused. Consider the simplest generator function:

    def g():
        for i in range(4):
            yield i+1

Calling g() creates an instance of the generator and returns it. The instance has its own stack frame for local variables, so it has its own private value for the variable i. The returned generator behaves like an iterator.

Let's detour momentarily for a quick refresher on iterators, another feature introduced in Python 2.2. To be an iterator over some sequence, an object simply needs a next() method that returns the next item in the sequence and raises the StopIteration exception when there are no more items left. As of Python 2.2 the for statement always uses an iterator. Default iterators are provided for lists and strings, so existing code -- for example, for i in [1,2,3] -- continued to work.
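
As a rough illustration of that protocol, here is a minimal sketch of a hand-written iterator. The Countdown class is hypothetical and not part of Python. Note one assumption about the interpreter: the method is called next() in Python 2.2 and 2.3 but is spelled __next__() in Python 3, so the sketch defines both names to run on either.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself here so a for loop can use it directly.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more items left.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # the Python 2.x spelling of the same method


countdown_items = [n for n in Countdown(3)]
print(countdown_items)
```

The for loop (here, a list comprehension) never calls the method explicitly; it simply keeps asking for the next item until StopIteration is raised.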

When the generator's next() method is called, execution of the body of code begins and continues until it hits a yield statement. yield i + 1 evaluates the expression i + 1 and the resulting value is returned by next(). The subsequent call to next() picks up after the yield, so the for loop goes around again and this time i + 1 evaluates to 2. And so it goes, until the loop is over and the subsequent next() call falls off the bottom of the generator, causing StopIteration to be raised.
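
The sequence of events described above can be traced by hand. This sketch reuses g() from earlier and assumes a current interpreter, where the builtin next(gen) is used in place of the gen.next() method call available in Python 2.2 and 2.3.

```python
def g():
    for i in range(4):
        yield i + 1


gen = g()            # creates the generator; no body code has run yet
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield; the loop advances i to 1
rest = list(gen)     # drains the remaining values

try:
    next(gen)        # the body has fallen off the end
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, rest, exhausted)
```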

Generators are most useful when you'll be iterating over a very large collection of elements, a collection so large that you don't want it to be created as an actual list in memory. For example, if you wanted to loop over all of the files on a disk, the resulting list might well be too large to fit in memory. A generator can return filenames one-by-one, reducing the memory required for such a traversal.
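
A file traversal of that kind might be sketched as follows. The iter_files() function and the demo directory layout are hypothetical names chosen for illustration; os.walk() is itself new in Python 2.3 and produces one directory at a time, so no complete list of files is ever built by the generator.

```python
import os
import tempfile


def iter_files(root):
    """Yield the path of every file below root, one at a time."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


# Demo on a throwaway directory tree (file names are illustrative only).
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "sub"))
for relative in ("a.txt", os.path.join("sub", "b.txt")):
    open(os.path.join(demo_root, relative), "w").close()

# Sorting collects the results, which is fine for two files; a real
# traversal would usually process each path as it arrives instead.
found = sorted(os.path.relpath(p, demo_root) for p in iter_files(demo_root))
print(found)
```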

Booleans

The most visible change to Python 2.3 is probably the addition of booleans. The Boolean is the only new builtin type, though a number of other data types -- including date/time types, sets, and heaps -- were added to the standard library.

The Python 2.2.1 bugfix release prepared the way for the Boolean type by adding True and False as builtin names. In 2.2.1 they're just integers, as if you'd assigned:

    True  = 1
    False = 0

In 2.2.1, if you print the value of True, you just get 1.

Python 2.3 keeps the names True and False, but they're now unique instances of a new Boolean type, instances which still behave like integers in most cases. For example, True * 42 is legal and is the same as 1 * 42, and 42 - False is the same as 42 - 0. The only real difference is their printed values: if you apply str() and repr() to True and False, you get the strings 'True' and 'False' instead of '1' and '0'.

Most of Python 2.3's operators, builtin functions, and library modules have been changed to return True or False where appropriate; for example, the in operator and the isinstance() builtin now return True or False.
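
The behaviour described in this section can be checked in a few lines. This is only a small sketch; the variable names are arbitrary.

```python
# Booleans still take part in arithmetic like the integers 1 and 0.
product = True * 42
difference = 42 - False

# Their printed forms are names rather than digits.
printed = (str(True), repr(False))

# The Boolean type is a subtype of int, and operators return Booleans.
is_int = isinstance(True, int)
membership = 3 in [1, 2, 3]

print(product, difference, printed, is_int, membership)
```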

A Builtin Sequence Enumerator

A new builtin function, enumerate(), was added to simplify the common Python idiom for looping over a list and doing something to each entry. The idiomatic code usually looked something like:

    for i in range(len(L)):
        item = L[i]
        ... 
        L[i] = item

You could also use a list comprehension or a map() call, but the idiomatic for loop was often easier to read, especially when its body was longer than a line or two. The range(len(L)) is the most inelegant part of the idiom; sometimes I typed range(L), which is incorrect.

enumerate() makes this idiom a bit tidier. enumerate(sequence) returns an iterator that produces the pairs (0, sequence[0]), (1, sequence[1]), ..., (N-1, sequence[N-1]) for a sequence of length N. The idiomatic loop can now be rewritten as

    for i, item in enumerate(L):
        ...
        L[i] = item

New Modules

The bulk of the additions in Python 2.3 are handy new modules for one task or another. The most generally useful new module may be the logging package, a set of flexible and highly customizable classes for recording log messages from a program's various subsystems. For the simplest uses, you can just import the logging module and call the right function:

    import logging
    logging.debug("Starting program")
    logging.warning("Config file %s not found", "/etc/application.conf")
    logging.critical("Disk full")

More complex software may be divided into subsystems. For example, a program for data analysis might have a user interface subsystem that displays a GUI, a network subsystem that handles retrieving data from remote servers, and a computational component that does some work with the data. Each subsystem can have its own log, a separation which lets you look at debugging messages for the computational subsystem without drowning in messages from the network component. To implement this you just have to retrieve a particular log with the getLogger() function:

    net_log = logging.getLogger("network") # subsystem name
    net_log.debug("Starting DNS lookup")
    comp_log = logging.getLogger("compute")
    comp_log.error("Matrix is not diagonalizable")

There are also hooks for implementing custom handlers and log records. With a bit of work you can build a logging scheme closely tailored to your application and your debugging needs.
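
As one possible illustration of those hooks, the sketch below defines a custom handler by subclassing logging.Handler and overriding emit(). The ListHandler class is hypothetical, as is the choice of levels for each subsystem; only the logging calls themselves come from the standard library.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        # Called once for every record that passes the level checks.
        self.messages.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))

# Each subsystem gets its own logger and its own threshold.
net_log = logging.getLogger("network")
net_log.setLevel(logging.DEBUG)
net_log.addHandler(handler)

comp_log = logging.getLogger("compute")
comp_log.setLevel(logging.ERROR)
comp_log.addHandler(handler)

net_log.debug("Starting DNS lookup")
comp_log.debug("Entering solver")              # dropped: below ERROR
comp_log.error("Matrix is not diagonalizable")

print(handler.messages)
```

Raising or lowering the level on a single logger is what lets you silence one subsystem while keeping the debugging output of another.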

Dates and Times

Several types were added to represent times, all contained in the datetime module. There's a date class, instances of which have year, month, and day attributes; a time class with hour, minute, second, and microsecond attributes; and a datetime class that has all of them. These classes are an upgrade from the functionality of the 9-tuples used by the time module, but they are a step below mx.DateTime, the most common extension used for date handling. datetime's types are easier to work with than 9-tuples, but mx.DateTime also has functions for parsing strings in various date and time formats, as well as support for dates in the distant past or future. If you're already using mx.DateTime, there's not much reason to switch to the new 2.3 types.
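
A brief sketch of the three classes follows. The particular dates and times are arbitrary values chosen for illustration.

```python
import datetime

d = datetime.date(2003, 1, 28)
t = datetime.time(14, 30, 15, 250)         # hour, minute, second, microsecond
stamp = datetime.datetime.combine(d, t)    # carries all of the attributes

# Subtracting two dates gives a timedelta object.
elapsed = datetime.date(2003, 11, 17) - d

print(d.year, d.month, d.day)
print(t.hour, t.minute, t.second, t.microsecond)
print(stamp.isoformat())
print(elapsed.days)
```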

A new sets module contains two data types for representing mathematical sets, that is, unordered collections of elements with no duplicates. It's always been possible to use dictionaries to get the semantics of a set, but doing so meant you had to implement the intersection and union operations yourself, a simple but annoying task. sets contains two set classes: Set, whose instances can have elements added and removed at any time, and ImmutableSet, whose instances can't be modified and can therefore be used as elements of other sets, allowing the creation of sets of sets.

Using the set classes is straightforward. The constructors can take any Python sequence to populate the set, and then you can perform intersection and union operations on sets.

    >>> import sets
    >>> s = sets.Set('abc')
    >>> s
    Set(['a', 'c', 'b'])
    >>> s2 = sets.Set(['c', 'd', 'e'])
    >>> s.intersection(s2)
    Set(['c'])

Instances of the mutable Set class can also be updated in place:

    >>> s.union_update(s2)
    >>> s
    Set(['a', 'c', 'b', 'e', 'd'])

Having sets in the Python standard library isn't a huge leap forward, but it is a pleasant convenience.
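
The point about sets of sets can be sketched in code. One assumption to flag loudly: the sets module was later superseded by the builtin set and frozenset types (added in Python 2.4) and no longer exists in Python 3, so this sketch uses the builtins as stand-ins for sets.Set and sets.ImmutableSet in order to run on a current interpreter.

```python
s = set('abc')                   # stand-in for sets.Set('abc')
s2 = set(['c', 'd', 'e'])
common = s.intersection(s2)
s.update(s2)                     # in-place union, like Set.union_update()

# Only immutable sets can be elements of another set.
family = set([frozenset('ab'), frozenset('cd')])

try:
    set([set('ab')])             # a mutable set is rejected as an element
    mutable_allowed = True
except TypeError:
    mutable_allowed = False

print(common, sorted(s), len(family), mutable_allowed)
```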

bsddb

The code for the old bsddb module has been replaced by the third-party PyBSDDB package. A compatibility interface is provided so that programs using the old bsddb interface will continue to work; the new interfaces provide access to the transactional features in current versions of BerkeleyDB. This is probably my favorite enhancement because all of the database modules previously included with Python were rather feeble. Thus, Python 2.3 will be a significant leap forward, making it possible to write fancier, more robust data-handling applications with an out-of-the-box installation of Python.

Miscellany

Other notable, new modules include:

  1. heapq implements a priority-sorted heap queue. Heap queues are stored in ordinary lists but keep their elements in heap order, so the smallest element is always the first one. The operations of removing the smallest element and adding a new element both take O(log n) time, so heap queues are often used to implement schedulers.
  2. itertools contains a number of helpful functions for use with iterators, inspired by various functions provided by the ML and Haskell languages. For example, itertools.ifilter(predicate, iterator) returns all elements in the iterator for which predicate() returns True, and itertools.times(N, obj) returns obj N times.
  3. optparse is a fancy new parser for command-line options that will error-check options, convert arguments to integers or booleans, and automatically produce a help message.
  4. tarfile lets you read and write archives in tar format.
  5. textwrap will word-wrap lines of text.
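
To make the scheduler remark about heapq concrete, here is a minimal sketch. The task names and times are invented for illustration; heappush() and heappop() are the module's own functions.

```python
import heapq

# Each entry is a (time, task) tuple; tuples compare by their first item.
queue = []
heapq.heappush(queue, (30, "rotate logs"))
heapq.heappush(queue, (10, "send heartbeat"))
heapq.heappush(queue, (20, "flush cache"))

earliest = queue[0]          # the smallest entry is always at index 0

order = []
while queue:
    when, task = heapq.heappop(queue)   # removes the smallest entry
    order.append(task)

print(earliest, order)
```

The queue is an ordinary list, so no special class is needed; the heapq functions simply keep it arranged so that the next task to run is cheap to find.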

One significant module has been disabled. rexec is still present but now always raises an error on being imported. It's been unceremoniously disabled because it has been essentially unmaintained for the last few versions. Occasionally bugs would be found and someone would fix them, but no one has been carefully updating the module in light of recent changes to Python, so no one is quite sure if it's still safe. For example, Python 2.2 changed the object model in many ways -- for example, it made types callable -- and possibly introduced new ways to break out of restricted execution. The safest and most honest course was to disable rexec.

Other New Things

Most of the other new features are less significant, though some of them are awfully cute:

  1. A set of hooks was added to implement new import machinery. For example, Python 2.3 comes with a module that implements importing from ZIP-format archives; append a ZIP archive's filename to sys.path and future imports will look inside the archive for .py files.
  2. Python has supported an extended slice syntax for a long time. Most slices are written as [start:end], but it's also possible to specify a stride -- [start:end:n] -- to take every n-th element. However, only Numeric Python's array type did anything with the stride value. Python's builtin sequences have never supported it. In 2.3, if L is a list containing the numbers from 0 to 10, L[::3] will select every third element, resulting in [0,3,6,9]. Even negative values work, starting at the end and striding backward: L[::-3] produces [10, 7, 4, 1].
  3. When applied to strings, in now accepts multicharacter strings; 'ab' in 'abcd' now returns True instead of raising a TypeError.
  4. A few more steps have been taken along the path of integer and long integer unification. Most notably, int() will now return a long integer instead of raising OverflowError when it's passed a very long number. Large hexadecimal constants such as 0x80000000 currently result in negative 32-bit numbers, but they'll produce positive long integers in Python 2.4; hence, 2.3 makes them trigger a FutureWarning message.
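
Several of the items above can be tried directly. The following is a sketch only: the archive name demo_modules.zip and the module name zipped_hello are hypothetical, created on the fly in a temporary directory.

```python
import os
import sys
import tempfile
import zipfile

# Extended slices on a builtin list.
L = list(range(11))              # the numbers from 0 to 10
every_third = L[::3]
backward = L[::-3]

# Multicharacter substring test.
found = 'ab' in 'abcd'

# int() copes with a value far too large for a 32-bit integer.
big = int('9' * 30)

# Build a throwaway ZIP archive holding one module, then import from it.
archive = os.path.join(tempfile.mkdtemp(), "demo_modules.zip")
zf = zipfile.ZipFile(archive, "w")
zf.writestr("zipped_hello.py", "GREETING = 'hello from the archive'\n")
zf.close()

sys.path.append(archive)         # imports now look inside the archive too
import zipped_hello

print(every_third, backward, found, big, zipped_hello.GREETING)
```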

There are other language-level changes, most of them so esoteric that the majority of Python programmers won't notice them.

Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT