PyZine
 


Issue 1 - Revision 1  /   Published in 2002 


 

Illustration by Lia Avant
Threading the Global Interpreter Lock

- - - - - - - - - - - -

By Aahz | Originally published in Py Issue 1

Python is one of the few programming languages with thread support built in. Unlike some languages (such as Ruby), Python relies on the operating system (OS) to provide thread support. This means that you can't use Python threads on an OS that doesn't have threads (but see below for information on Microthreads). On the other hand, Python threads can produce a substantial performance boost.

The first key to making good use of Python threads is to understand the Global Interpreter Lock (GIL). The second key is learning how to use thread primitives for critical section locks and synchronization. This article doesn't cover those primitives, but you can find information in the slideshow available on my website (see Resources), along with some source code examples of threaded programs.

Before going further, this article assumes that you have at least a general idea of how processes and threads work, and that you know Python fairly well.

____
 
 
Sidebar - What is Global?

Someone familiar with threads in another language might wonder why Python would need to do a lock on every single variable access. After all, local variables don't need a lock. The problem is that there really is no such thing as a local variable in Python. Every single piece of data in Python is an object, and every single Python object is global.

Look at Example #1. You'll notice that after running f(), the global name x now points at the dict created in the local scope of f().

This is why many experienced Python programmers prefer to use "name" and "binding" instead of "variable" and "reference".

Example #1
x = 1
def f():
    global x
    y = {}
    y[3] = 5.2
    x = y
f()
 
____

All threads within a single process share memory; this includes Python internal structures (such as the reference count for each object).

Something needs to prevent the threads from stomping all over the internals when two or more threads try to access the same memory at the same time. Imagine the problem that would result with the following sequence:

  • Thread one decreases the reference count
  • Thread two increases the reference count
  • Thread one deletes the object (thinks ref count is zero)
  • Thread two blows up when it attempts to access freed memory
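
The reference counts in that sequence can be watched from Python itself. Here is a minimal sketch, assuming CPython and modern Python 3 syntax; the names data and alias are made up for illustration. Note that sys.getrefcount() always reports one extra reference for its own argument, so only the differences are meaningful:

```python
import sys

data = []                          # a fresh list, bound to one name
before = sys.getrefcount(data)     # includes a temporary reference for the call itself
alias = data                       # a second binding: the count goes up by one
after = sys.getrefcount(data)
del alias                          # removing the binding brings the count back down
final = sys.getrefcount(data)

print("change after aliasing:", after - before)
print("change after del:", final - before)
```

Every one of those increments and decrements touches shared interpreter state, which is exactly what two unsynchronized threads could corrupt.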

Python takes the easy way out: instead of locking access to individual data structures, it only allows one thread to run. The Global Interpreter Lock is the mechanism that permits only one thread to run Python code at any point in time, no matter how many CPUs you have. That is, the currently running thread blocks all other Python threads from accessing the interpreter. This means that you have to sidestep the GIL in order to make effective use of threads on multi-CPU machines. At the same time, the GIL simplifies your Python code and makes Python more efficient on single-CPU machines – there are always tradeoffs. (These advantages are not the reason for the GIL; the GIL exists primarily to make the core Python interpreter simpler.)

Note carefully that the GIL only restricts pure Python code. Extensions (external Python libraries usually written in C) can be written that release the lock, which then allows the Python interpreter to run separately from the extension until the extension reacquires the lock. All of the standard Python blocking I/O calls are written this way, so the simplest way to get efficiency gains from threading is to use threads to do lots of I/O.

The GIL in Action

Let's take a look at a couple of examples to illustrate what happens with the Global Interpreter Lock:

Example 2: Pure Python code

In Example #2, we see the same code in two different forms. Because this is pure Python code (with no calls to an extension that releases the GIL), only one thread can run. 2a shows the code executing linearly in a single thread. 2b shows the code split between two different threads. Even on a machine with two CPUs, 2a will execute faster, because the GIL prevents the two threads in 2b from executing simultaneously. If nothing else, the overhead of context switching between threads will make 2b run slower.
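
The comparison can be sketched roughly as follows. This is not the original Example #2 listing; it is a stand-in written in modern Python 3 syntax, and the function names (count_down, run_sequential, run_threaded) are hypothetical. It assumes an interpreter build that has the GIL:

```python
import threading
import time

def count_down(n):
    """Pure Python busy loop; it never calls an extension that releases the GIL."""
    while n > 0:
        n -= 1

def run_sequential(n):
    """2a-style: do both pieces of work one after the other in a single thread."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start

def run_threaded(n):
    """2b-style: hand each piece of work to its own thread."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 1_000_000
    print("one thread:  %.3f s" % run_sequential(n))
    print("two threads: %.3f s" % run_threaded(n))
```

The exact timings depend on the machine and the Python version, so run it yourself rather than trusting any particular number.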

Example 3: Python code doing blocking I/O

In Example #3, we're using the urllib module, which uses blocking sockets under the covers to make HTTP connections. The socket module releases the GIL, so both threads in example 3b can run simultaneously. Therefore, the single-threaded version (3a) is going to run slower in this case.
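
A rough stand-in for that comparison, again in modern Python 3 syntax rather than the original listing: instead of real HTTP requests it uses time.sleep(), which also releases the GIL while it blocks, so the sketch runs without a network connection. The names fake_fetch, fetch_sequential, and fetch_threaded are hypothetical:

```python
import threading
import time

def fake_fetch(delay):
    """Stand-in for a blocking network read; time.sleep() releases the GIL."""
    time.sleep(delay)

def fetch_sequential(delays):
    """3a-style: perform each blocking call in turn."""
    start = time.perf_counter()
    for d in delays:
        fake_fetch(d)
    return time.perf_counter() - start

def fetch_threaded(delays):
    """3b-style: one thread per blocking call, so the waits overlap."""
    threads = [threading.Thread(target=fake_fetch, args=(d,)) for d in delays]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    delays = [0.2, 0.2]
    print("sequential: %.2f s" % fetch_sequential(delays))
    print("threaded:   %.2f s" % fetch_threaded(delays))
```

Swapping fake_fetch for a real urllib call should show the same pattern, though network jitter makes the numbers noisier.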

How Does the GIL Work?

First, a quick digression about Python internals: when you run a Python program, the source code gets compiled to bytecode (essentially a cross-platform assembly language – Java works much the same way). The Python virtual machine then executes the bytecode, stepping through the codes one at a time. A function call to an external routine counts as a single bytecode.
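
You can look at the bytecode yourself with the standard dis module. A minimal sketch, assuming a modern Python 3 interpreter; the function add is only an illustration, and the exact instruction names vary between Python versions:

```python
import dis

def add(a, b):
    return a + b

# dis.get_instructions() yields one entry per bytecode instruction.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(len(opnames), "instructions:", opnames)

# The compiled bytecode itself lives on the function's code object.
raw = add.__code__.co_code
print(type(raw).__name__, "of length", len(raw))
```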

In many ways, the Global Interpreter Lock makes Python function in a manner similar to cooperative multitasking built on top of a preemptive multitasking environment. The interpreter core keeps track of the number of bytecodes that it has executed. Every ten instructions (this default can be changed), the core releases the GIL for the current thread. At that point, the OS chooses a thread from all the threads competing for the lock (possibly choosing the same thread that just released the GIL – you don't have any control over which thread gets chosen); that thread acquires the GIL and then runs for another ten bytecodes.

This means that if a single bytecode takes a minute to process (for example, the multiplication of 3L**100000 * 7L**1000000), no other Python threads will run for that minute.

Extensions Releasing the GIL

How can you tell whether an extension releases the GIL? The Python header files define a pair of C macros: Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. If you look in the source code of the extension for these macros and find them, that extension releases the GIL and therefore should give you a speed boost on multi-CPU machines (and most likely a speed boost on single-CPU machines if the extension is I/O bound). (It's possible to release the GIL by calling primitives in the Python API directly, but the vast majority of extensions use the macros.)

Explaining how to write thread-aware extensions is beyond the scope of this article, but you can find more information in the standard Python docs (see Resources).

One point does need to be made: when an extension releases the GIL, it is guaranteeing that it will only be accessing private memory until it reacquires the GIL. That is, it will not access any Python variables or functions; if it shares memory with any other extension threads, it will use its own locks instead of Python locks. When the thread finishes its work, it must reacquire the GIL before accessing Python again.

Working Around the GIL

As you've seen earlier in this article, calling a Python function that does I/O releases the GIL, which allows multiple threads to run simultaneously. There are a lot of tricks to making this efficient and effective; the code examples on my Starship page show several of them.

In this section, I'll briefly cover some other ways of changing or reducing the effect of the GIL on your application. Most of these ways amount to running multiple processes in addition to or instead of running multiple threads.

The least useful technique is to use sys.setcheckinterval() to change the number of bytecodes before a context switch. While this allows you to change the balance between responsiveness and efficiency (reducing the number of bytecodes increases responsiveness; increasing the number of bytecodes raises efficiency), the amount of time required to process each bytecode is too variable and unpredictable for this to be reliable.
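
If you are reading this on a much newer interpreter, note that CPython 3.2 and later switch threads on a time interval rather than a bytecode count, and sys.setcheckinterval() was eventually removed. The sketch below assumes such a modern interpreter and uses sys.getswitchinterval() and sys.setswitchinterval() as the closest equivalent of the knob described above:

```python
import sys

original = sys.getswitchinterval()     # in seconds; typically 0.005
try:
    sys.setswitchinterval(0.01)        # longer interval: fewer switches, less responsive
    changed = sys.getswitchinterval()
finally:
    sys.setswitchinterval(original)    # always restore the previous setting

print("was %.4f s, temporarily %.4f s" % (original, changed))
```

The same caveat applies: this tunes a tradeoff, it does not remove the GIL.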

Much more effective is to run Python in optimized mode (use python -O or set the PYTHONOPTIMIZE environment variable) to reduce the number of bytecodes – this will frequently get a speedup of more than 10% on its own in addition to reducing the number of context switches.

If you want to release the GIL in the current thread, calling time.sleep() with a non-zero value (usually time.sleep(0.001)) will force the release of the GIL. This isn't particularly useful, either; using the synchronization techniques shown on my Starship pages is usually more efficient.

Because the GIL applies only to a single process, running multiple processes guarantees that you'll sidestep the GIL. What you lose is the simple data-passing available with threading. Most operating systems provide a mechanism for sharing memory between processes, but the method is not portable, changing for each OS. This nullifies Python's cross-platform capabilities.
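
As a minimal sketch of the multi-process approach, the code below splits a sum across separate Python interpreters and collects the results over pipes. It is written in modern Python 3 syntax and uses only the standard subprocess module; the helper name parallel_sum and the CHILD_CODE string are made up for illustration:

```python
import subprocess
import sys

# Code each child interpreter runs: sum the integers in [lo, hi) and print the total.
CHILD_CODE = "import sys; lo, hi = map(int, sys.argv[1:3]); print(sum(range(lo, hi)))"

def parallel_sum(n, workers=2):
    """Compute sum(range(n)) by splitting the range across separate processes."""
    step = n // workers
    bounds = [(i * step, n if i == workers - 1 else (i + 1) * step)
              for i in range(workers)]
    # Start every child before waiting on any of them, so they can run concurrently.
    procs = [subprocess.Popen([sys.executable, "-c", CHILD_CODE, str(lo), str(hi)],
                              stdout=subprocess.PIPE, text=True)
             for lo, hi in bounds]
    # Each child has its own interpreter and its own GIL; results come back as text.
    return sum(int(p.communicate()[0]) for p in procs)

if __name__ == "__main__":
    print(parallel_sum(1_000_000))
```

Notice the cost mentioned above: the data has to be serialized (here, as command-line arguments and printed text) instead of simply being shared.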

The standard way to communicate between processes is to use sockets, which also allows processes to communicate between machines. You can use bare sockets directly, but it's probably a better use of your time to choose one of the many protocols layered on top of sockets. Here's a partial list of popular protocols in the Python world:

  • CORBA (fnorb and omniORB)
  • Pyro
  • SOAP
  • XML-RPC
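
To give a feel for one of these, here is a minimal XML-RPC sketch using the modules that ship with modern Python 3 (xmlrpc.server and xmlrpc.client). The function add and the helper demo are hypothetical, and the server runs in a thread of the same process only to keep the example in one file; in real use it would live in a separate process or on another machine:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    """The function exposed to remote callers."""
    return a + b

def demo():
    # Port 0 asks the OS for any free port; logRequests=False keeps the output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    port = server.server_address[1]
    # Serve requests in the background while this thread acts as the client.
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        with ServerProxy("http://127.0.0.1:%d/" % port) as proxy:
            return proxy.add(2, 3)     # looks like a local call, travels over a socket
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo())
```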

Microthreads

Standard Python threads use OS-level threads, which makes them easy to implement, and allows for certain kinds of efficiencies when you can sidestep the Global Interpreter Lock. However, most operating systems bog down with thousands of threads (which is likely to be a problem if you're using threads to run simulations), and not all OSes support threads.

An alternative to Python threads is the Microthreads package built on top of Stackless Python. Microthreads cures the two problems listed in the previous paragraph, but it comes with a couple of disadvantages.

Stackless Python is well tested and stable, but it is not part of the standard Python distribution. It requires a special build of the Python interpreter, meaning that you'll probably need to maintain multiple Python versions (binaries are available for Windows). A corollary is that Stackless Python usually lags behind the current Python release. Christian Tismer (author of Stackless) announced in Jan 2002 that he was going to do a complete rewrite of Stackless, so it's unclear when it will be available for Python 2.2.

With microthreads, the GIL cannot be released, even in extensions. It is possible to mix Python threads and microthreads, though it's a bit tricky.
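
To illustrate the general idea of many lightweight tasks sharing one OS thread, here is a small cooperative scheduler built on plain Python generators. To be clear, this is not the Microthreads or Stackless API; it is a swapped-in technique that only mimics the scheduling style, and the names worker and run_round_robin are made up:

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative task: do one step of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield

def run_round_robin(tasks):
    """Give each generator-based task a turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue          # task finished; drop it
        queue.append(task)    # otherwise it gets another turn later

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)
```

Because each task is just a small Python object rather than an OS thread, creating thousands of them is cheap, which is the property that makes this style attractive for simulations.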

For further Reference:

PythonCraft http://www.pythoncraft.com
Information and source code on threaded applications.

Python API Reference Manual http://www.python.org/doc/current/api/threads.html

Python Microthreads http://willware.net:8080/uthread.html

Stackless http://www.stackless.com/

This particular article is Copyright © 2002 Aahz. All Rights Reserved.
Aahz is a writer and consultant specializing in Python. Aahz taught classes on Python threads at IPC9 and OSCON 2001, and will teach Python for [Perl] Programmers at OSCON 2002.



Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine Zope/Plone hosting by Nidelven IT