Issue 6 - Revision 7 / October 18, 2004
optparse command-line parsing library for Python

By Michele Simionato | July 10, 2004

"The optparse module is a powerful, flexible, extensible, easy-to-use command-line parsing library for Python. Using optparse, you can add intelligent, sophisticated handling of command-line options to your scripts with very little overhead."

Once upon a time, when graphical interfaces were still to be dreamed about, command-line tools were the body and soul of all programming tools. Many years have passed since then, but some things have not changed: command-line tools are still fast, efficient, portable, easy to use and - more importantly - reliable. You can count on them. You can expect command-line scripts to work in any situation: during the installation phase, during disaster recovery, when your window manager breaks down, and even on systems with severe memory or hardware constraints. When you really need them, command-line tools are always there.

It's important for a programming language - especially one that wants to be called a "scripting" language - to provide facilities that help the programmer write command-line tools. For a long time Python support for this kind of task has been provided by the ``getopt`` module. I have never been particularly fond of ``getopt``, since it required a considerable amount of coding even to parse simple command lines. However, with the coming of Python 2.3 the situation has changed: thanks to the great job of Greg Ward (the author of ``optparse`` a.k.a. ``Optik``) the Python programmer now has at her disposal (in the standard library and not as an add-on module) a fully fledged object-oriented API for command-line argument parsing, which makes writing Unix-style command-line tools easy, efficient and fast.

The only disadvantage of ``optparse`` is that it is a sophisticated tool, which requires some time to be fully mastered. The purpose of this paper is to help the reader rapidly get the 10% of the features of ``optparse`` that she will use in 90% of the cases.
Taking a real-life application as an example - a search and replace tool - I will guide the reader through (some of) the wonders of ``optparse`` and will share some tricks that will make your life with ``optparse`` much easier. This paper is intended for both Unix and Windows programmers, and it does not require any particular expertise to be fully appreciated.

A simple example

I will take as an example a little tool I wrote some time ago, a multiple-file search and replace tool. I needed it because I am not always working under Unix, and I do not always have sed/awk or even Emacs installed, so it made sense to have this Python script in my toolbox. It is only a few lines long, it can always be modified and extended with minimal effort, it works on every platform (including my PDA) and it has the advantage of being completely command-line driven. It doesn't require any graphics library to be installed, and I can use it when I work on a remote machine via ssh.

The tool takes several files and replaces a given regular expression everywhere in place; moreover, it saves a backup copy of the original unmodified files and provides the option to recover them when I want to. Of course, all of this can be done more efficiently in the Unix world with specialized tools, but those tools are written in C and they are not as easily customizable as a Python script. Therefore it makes sense to write this kind of utility in Python.

Finally, let me note that I find ``optparse`` to be much more useful in the Windows world than in the Unix/Linux/OS X world. The reason is that the plethora of pretty good command-line tools available under Unix is missing in the Windows environment. It is missing on embedded platforms too, such as PDAs and cellular phones, a market to which Python is also spreading.
Using Python and ``optparse``, you may write your own scripts once and have them run on every platform that runs Python.

The Unix philosophy for command-line arguments

In order to understand how ``optparse`` works, it is essential to understand the Unix philosophy about command-line arguments. As Greg Ward puts it:

"The purpose of optparse is to make it very easy to provide the most standard, obvious, straightforward, and user-friendly user interface for Unix command-line programs. The optparse philosophy is heavily influenced by the Unix and GNU toolkits ..."

Here is a brief summary of the terminology: the arguments given to a command-line script - *i.e.* the arguments that Python stores in the list ``sys.argv[1:]`` - are classified in three groups: options, option arguments and positional arguments. Options can be distinguished because they are prefixed by a dash or a double dash; options can have arguments or not (there is at most one option argument right after each option); options without arguments are called flags. Positional arguments are what is left on the command line after you remove options and option arguments.

In the example of the search/replace tool, I will need two options with an argument - since I want to pass to the script a regular expression and a replacement string - and I will need a flag specifying whether or not a backup of the original files needs to be performed. Finally, I will need a number of positional arguments to store the names of the files on which the search and replace will act.

Consider the following situation: you have several text files in the current directory containing dates in the European format DD-MM-YYYY, and you want to convert them to the American format MM-DD-YYYY. If you are sure that all your dates are in the correct format, you can match them with a simple regular expression such as ``(\d\d)-(\d\d)-(\d\d\d\d)``. In this example it is not important to make a backup copy of the original files, because you can revert to the original format by running the script again. So the syntax to use would be something like:
$> replace.py --nobackup --regx="(\d\d)-(\d\d)-(\d\d\d\d)" \
--repl="\2-\1-\3" *.txt
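The conversion that this command is meant to perform can be tried out directly with the ``re`` module. The following is only a minimal sketch: the pattern and the replacement string are the ones from the command line above, while the helper name ``swap_day_month`` and the sample sentence are made up for illustration.

```python
import re

# Pattern and replacement string taken from the command line above.
DATE_RX = re.compile(r"(\d\d)-(\d\d)-(\d\d\d\d)")
REPL = r"\2-\1-\3"


def swap_day_month(text):
    """Swap the first two groups: DD-MM-YYYY becomes MM-DD-YYYY."""
    return DATE_RX.sub(REPL, text)


# Illustrative sample text, not taken from any real file.
european = "Released on 18-10-2004, written on 10-07-2004."
american = swap_day_month(european)
print(american)

# Applying the same substitution twice restores the original text,
# which is why a backup is not essential in this example.
print(swap_day_month(american) == european)
```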
In order to emphasize portability, I have used a generic ``$>`` prompt, meaning that these examples work equally well on both Unix and Windows (of course on Unix I could do the same job with sed or awk, but these tools are not as flexible as a Python script). The syntax here has the advantage of being quite clear, but the disadvantage of being quite verbose, so it is handier to use abbreviations for the names of the options. For instance, sensible abbreviations can be ``-x`` for ``--regx``, ``-r`` for ``--repl`` and ``-n`` for ``--nobackup``; moreover, the ``=`` sign can safely be removed. Then the previous command reads
$> replace.py -n -x"(\d\d)-(\d\d)-(\d\d\d\d)" -r"\2-\1-\3" *.txt
You see here the Unix convention at work: one-letter options (a.k.a. short options) are prefixed with a single dash, whereas long options are prefixed with a double dash. The advantage of the convention is that short options can be composed: for instance
$> replace.py -nx "(\d\d)-(\d\d)-(\d\d\d\d)" -r "\2-\1-\3" *.txt
means the same as the previous line, i.e. ``-nx`` is parsed as ``-n -x``. You can also freely exchange the order of the options, for instance in this way:
$> replace.py -nr "\2-\1-\3" *.txt -x "(\d\d)-(\d\d)-(\d\d\d\d)"
This command will be parsed exactly as before, i.e. options and option arguments are not positional.

How does it work in practice?

Having stated the requirements, we may start implementing our search and replace tool. The first step, and the most important one, is to write down the documentation string:
#!/usr/bin/env python
The next step is to write down a simple search and replace routine:
import re
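A routine of this kind might look like the sketch below. It is written for Python 3 and is not the author's original listing: the ``replace`` signature, the ``backup`` parameter and the ``.bak`` suffix are assumptions made for illustration. In Python 3, ``open()`` in text mode already applies universal newlines by default, so no special mode flag is needed.

```python
import os
import re
import shutil
import tempfile


def replace(regx, repl, files, backup=True):
    """Replace every match of regx with repl, in place, in each file."""
    rx = re.compile(regx)
    for fname in files:
        # Whole file is read into memory; fine for short files only.
        with open(fname) as f:
            txt = f.read()
        if backup:
            # ".bak" is an assumed suffix for the backup copy.
            shutil.copyfile(fname, fname + ".bak")
        with open(fname, "w") as f:
            f.write(rx.sub(repl, txt))


# Small demonstration on a temporary file.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "dates.txt")
with open(demo_file, "w") as f:
    f.write("18-10-2004\n")

replace(r"(\d\d)-(\d\d)-(\d\d\d\d)", r"\2-\1-\3", [demo_file])

with open(demo_file) as f:
    new_text = f.read()
with open(demo_file + ".bak") as f:
    old_text = f.read()
print(repr(new_text), repr(old_text))
```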
This replace routine is entirely unsurprising; the only thing you may notice is the use of the "U" option in the line
txt=file(fname,"U").read()
This is a new feature of Python 2.3. Text files opened with the "U" option are read in "Universal" mode: this means that Python takes care of the newline pain, i.e. this script will work correctly everywhere, regardless of the newline conventions of your operating system. The script works by reading the whole file into memory: this is bad practice, and here I am assuming that you will use this script only on short files that fit in memory; otherwise you should "massage" the code a bit. Also, a full-fledged script would check that the file exists and can be read, and would do something sensible if it cannot.

How does it work? It is quite simple, really. First you need to instantiate a command-line parser from the ``OptionParser`` class provided by ``optparse``:
import optparse
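As a minimal sketch of this step, the parser can be created with a usage string and then asked for the usage message it would print. The ``prog="replace.py"`` argument is an assumption added here so that the script name is fixed regardless of how the snippet is run; without it ``optparse`` derives the name from ``sys.argv[0]``.

```python
import optparse

# The usage string is passed as the first argument; %prog is expanded
# to the program name when the message is formatted.
parser = optparse.OptionParser("usage: %prog files [options]",
                               prog="replace.py")

usage_text = parser.get_usage()
print(usage_text)
```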
The string ``"usage: %prog files [options]"`` will be used to print a customized usage message, where ``%prog`` will be replaced by the name of the script (in this case ``replace.py``). You may safely omit it and ``optparse`` will use a default ``"usage: %prog [options]"`` string. Then, you give the parser information about which options it must recognize:
parser.add_option("-x", "--regx",
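Written out in full, the option declarations might look like the following sketch. The option names come from the article; the ``help`` strings are assumptions made for illustration and are not the author's original wording.

```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")

# Two options that take an argument (the default action is "store").
parser.add_option("-x", "--regx",
                  help="regular expression to search for")
parser.add_option("-r", "--repl",
                  help="replacement string")

# A flag: no argument, the destination is set to True when it is given.
parser.add_option("-n", "--nobackup", action="store_true",
                  help="do not make backup copies")

# The destination name is derived from the long option name.
print(parser.get_option("-x").dest)
print(parser.get_option("-n").action)
```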
The ``help`` keyword argument is intended to document the intent of the given option; it is also used by ``optparse`` in the usage message. The ``action="store_true"`` keyword argument is used to distinguish flags from options with arguments: it tells ``optparse`` to set the flag ``nobackup`` to ``True`` if ``-n`` or ``--nobackup`` is given on the command line. Finally, you tell the parser to do its job and parse the command line:
option, files = parser.parse_args()
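To see what the parser returns without running a script from the shell, ``parse_args`` can be given an explicit argument list instead of ``sys.argv[1:]``. The sketch below assumes the three options of the search and replace tool; the helper ``build_parser`` and the file names are hypothetical. It also checks that clustered short options and a different ordering give the same result.

```python
import optparse


def build_parser():
    """Return a parser with the three options used in the article."""
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx")
    parser.add_option("-r", "--repl")
    parser.add_option("-n", "--nobackup", action="store_true")
    return parser


regx = r"(\d\d)-(\d\d)-(\d\d\d\d)"
repl = r"\2-\1-\3"

# Options first, then the positional arguments.
option, files = build_parser().parse_args(
    ["-n", "-x", regx, "-r", repl, "a.txt", "b.txt"])
print(option.regx, option.repl, option.nobackup, files)

# Clustered short options, with an option after the positional arguments.
option2, files2 = build_parser().parse_args(
    ["-nr", repl, "a.txt", "b.txt", "-x", regx])
print(vars(option2) == vars(option), files2 == files)

# Nothing given: every destination is None and there are no files.
empty, no_files = build_parser().parse_args([])
print(empty.regx, empty.repl, empty.nobackup, no_files)
```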
The ``.parse_args()`` method returns two values: ``option``, which is an instance of the ``optparse.Values`` class, and ``files``, which is a list of positional arguments. The ``option`` object has attributes - called *destinations* in ``optparse`` terminology - corresponding to the given options. In our example, ``option`` will have the attributes ``option.regx``, ``option.repl`` and ``option.nobackup``. If no options are passed on the command line, all these attributes are initialized to ``None``; otherwise they are initialized to the corresponding option argument. In particular, flag options are initialized to ``True`` if they are given, and to ``None`` otherwise. So, in our example ``option.nobackup`` is ``True`` if the flag ``-n`` or ``--nobackup`` is given. The list ``files`` contains the positional arguments, which here are expected to be the names of accessible text files on your system. The main logic can be as simple as the following:
if not files:
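The branching can be sketched as follows. This is an illustration under assumptions, not the author's listing: the function ``main``, the returned messages and the idea of passing the ``replace`` routine in as a parameter are choices made here so that the logic can be exercised without touching any file.

```python
import optparse


def main(argv, replace):
    """Parse argv and dispatch; `replace` is the search/replace callable."""
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx")
    parser.add_option("-r", "--repl")
    parser.add_option("-n", "--nobackup", action="store_true")
    option, files = parser.parse_args(argv)
    if not files:
        return "No files given!"
    elif option.regx and option.repl:
        # Back up unless --nobackup was given (None means flag absent).
        replace(option.regx, option.repl, files, not option.nobackup)
        return "Done."
    else:
        return "Missing options."


# A stand-in for the real routine: it only records how it was called.
calls = []


def record(regx, repl, files, backup):
    calls.append((regx, repl, files, backup))


print(main([], record))
print(main(["a.txt"], record))
print(main(["-n", "-x", "foo", "-r", "bar", "a.txt"], record))
print(calls)
```

Returning a message instead of printing inside ``main`` keeps the three branches easy to check; a real script would print the message (or the docstring) and possibly exit with a non-zero status.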
A nice feature of ``optparse`` is that a help option is automatically created, so ``replace.py -h`` (or ``replace.py --help``) will work as you may expect:
$> replace.py --help
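The same text can be obtained as a string with ``parser.format_help()``, which makes it easy to inspect. In the sketch below the ``help`` strings and the fixed program name ``replace.py`` are assumptions; the exact layout of the message depends on the Python version, so no particular output is claimed here.

```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]",
                               prog="replace.py")
parser.add_option("-x", "--regx", help="regular expression to search for")
parser.add_option("-r", "--repl", help="replacement string")
parser.add_option("-n", "--nobackup", action="store_true",
                  help="do not make backup copies")

# format_help() returns the message that -h / --help would print.
help_text = parser.format_help()
print(help_text)
```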
You may programmatically print the usage message by invoking ``parser.print_help()``. At this point you may test your script and see that it works as advertised.

How to reduce verbosity and make your life with ``optparse`` happier

The approach we followed in the previous example has a disadvantage: it involves a certain amount of redundancy. Suppose, for example, that we want to add the ability to restore the original file from the backup copy. Then we have to change the script in three places: in the docstring, in the ``add_option`` list, and in the ``if .. elif .. else ...`` statement. At least one of these is redundant.

The redundancy can be removed by parsing the docstring to infer the options to be recognized. This avoids the boring task of writing the ``parser.add_option`` lines by hand. I implemented this idea in a cookbook recipe, by writing an ``optionparse`` module which is just a thin wrapper around ``optparse``. For the sake of space, I cannot repeat it here, but you can find the code and a short explanation in the Python Cookbook (see the references below). It is really easy to use. For instance, the paper you are reading now has been written by using ``optionparse``: I used it to write a simple wrapper to docutils - the standard Python tool which converts (restructured) text files to HTML pages. It is also nice to notice that internally docutils itself uses ``optparse`` to do its job, so actually this paper has been composed by using ``optparse`` twice!

Finally, you should keep in mind that this article only scratches the surface of ``optparse``, which is quite sophisticated. For instance you can specify default values, different destinations, a ``store_false`` action and much more, even if often you don't need all this power. Still, it is handy to have the power at your disposal when you need it. The serious user of ``optparse`` is strongly encouraged to read the documentation in the standard library, which is pretty good and detailed.
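To give a flavour of the features just mentioned, the sketch below combines a default value, an explicit destination and a ``store_false`` action. The option names ``--backup-suffix`` and ``--quiet`` are hypothetical and are not part of the search and replace tool described in this article.

```python
import optparse

parser = optparse.OptionParser()

# An explicit destination and a default value for an option with argument.
parser.add_option("-b", "--backup-suffix", dest="suffix", default=".bak")

# store_false: giving the flag sets the destination to False.
parser.add_option("-q", "--quiet", action="store_false",
                  dest="verbose", default=True)

defaults, _ = parser.parse_args([])
given, _ = parser.parse_args(["-q", "--backup-suffix", ".orig"])

print(defaults.suffix, defaults.verbose)
print(given.suffix, given.verbose)
```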
I think that this article has fulfilled its function of "appetizer" to ``optparse`` if it has stimulated the reader to learn more.

References:

- optparse/Optik is a SourceForge project of its own.
- Starting from Python 2.3, ``optparse`` is included in the standard library.
- I wrote a Python Cookbook recipe about ``optionparse``.
Reproduction of material from any of PyZine's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 PyZine