Text Processing in Python

David Mertz

Addison-Wesley 2003
A book review by Danny Yee © 2006 https://dannyreviews.com/
Text Processing in Python is not for the casual scripter who wants solutions to immediate problems. It has plenty of concrete examples, but it's not a cookbook; it describes the contents of standard modules and libraries, but it's not a reference. Rather, it uses examples and explanations to explore the fundamental ideas behind, and features of, both text processing and Python.

Misleadingly titled "Python Basics", the opening chapter is likely to scare off many readers. Mertz begins with an introduction to functional programming in Python, followed by a tutorial on polymorphism and class construction. He then covers the mechanics of actually running Python and some of the standard modules for filesystem access and interfaces with operating systems.

The core of Text Processing in Python is in three chapters, on string handling, regular expressions, and parsing. "Basic String Operations" works through some examples of common tasks: sorting, reformatting, counting, encoding binary data as ASCII, and more. It then goes through the contents of the string module and the modules for memory-mapped files (mmap), StringIO, binary/ASCII conversions, cryptography, compression, and Unicode handling.

A brief regular expression tutorial is followed by a look at some common tasks, which are used to illustrate progressively more sophisticated regular expressions. This is followed by detailed exposition of the standard re module.

Mertz warns that parsing is often overkill and suggests that other options be tried first. He then explains EBNF grammars and state machines, before working through the mx.TextTools and PLY libraries and assorted other tools. Readers with no previous exposure to language theory may find this difficult.

A final chapter looks at tools for email, web and other protocols for passing text around the Internet. Appendix A provides "a selective and impressionistic short review of Python", enough for an experienced programmer without previous acquaintance with Python, and other appendices provide background on compression and Unicode.

Text Processing in Python offers a nice combination of foundational material and practical applications. Its approach means there is little overlap with other Python books: even when going through standard libraries, Mertz largely avoids repeating generic material, and there's none of the padding that's used to flesh out many computing books. The approach will appeal most to those with a computer science background, or an inclination that way.

Perhaps most importantly, Text Processing in Python is, at least if you have the right background, a good read. I found it entertaining as well as informative: it refreshed some basic computer science for me and introduced me to new Python details and approaches, and it has inspired me to rework the scripts used to format these reviews for the web.

Note: Text Processing in Python is available in full on Mertz's web site, along with its code examples.

June 2006

%T Text Processing in Python
%A Mertz, David
%I Addison-Wesley
%D 2003
%O paperback, index
%G ISBN 0321112547
%P 520pp