Red Hat Lab - Introduction to Python

Martin Sivák <msivak@redhat.com>
with contributions by:
Daniel Mach <dmach@redhat.com>
Jozef Skladanka <jskladan@redhat.com>

Useful links

Interesting links:

Don't forget

Python features

Writing python

VIM

ts=4
sts=4
sw=4
expandtab

Also take a look at the Jedi project to get autocompletion and jumps to definitions.

Emacs

More complicated to set up, but it can use Jedi (through the auto-complete module) as well.

(defun ms-python-hook ()
  (setq python-indent-offset 4)
  (setq python-smart-indentation nil)
  (setq indent-tabs-mode nil)
)

(add-hook 'python-mode-hook 'ms-python-hook)

Eclipse

If you are used to Eclipse, there is the great PyDev plugin for Eclipse.

Python source file structure

#!/usr/bin/env python
# encoding=utf-8 (pep 0263)

""" docstring """

# imports
# functions

if __name__ == "__main__":
    # actual main
    pass

Standard terminal IO

There are a couple of functions you can use to read data from the keyboard and print them to the screen:

>>> data = input()
# input: [1, 2, 3]
>>> print data
[1, 2, 3]
>>> print(data)
[1, 2, 3]

>>> data = input()
# input: "test"
>>> print data, data, data
test test test
>>> print repr(data)
'test'

>>> data = raw_input()
# input: "test"
>>> print data
"test"

There are also direct ways to access stdin/out/err streams as you can see in the example below. We will deal with the import statement and file operations later.

>>> import sys
>>> sys.stdin.readline()
# input "test" without quotes and notice the \n character
'test\n'
>>> sys.stderr.write("Error!!\n")
Error!!

Debugging - python

Python contains an integrated debugger, pdb, to help you with testing and debugging. If you want to start the program from the beginning in debug mode, launch it as:

python -m pdb script.py

You can also add a debugger call at the place you want to debug; just import pdb and then at the proper place write:

pdb.set_trace()
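
As an illustration (the process function and its argument are made up for this sketch), the call sits directly in the code path you want to inspect:

import pdb

def process(items):
    total = 0
    for item in items:
        pdb.set_trace()  # execution stops here and an interactive (Pdb) prompt appears
        total += item
    return total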

The last nice thing for us at the moment is post mortem debugging. Just call pdb.post_mortem() in the exception handler.

try:
    raise Exception()
except:
    import pdb
    pdb.post_mortem()

Debugging - gdb

First, prepare your system:

$ yum install -y gdb yum-utils
$ debuginfo-install python

Then you can start your python process and connect to it:

$ gdb python <pid>

Gdb is a C debugger, so you will see the C frames by default, but the Python debuginfo also ships a set of useful macros for Python (py-bt, py-list, py-locals and others):

One of many pages describing this is, for example, Low-level Python debugging with GDB.

In case you need to debug a live, highly available process (one that contains timeouts, sockets, threads, ...), you can use gdb in batch mode like this:

$ gdb -batch -ex "t a a py-bt" python $(cat /var/run/vdsm/vdsmd.pid)

Types

All built-in types are objects - even plain values have methods. This slide is just to give you an overview. We will discuss specific types in detail after a short intermezzo about python modules, basic control structures and functions.

# module importing
# we will deal with modules in detail during the next lesson
import module_name

# comprehensive help
help(str)

# list all attributes of a variable
dir(str)
dir("a string")

Modules

# correct:
import os.path
os.path.isdir("/usr/bin")
os.path.isfile("/usr/bin/python")

# also correct:
from os.path import isdir, isfile
isdir("/usr/bin")
isfile("/usr/bin/python")

# incorrect (bad style and hard to maintain):
import os, os.path, sys

What happens during import?

import is an __import__ function call in disguise and is therefore interpreted at runtime. The newly imported module is interpreted sequentially as if it were a standard python script (which it is) and any encountered import statement is processed immediately (beware of cyclic deps!).

  1. sys.modules dictionary is consulted first
  2. all parent modules are imported (using this very same algorithm)
  3. current directory and sys.path list are consulted to find the proper file/directory
  4. imported module is saved to sys.modules
  5. imported module is executed
  6. reference to the imported module/object is created in place of the import call
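
A small demonstration of the module cache (steps 1 and 4) from the interactive prompt:

>>> import sys
>>> import os.path
>>> "os.path" in sys.modules
True
>>> sys.modules["os.path"] is os.path   # the cached module object is reused
True
>>> import os.path                      # a repeated import is just a dictionary lookup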

Modules - writing a new module

# Python supports relative imports from within a module
# this style is mandatory in Python 3

# imagine this structure:
.
├── a
│   ├── a1.py
│   └── __init__.py
├── b
│   ├── b1.py
│   └── __init__.py
└── __init__.py

# this content is valid in a1.py
from . import a1

# this is valid in a1 only if the
# top level __init__.py is present
from .. import b
from ..b import b1

Python control structures - 1

There is nothing complicated about the if statement:

if condition:
  pass
elif condition:
  pass
elif condition:
  pass
else:
  pass

The logical operators are written in their English form: and, or, not. When checking for the presence of an element in a list or dictionary, use the in operator. There is also the special is operator that should be used when testing for None, because it can't be overloaded.

if True in some_list and (somevar is not None or somevar == "test"):
  pass

The for loop is the equivalent of for-each in other languages. You usually do not use an index to access elements, but get the elements directly.

fruits = ["apples", "oranges", "cherrys"]
for x in fruits:
    print x

There are a couple of helper functions as well:

# to get indexes
for idx,x in enumerate(fruits):
    print "fuits[%d] == %s" % (idx, repr(x))

# to iterate over sorted
for x in sorted(fruits):
    print x

Python control structures - 2

While loops are simple as well:

while condition:
  pass

Both for and while loops can have an optional else clause that gets executed when the loop exits normally (meaning without the use of the break statement).

def has_odd_number(l):
    for num in l:
        if num % 2:
            print "List contains odd number."
            break
    else:
        print "There was no odd number in the list."

And as you might have noticed in the last paragraph, there are two statements that can control the loop flow: break and continue.
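
A short sketch of continue (break was already shown above):

for x in range(10):
    if x % 2 == 0:
        continue   # skip the rest of the body and go to the next iteration
    print x        # prints only the odd numbers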

Functions

Functions are first class objects in Python. They even have attributes, but you usually do not access them unless you do something really advanced.

def function_name(arguments):
    pass

A function definition is evaluated at run time (only once). It creates the function object, evaluates default argument values and creates a reference to itself under the given name. This means you can redefine the name:


>>> def a():
...     print 1
>>> print type(a)
<type 'function'>
>>> a()
1

>>> a = 2
>>> print type(a)
<type 'int'>

>>> def a():
...     print 3
>>> print type(a)
<type 'function'>
>>> a()
3

Types - strings

# initialization
s1 = "string"
s2 = u"unicode string"
s3 = "%d bottles of %s wine" % (99, "red")
s4 = "%(number)d bottles of %(colour)s wine" % {"number": 99, "colour": "red"}
s5 = """long string spanning
multiple lines with "quotes"
"""

# basic operations
s3.split()
s3.split(" ", 1)
s4.upper() != s4.lower()
", ".join(["apples", "oranges", "coconuts"])
s4.replace("o", "a", 1) + "yard"

Types - lists, tuples

list1 = [1, 2]

# assigning list contents to variables
[elementa, elementb] = list1

for i in list1:
    print i
# 1
# 2

list1.append(3)
print list1
# [1, 2, 3]

tuple1 = ("a", "b")
for index, i in enumerate(tuple1):
    print index, i
# 0 a
# 1 b

for i in range(2, 5):
    # also see xrange for big numbers
    print i
# 2 3 4

A bit of advanced string parsing using shlex

The standard module shlex contains a split function which (among other possible configurations) can split a string into parts while taking care not to split quoted strings.

>>> import shlex
>>> shlex.split("""./test arg1 arg2 "arg3" "arg4 arg4b" """)
['./test', 'arg1', 'arg2', 'arg3', 'arg4 arg4b']

Regular Expressions

import re
EMAIL_RE = re.compile(r"(.*)@(.*)")

def split_emails(email_list):
    result = []
    for email in email_list:
        result.append(EMAIL_RE.match(email).groups())
    return result

split_emails(["spam@example.com", "eggs@example.com"])
# [('spam', 'example.com'), ('eggs', 'example.com')]

Strings and Unicode

The standard type for a string in Python 2 is str. It behaves as a dumb byte array. In cases where you need to count the number of characters in multi-byte strings (utf-8), you need to get the data (or convert them) in the unicode type.

# encoding=utf-8
> s = "Hezký žluťoučký kůň úpěl ďábelské ódy"
> print len(s)
50
> len(s.decode("utf-8"))
37

There are two methods that are used to convert between unicode and str: unicode.encode() and str.decode().

When you start your Python program from a terminal, it will configure the stdout encoding according to the current locale. This way it knows how to print unicode strings.

However, if you redirect the output using pipes, there is no locale assigned to it and printing will raise an exception unless you set the PYTHONIOENCODING environment variable or encode the unicode string manually prior to outputting it.
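
A minimal sketch of the manual conversion, assuming the consumer of the output expects UTF-8:

# encoding=utf-8
u = u"žluťoučký kůň"

# unicode -> str (bytes); safe to write to a pipe or a file
print u.encode("utf-8")

# str (bytes) -> unicode; needed before counting characters reliably
s = "žluťoučký kůň"
print len(s.decode("utf-8"))   # 13 characters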

Types - lists, tuples - there is more

list1 = [0, 1, 2, 3, 4, 5, 6, 7]

print list1[2:5]
# the same as list1 <2,5) in math
# [2, 3, 4]

print list1[:5]
# the same as list1 <0,5) in math
# [0, 1, 2, 3, 4]

print list1[2:]
# the same as list1 <2,len) in math
# [2, 3, 4, 5, 6, 7]

print list1[:]
# the same as copy.copy(list1)
# [0, 1, 2, 3, 4, 5, 6, 7]

print list1[:5:2]
# the same as list1 <0,5) with step 2 in math
# [0, 2, 4]

print list1[:-2]
# the same as list1 <0,len - 2) in math
# [0, 1, 2, 3, 4, 5]

print list1[-2:]
# the same as list1 <len - 2, len) in math
# [6, 7]

print list1[::-1]
# the same as reversed(list1)
# [7, 6, 5, 4, 3, 2, 1, 0]

Functional features

List comprehensions

Map and filter from the previous slide can be easily combined. There is a syntactic form called a list comprehension to make it easier.

# map
> a = [ x*x for x in range(10) ]
> type(a)
list

# map + filter
> [ x for x in range(10) if x % 2 ]

When your list contains too many elements, use parentheses to compute the values as needed (you will get a generator instead of a list) to avoid excessive memory consumption:

> a = ( x*x for x in xrange(10000000) )
> type(a)
<generator object <genexpr> at 0x1e277d0>

Types - dicts

# comprehensive help
help(dict)

# creating an empty dictionary
d1 = dict() # dict(key=value, ...)
d2 = {}     # {key: value, ...}

# assigning a value to key
d1["key"] = ["1", 2, 4, 5]
d2[12587] = "value"

# setting a default value
d2.setdefault("foo", "spam")
d2.setdefault("foo", "eggs")
# d2["foo"] contains "spam" (2nd setdefault doesn't do anything)

# getting value back
print d1["key"]
print d2.get("key")
print d2.get("key", "default value")

if "key" in d1:
    print "'key' found in dict 'd1', value is: %s" % d1["key"]

# iterating over dictionary
for k,v in d2.iteritems():
    print k, "=", v

Collection structures

Python provides many useful composite/high level types as well. The three I am introducing here are all defined in the collections module:

- namedtuple
- deque
- OrderedDict
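
A short sketch of deque and OrderedDict (namedtuple appears on later slides):

from collections import deque, OrderedDict

d = deque([1, 2, 3])
d.appendleft(0)        # fast appends and pops on both ends
d.pop()
print d                # deque([0, 1, 2])

od = OrderedDict()     # remembers the insertion order of keys
od["first"] = 1
od["second"] = 2
print od.keys()        # ['first', 'second']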

Function arguments

def write(msg, error_code=0):
    print locals()

write("x")
# {"msg": "x", "error_code": 0}

def beware(default_arg = []):
    default_arg.append("single element list?")
    print default_arg

beware()
# ["single element list?"]
beware()
# ["single element list?", "single element list?"]

def write(msg, error_code=0, exception_info):
    print locals()

write("x", "some information about the exception")
# SyntaxError: non-default argument follows default argument


def write(msg, error_code=0, *args, **kwargs):
    print locals()

write("x", "y", "z", info="some information about the exception", error_code=1)
# {"msg": "x", "error_code": 1, args=["y", z"], kwargs={"info": "some information about the exception"}}

Variable scope

When you reference a global name, Python dynamically looks for the name in the following places:

  1. L. Local – local scope of the currently evaluated function (def or lambda) with the exception of names marked as global.
  2. E. Enclosing functions' locals – Python looks into enclosing (closures are defined with the lexical source code hierarchy) functions' (def, lambda) scopes in the inner to outer order.
  3. G. Global – global scope is defined as the top level scope of the module the currently running function was defined in.
  4. B. Built-in – built-in scope is the scope of the special __builtin__ module. It contains names like open, range, SyntaxError, ... and you can add your own there. It is not recommended though.
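
A small sketch of this lookup order:

x = "global"

def outer():
    x = "enclosing"
    def inner():
        # no local x, so the enclosing scope wins over the global one
        print x
    inner()

outer()      # prints "enclosing"
print x      # prints "global"
print len    # "len" is found in the built-in scope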

Nested functions

All the previous slides mean that you can create nested functions and pass the function references around as if they were standard variables.

>>> def generate():
...     def internal_func(a):
...         print a
...     return internal_func

# The outer def is evaluated only once, but each call to
# generate() evaluates the inner def again and returns a new
# function object (the identical address below is just reuse
# of memory freed after the previous object was collected).
>>> print generate()
<function internal_func at 0x162baa0>
>>> print generate()
<function internal_func at 0x162baa0>

Exception handling

Exceptions are cheap in Python, but should still be used only for error reporting. Do not use them in place of return statements.

try:
    code_that_can_throw_exception

    # to raise an exception manually use
    raise ExceptionType()

except ExceptionTypeA as e:
    exception_handler

except (ExceptionTypeB, ExceptionTypeC) as e:
    another_exception_handler

else:
    code_to_execute_when_no_exception_happened
    # used when this code can throw an exception
    # which should not be handled by the handlers above

finally:
    code_that_will_be_always_executed
    # even when the code above uses return statement

Please avoid using an except clause without any type as it will catch all exceptions, including KeyboardInterrupt and SystemExit. If you want to catch all ordinary exceptions, use except Exception:

Classes

Python supports objects with all three keystones of OOP: encapsulation, polymorphism and (multiple) inheritance.

class ClassName(object):
    pass

The basic (and "empty") class you can see above. The object mentioned in parentheses specifies the parent class. For all new style classes, the common parent has to be the object class. Python 2.6 and 3 also support interfaces, you can read more about them at Python website.

class ClassName2(ClassName):
    def method(self, argument):
        return argument

What you can see here is an inherited class with one method. All methods automatically receive a reference to the instance as the first argument. The usual name for it is self.

# example of creating an instance and method invocation
cl = ClassName2()
cl.method("Hello World")

Classes - as simple data containers

class Data(object):
    pass

a = Data()
a.smth = 1
a.smthelse = 2

But a named tuple is preferred:

from collections import namedtuple
Data = namedtuple("Data", ("smth", "smthelse"))
a = Data(smth=1, smthelse=2)

Classes - special methods and variables

class ClassName(object):
    def __init__(self, spam):
        # assign value to an attribute
        self.spam = spam

        # protected attributes start with '_'; shouldn't be accessed from outside of the object
        self._spam = spam

        # private attributes start with '__'; do not use unless you know what you are doing !!!
        # accessible as self.__NAME from within the object
        # accessible as obj._ClassName__NAME from the outside
        self.__spam = spam

For the complete list look at the documentation.

Static/Class methods and objects

It is possible to define class attributes and class methods. Those are methods that are tied not to an instance, but to the type itself. Classmethods get the current type as their first argument.

class MyClass(object):
    WRAPPER = "prefix %s suffix"

    @classmethod
    def factory_method(cls, value):
        # cls is the conventional name for the type
        # argument in classmethods
        return cls(value)

    # instance specific __init__
    def __init__(self, value):
        self.value = self.WRAPPER % value
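
Usage of the class above might then look like this (a small sketch):

obj = MyClass.factory_method("spam")
print obj.value        # "prefix spam suffix"
print MyClass.WRAPPER  # the class attribute is reachable through the type itself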

When you access attributes, the following order applies (remember that a method is just another attribute that references a function object): instance attributes are checked first, then the class and then its base classes. The next slide demonstrates this.

Classes - access order example

>>> class Test(object):
...   attr = "class attr"
...   @classmethod
...   def test(cls):
...     print cls
...   def __init__(self):
...     self.attr = "instance attr"
...

# accessing class attributes using the Type name
>>> Test.attr
'class attr'
>>> Test.test()
<class '__main__.Test'>

# create and access the instance
>>> inst = Test()
>>> inst.attr
'instance attr'

# dynamically add class attribute to the type
>>> Test.second = "class attr 2"

# access the class attributes using the instance
>>> inst.second
'class attr 2'
>>> inst.test()
<class '__main__.Test'>

# assign to the instance using the same name
>>> inst.second = "instance attr 2"
>>> inst.second
'instance attr 2'

# confirm that the class attribute kept the value
>>> Test.second
'class attr 2'

Classes - properties

Python does not enforce encapsulation, but it provides the possibility to define getters, setters and deleters which can be used to access the internal state in a more controlled way.


class TestCls(object):
    def __init__(self, value = 0):
        self.value = value

    @property
    def my_attr(self):
        return self.value

    @my_attr.setter
    def my_attr(self, value):
        self.value = value

    # this does not really make sense, so
    # take it just as an example here
    @my_attr.deleter
    def my_attr(self):
        self.value = 0
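
And a possible usage of TestCls:

t = TestCls(5)
print t.my_attr   # calls the getter, prints 5
t.my_attr = 42    # calls the setter
del t.my_attr     # calls the deleter, self.value is 0 again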

Working with files

Python provides filesystem access using the open builtin and the os and os.path modules.

# open file using the old fashioned way
fo = open("filename.txt", "r")

# this for iterates over all lines in text file
# you can also manually call
# fo.readlines() to get a list of all lines
# or
# fo.readline() to get just the next line
for line in fo:
    print line

# python will close the file as soon as
# the reference to fo vanishes and is garbage collected
fo.close()

# modern way of opening a file
with open("filename.txt") as fo:
    # read without the optional number of bytes
    # argument reads the whole file into memory
    content = fo.read()

# file is closed automagically when the with block
# ends or throws an exception

Working with binary files

Binary files are accessed in exactly the same way as text files. You only add the binary flag to the mode string and use the struct module to decode and encode data.

The struct module format strings support different packing, byte orders and many different data types.

import struct
from collections import namedtuple

# read 22 bytes from file
with open("data.db", "rb") as fo:
    raw = fo.read(22)

# dummy data to demonstrate unpacking...
raw = "Martin Sivak        \x04\x06"

# not necessary, but it makes the code maintainable
# since you are using named attributes instead
# of a plain tuple where you need to know the indexes
Person = namedtuple("Person", ("name", "day", "year"))

# unpack the data
data = Person._make(struct.unpack("20sBB", raw))
# data.name.rstrip() is now "Martin Sivak"

# here you can see that the named tuple is really just a tuple
# and I used a star to expand it as function arguments
output = struct.pack("20sBB", *data)

# and a runtime assert check to confirm that
# unpack/pack pair works properly
assert raw == output

Serialization

pickle and cPickle

In case you need to store python structures to a file (or a database), you have a couple of choices:

  1. use the builtin repr function to get a representation of the python data structure - this representation can then be evaluated in python to get the value back
    def fce():
      pass
    
    a = ["a", 1, 2, ["a", (5, 6), fce]]
    
    repr(a)
    "['a', 1, 2, ['a', (5, 6), <function fce at 0x7f54513377d0>]]"
    
  2. use the builtin module pickle and let python do the serialization and loading for you (with a more efficient representation)
    import cPickle as pickle
    
    file_obj = open("spam", "wb")
    pickle.dump(obj, file_obj)
    
    file_obj = open("spam", "rb")
    obj = pickle.load(file_obj)
    
    string = pickle.dumps(obj)
    obj = pickle.loads(string)
    

Gzip compression

Read a gzipped file:

import gzip

f = gzip.open("ircbot.log.gz", "rb")
data = f.read()
f.close()

Write to a gzipped file:

import gzip

data = "..."
f = gzip.open("ircbot.log.gz", "wb")
f.write(data)
f.close()

Logging

The logging module provides a standard configurable way to log messages with different severity to stdout, files, network...

# the simplest example
import logging

# get a named logger (same name always returns the same object)
log = logging.getLogger("mylog")
log.warn("warning...")

Loggers can be hierarchical and can be configured to do message filtering or preprocessing and to use different outputs.

# more complicated example
import logging
import logging.handlers

# one logger
log = logging.getLogger("project")

# logs errors to console
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.ERROR)
log.addHandler(console_handler)

# logs warnings and errors to file
log.addHandler(logging.FileHandler("project.log"))

# and this sets the message format
# (formatters are attached to handlers, not to loggers)
console_handler.setFormatter(logging.Formatter(fmt="project: %(message)s",
                                               datefmt="%Y:%m"))

# child logger
log_hosts = logging.getLogger("project.hosts")

# be very verbose
log_hosts.setLevel(logging.INFO)

# do not send events to the main logger
log_hosts.propagate = False

# handle external log rotation actions well
log_hosts.addHandler(logging.handlers.WatchedFileHandler("hosts.log", encoding="utf-8"))

# sub-child logger that filters messages using more complicated rules
log_host_normal = logging.getLogger("project.hosts.normal")
log_host_normal.addFilter(logging.Filter()) # bogus - passes everything

# sub-child logger that is even more verbose
log_host_priority = logging.getLogger("project.hosts.priority")
log_host_priority.setLevel(logging.DEBUG)

Command line arguments

To avoid hardcoding runtime arguments directly into the source code, we can use a couple of approaches. One of them is passing the arguments through the command line interface.

Basic access to cmdline arguments is provided by the sys.argv list. It is a standard python list in the form of:

import sys
[program_name, argument1, argument2, ...] = sys.argv

This is the same structure as the argv array you may be used to from the C language.

Option parser

Python provides a higher level helper to make parsing options in different formats easier. You just construct a list of accepted options, their arguments, description, etc. and create a parser object.

import optparse

parser = optparse.OptionParser("%prog [options]") # 1st argument is usage, %prog is replaced with sys.argv[0]
parser.add_option(
    "-s", "--server",    # short and long option
    dest="server",       # not needed in this case, because default dest name is derived from long option
    type="string",       # "string" is default, other types: "int", "long", "choice", "float" and "complex"
    action="store",      # "store" is default, other actions: "store_true", "store_false" and "append"
    default="localhost", # set default value here, None is used otherwise
    help="IRC server address",
)
options, args = parser.parse_args()
# options.key = value
# args = [arg1, ... argN]

print options.server

# use parser.error to report missing options or args:
parser.error("Option X is not set")

# object to dictionary conversion
vars(options)

Configuration files

Using the command line gets quite mundane after some time, so we prefer if all "constant" options can be set using some kind of configuration file.

There are also multiple ways to accomplish this:

ConfigParser

ini file example:

[section]
key = value

Basic usage:

import ConfigParser

config = ConfigParser.SafeConfigParser()
config.read("ircbot.ini")

# get values from config parser
config.get(section, option)
# also config.getboolean, config.getfloat, config.getint

# set custom values
config.add_section(section)
config.set(section, key, value)
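
Writing the (possibly modified) configuration back to disk:

# save the configuration back to a file
with open("ircbot.ini", "w") as fo:
    config.write(fo)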

Advanced: Pythonizing ConfigParser

To make the class behave more in the Python way (e.g. access attributes using the dot notation), just derive your own object and add some "advanced magic" (modifications to the class internal __ methods).

For an example of how to do it, take a look at this source file.

The change requires you to understand the internal methods of classes. You can read more about them in the documentation. Take a good look at the __getattr__ method.
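
The linked source file is not reproduced here, but a minimal sketch of the idea (the DotConfig class name is made up for this example) could look like this:

import ConfigParser

class DotConfig(object):
    """Wrap one section of a ConfigParser so that cfg.key works."""
    def __init__(self, parser, section):
        self._parser = parser
        self._section = section

    def __getattr__(self, name):
        # called only when the normal attribute lookup fails
        try:
            return self._parser.get(self._section, name)
        except ConfigParser.NoOptionError:
            raise AttributeError(name)

# usage
config = ConfigParser.SafeConfigParser()
config.read("ircbot.ini")
cfg = DotConfig(config, "section")
print cfg.key   # the same as config.get("section", "key")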

Concurrent programming

Sometimes we need to start a task and set it aside to run almost independently while we do other stuff. There are a couple of approaches to achieve this.

Even in the Python world not everything is perfect. When you're using threading heavily, take time to read about the Global Interpreter Lock (http://www.dabeaz.com/python/GIL.pdf).

subprocess - Python interfaces to processes

Python gives you the standard fork, exec and pipe calls via the os module. You can use them as you are used to from C. But there is a much more comfortable way. Meet the subprocess module.

The main class we will be interested in is subprocess.Popen. It will allow us to start a new process, change its working directory, communicate with it and collect the return value once it ends.

# simple example
import subprocess

task = subprocess.Popen(args = ("/bin/uname", "-a"))
task.wait()
print task.returncode

# the same can be accomplished more simply by
print subprocess.call(args = ("/bin/uname", "-a"))
  

If you need to create pipes to communicate with the process, just use one of the stdout, stderr, stdin arguments with the subprocess.PIPE value. You can then access them as attributes and work with them as with any other file (read, write, readline, ...). There is also a helper method communicate that accepts the input you want to send to the process and returns a tuple (stdout, stderr) once the program finishes. The outputs are buffered in memory, so do not use this when you are expecting a lot of data.

# complicated example
task = subprocess.Popen(args = ("cat"),
                        executable = "/bin/cat",
                        stdout = subprocess.PIPE,
                        stdin = subprocess.PIPE)
print task.pid
task.stdin.write("This line should be returned as is\n")
print task.stdout.readline()
task.stdin.close()
task.wait()
print task.returncode
  

thread, threading - Python interfaces to threads

Both modules give you access to thread management in python. thread provides a low level interface on the level of POSIX threads (pthread in C) while threading encapsulates the interface into a set of Python objects.

# simple
import threading

def my_func(...):
  pass # empty thread

# args and kwargs are passed to my_func when it executes
my_thread = threading.Thread(target = my_func, args = (...), kwargs = {...})
my_thread.start()
my_thread.join()

# complicated :)
class MyThread(threading.Thread):
  def __init__(self, *args, **kwargs):
    self.my_args = args
    self.my_kwargs = kwargs
    threading.Thread.__init__(self, *args, **kwargs)

  def run(self):
    """Your code which should be used in thread.
       access args and kwargs using self.my_args..."""

# vv -- inherited methods -- vv

  def start(self):
    "Starts the thread and executes self.run() in it."

  def join(self):
    "Waits for the running thread to finish."

Shared resources in threads

Because threads have unlimited access to their process' memory, caution must be exercised when manipulating it to avoid accidental memory corruption.

To avoid a problem with two (or more) threads rewriting the same place in memory, some kind of locking mechanism has to be used. POSIX threads call these locks mutexes. Python exposes them as the threading.Lock object.

import threading
lock = threading.Lock()
lock.acquire()
-- critical section --
lock.release()

It is important to avoid a few special situations. The most notable of them are deadlocks (threads waiting for each other's locks forever) and race conditions.

The simplest way to prevent these is to remember a few rules: always acquire locks in the same order and hold them for the shortest possible time.

Queue and PriorityQueue

To make your life a little simpler, the Python library contains a couple of thread-safe (you do not have to lock them yourself) "types" as objects. We will need a queue of some kind.

The Queue.Queue object and its variant Queue.PriorityQueue (Python >= 2.6) give us a very important and useful tool: because they are thread safe, we can use them to pass information between threads without any difficult locking logic.

Queue behaves as a FIFO whilst PriorityQueue returns the lowest value item first.

import Queue
q = Queue.Queue()
q.empty()
q.full()
q.get(block = True, timeout = None) #q.get_nowait()
q.put(item, block = True, timeout = None) #q.put_nowait()
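
A small producer/consumer sketch (the None sentinel is just a convention chosen for this example):

import threading
import Queue

q = Queue.Queue()

def worker():
    while True:
        item = q.get()      # blocks until an item is available
        if item is None:    # sentinel - stop the worker
            break
        print "processing", item

t = threading.Thread(target = worker)
t.start()

for i in range(3):
    q.put(i)
q.put(None)
t.join()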
  

Interruptable waiting

Sometimes we have really nothing to do, but a new assignment can arrive at any moment. In that case just sleeping for a fixed amount of time is not a good idea. But there is a solution to this problem in the threading module as well.

The threading.Event class supports setting the event to the "it happened" state or to the "nothing has happened" state. A thread can then use Event.wait(timeout) to wait for the event, but no longer than timeout seconds. Compare it with time.sleep(time), which always waits for the specified amount of time.

event = threading.Event()
event.wait([timeout=SECONDS])  # wait until the event flag is set or a timeout is reached
event.set()   # set the event flag, terminates waiting
event.clear() # clear the event flag

Lock with signaling - Condition

When you have to combine Locks with Events, you can use another handy class - Condition.

All operations with the Condition require that you hold its internal lock first (acquire, release). If you do, you can then wait for the event to happen (wait), which causes the lock to be automatically released. When a thread is notified it wakes up: one of the awakened threads is selected, gets the lock back and continues with execution.

Sending the event notification is as simple as acquiring the lock (remember that you have to hold it first!), calling notify (wakes one thread) or notifyAll (wakes all waiting threads) and then releasing the lock.

cv = threading.Condition()

# Consume one item
cv.acquire()
while not an_item_is_available():
    cv.wait()
get_an_available_item()
cv.release()

# Produce one item
cv.acquire()
make_an_item_available()
cv.notify()
cv.release()

multiprocessing - Thread-like interface to processes

It is possible to use processes instead of threads to avoid complications caused by the GIL. The architecture can use message passing via multiprocessing.Queue (a multiprocess-aware implementation) and primitives like multiprocessing.Lock, multiprocessing.Condition, multiprocessing.Event and others.

If shared memory is really necessary, you can use shared memory typed objects multiprocessing.Value or multiprocessing.Array.

from multiprocessing import Process

def f(name):
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('thread_name',))
    p.start()
    p.join()

The multiprocessing module also supports a pool of worker processes that can be used for distributed computing:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))

Distributing python projects

There is a standard way of building/distributing python programs and modules. The setup is described in the distutils package documentation, so I will tell you just the basic parts. The standard package is getting old nowadays and you might encounter setuptools or the newer distutils2 package. All are used in the same way and have almost the same base API.

The main build script is called setup.py and should reside in your project's top level directory. It usually contains only a couple of lines describing the project, its structure and file locations. See the example below. There can also be a MANIFEST.in file that contains the list of files that should be included in the source code distribution (wildcards allowed).

import os
from setuptools import setup, find_packages
from babel.messages import frontend as babel
from glob import *

# Utility function to read the README file.
def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()

data_files = [('/usr/lib/systemd/system', glob('systemd/*.service')),
              ('/usr/share/initial-setup/modules', glob('modules/*'))]

# add localization files
data_files += [('/usr/share/locale/%s/LC_MESSAGES' % dirname,
                ['locale/%s/LC_MESSAGES/initial-setup.mo' % dirname])
                for dirname in os.listdir("locale")
                if not dirname.endswith(".pot")]

setup(
    name = "initial-setup",
    version = "0.3.4",
    author = "Martin Sivak",
    author_email = "msivak@redhat.com",
    description='Post-installation configuration utility',
    url='http://fedoraproject.org/wiki/FirstBoot',
    license = "GPLv2+",
    keywords = "firstboot initial setup",
    packages = find_packages(),
    package_data = {
        "": ["*.glade"]
    },
    scripts = ["initial-setup", "firstboot-windowmanager"],
    data_files = data_files,
    setup_requires= ['nose>=1.0'],
    test_suite = "initial_setup",
    long_description=read('README'),
    classifiers=[
        "Development Status :: 3 - Alpha",
        "Environment :: X11 Applications :: GTK",
        "Environment :: Console",
        "Intended Audience :: System Administrators",
        "Topic :: System :: Systems Administration",
        "License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)",
    ],
    cmdclass = {'compile_catalog': babel.compile_catalog,
                'extract_messages': babel.extract_messages,
                'init_catalog': babel.init_catalog,
                'update_catalog': babel.update_catalog}
)

Distributing python projects - cont.

To make the setup.py based distribution work well you should adhere to a few simple rules:

The setup.py file then works as a script with many useful commands; the most important ones are shown below.

$ python setup.py install
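
A few of the other commonly used commands (not an exhaustive list):

$ python setup.py sdist      # create a source tarball in ./dist
$ python setup.py build      # build the project into ./build
$ python setup.py test       # run the declared test_suite (setuptools)
$ python setup.py bdist_rpm  # build an RPM package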

Once a package is present in PyPI, you can use one of the following commands to download it and install it together with all its dependencies.

$ easy_install package
$ pip install package

Network connections - low level way

# import the module
import socket

# resolve address
for (family, socktype, proto, canonname, sockaddr) in socket.getaddrinfo("www.fit.vutbr.cz", 80):
    break  # take the first record returned

s = socket.socket(family, socktype, proto)

# create a CLIENT socket and connect it
s.connect(sockaddr)

# or create a SERVER socket and wait for connection
s.bind(sockaddr)
s.listen(1) # 1 - the allowed queue length
client = s.accept() # the rest is the same, only use client instead of s

# set timeout to 5 minutes
s.settimeout(300)

# get info about peer
s.getpeername()

# raw read and write
s.recv(size)
s.send(data)

# get file object and read one line
f = s.makefile(mode="rw")
f.readline() # reads including line terminators
  

SocketServer

import SocketServer
import threading

class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    # Ctrl-C will cleanly kill all spawned threads
    daemon_threads = True

class ThreadedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        data = self.request.recv(1024)
        cur_thread = threading.current_thread()
        response = "{}: {}".format(cur_thread.name, data)
        self.request.sendall(response)

if __name__ == "__main__":
    # Port 0 means to select an arbitrary unused port
    HOST, PORT = "localhost", 0

    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    ip, port = server.server_address

    # start server
    print "Running on: %s:%s" % (HOST, PORT)
    server.serve_forever()
    # server.shutdown() stops serving, but has to be started from a different thread (serve_forever blocks current thread)
  

Using the powers of Python, you can then easily implement a threaded HTTP server in a few lines:

import SocketServer
import SimpleHTTPServer

class ThreadedHTTPD(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
        pass

HOST, PORT = "localhost", 8080

httpd = ThreadedHTTPD((HOST, PORT), SimpleHTTPServer.SimpleHTTPRequestHandler)
httpd.serve_forever()
  

Retrieving data from HTTP

To get some data over HTTP, you can use the abstraction Python provides - httplib.

httplib.HTTPConnection allows you to very simply request (or send) data over the HTTP protocol. The example below should explain more:

import httplib
c = httplib.HTTPConnection("www.fit.vutbr.cz")
c.request("GET", "/")
r = c.getresponse()
print r.status, r.reason
data = r.read()
c.close()
print data
  

If you aren't familiar with HTTP methods, the most used are GET and POST.

There are also a number of other methods mostly used in extensions like WebDAV, SVN, .. – PUT, DELETE, ..

The standard library contains one more module which can be used to push/retrieve data from the web. It is called urllib2. All you need to do to retrieve data is:

>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read(100)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<?xml-stylesheet href="./css/ht2html

Also take a look at the urlgrabber library. It is not a standard one, but it is a very useful one.

XML-RPC Server

Python has an XML-RPC server implementation; all you need to do is to add threading support by mixing in SocketServer.ThreadingMixIn.

import SocketServer
import SimpleXMLRPCServer


class SimpleThreadedXMLRPCServer(SocketServer.ThreadingMixIn, SimpleXMLRPCServer.SimpleXMLRPCServer):
    pass
  

You can do the same thing using SocketServer and get a basic network server almost for free: http://docs.python.org/library/socketserver.html.

def foo_handler():
    return "foo"


class HandlerClass(object):
    def bar_handler(self):
        return "bar"


server = SimpleThreadedXMLRPCServer((address, port), allow_none=True)

# register functions to the server
server.register_introspection_functions()
server.register_instance(HandlerClass())
server.register_function(foo_handler)

# start server
server.serve_forever()
# server.shutdown() stops serving, but has to be started from a different thread (serve_forever blocks current thread)

XML-RPC Client

The XML-RPC client is really easy to use.

import xmlrpclib


client = xmlrpclib.ServerProxy("http://localhost:8001", allow_none=True)
print client.foo_handler()
print client.bar_handler()

Gui programming with Python

As Python is a language with strong introspection and support for dynamically created behaviour, GUI programming is very easy.

There are many frameworks supporting the above-mentioned Python features. Python itself contains bindings for Tk (the module is named Tkinter). But today, the most important frameworks are Gtk and Qt. We will concentrate on Gtk (or its Pythonized sibling PyGTK), but very similar concepts work in Qt, wxWidgets or Tk.

Gtk moved to a new API with version 3 (completely based on introspection features), but we will use version 2 to make it work on older systems. In version 3.x there is no pygtk module and everything is done using the gi module (GObject introspection).

Hello world in PyGtk

# Gtk2 pygtk version
import pygtk
pygtk.require('2.0')
import gtk

# Gtk3 version
# from gi.repository import Gtk

class HelloWorld:
  def destroy(self, widget, data=None):
    gtk.main_quit()

  def __init__(self):
    self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
    self.window.connect("destroy", self.destroy)

    self.label = gtk.Label("Hello World")
    self.label.show()
    self.window.add(self.label)
    self.window.show()
    gtk.main()

if __name__ == "__main__":
  HelloWorld()

PyGtk documentation

GTK project also has some useful abstractions over threads, streams and others. You can find the reference manual to Python GObject library at http://library.gnome.org/devel/pygobject/stable/

Gtk Builder

To avoid building everything manually and hardcoding the GUI layout into code, we have tools available which will load the design from an XML description and connect it to our application's logic.

Both of these ways are very similar to use; consider the following examples of importing a sample layout and showing the window:

# Gtk2 - old glade
import gtk
import gtk.glade
gl = gtk.glade.XML("x.glade")
gl.get_widget("main_window").show()
gtk.main()

# Gtk2 - new glade
gtkb = gtk.Builder()
gtkb.add_from_file("x.xml")
gtkb.get_object("main_window").show()
gtk.main()

# Gtk3
from gi.repository import Gtk
gtkb = Gtk.Builder()
gtkb.add_from_file("x.xml")
gtkb.get_object("main_window").show()
Gtk.main()

Connecting actions in GUI to methods

When you have all the elements you need, it is time to assign actions to buttons and other pieces which may react to the user's actions. There are (as usual) a couple of ways to do it:

class GUI(object):
    def __init__(self):
        self.gtkb = gtk.Builder()
        self.gtkb.add_from_file("x.xml")
        self.gtkb.connect_signals(self)
        self.mainwindow = self.gtkb.get_object("main_window")
        self.mainwindow.show()
        gtk.main()

    def on_window_destroy(self, w, data = None):
        gtk.main_quit()

Using Gtk in threaded app

Gtk is not thread safe, so to use it from a multithreaded application, some precautions have to be taken:

# use gobject instead of GLib in Gtk2
# wait variant @ http://git.fedorahosted.org/cgit/anaconda.git/tree/pyanaconda/ui/gui/utils.py
# decorator documentation http://www.python.org/dev/peps/pep-0318/
from gi.repository import GLib

def gtk_thread_nowait(func):
    """Decorator method which causes every call of the decorated function
       to be executed in the context of Gtk main loop. The new method does
       not wait for the callback to finish. """

    def _idle_method(args):
        """This method contains the code for the main loop to execute.
        """
        ret = func(*args)
        return False           # has to return False so it is executed only once

    def _call_method(*args):
        """The new body for the decorated method. """
        GLib.idle_add(_idle_method, args)

    return _call_method
  

Usage:

@gtk_thread_nowait
def my_gtk_touching_method(...):
  do gtk calls and return as soon as possible

# will be executed inside the Gtk mainloop
my_gtk_touching_method()
  

Testing in general

Testing in Python

There are two prominent packages that facilitate test writing.

Unit Testing in Python

It is important to think about the testability of the code, its interfaces and APIs. It will result in better code and you will be able to write the tests easily. But this is a big and complicated topic, so here are some links where you can read about it. The keywords are:

The two major points are probably:

Unit Testing in Python -- Tools

There are a couple of tools which make it easier to write tests in Python.
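
As a minimal illustration, a test written with the standard unittest module might look like this (the add function is made up for this sketch):

import unittest

def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(add(1, 2), 3)

    def test_strings(self):
        self.assertEqual(add("a", "b"), "ab")

if __name__ == "__main__":
    unittest.main()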

Scientific usage - pure Python

Financial and other high precision calculations cannot use the float type. The reason for that is the limited binary precision of IEEE 754 floating point numbers. If float is used, rounding errors will occur!

>>> print "%.8f" % (0.1 + 1e12)
1000000000000.09997559

Python provides a standard module for precision arithmetic called decimal. This module allows you to configure the precision you need in terms of decimal places. It also allows you to use a string (and other types) to define a number that would not be exactly representable by a float.

>>> from decimal import *
>>> getcontext().prec = 14    # 14 significant digits, enough for the result below
>>> Decimal("1e12") + Decimal("0.1")
Decimal('1000000000000.1')

You might also want to store a big set of numbers in an array. Unfortunately Python lists are not the best structure to do that. Instead Python provides you with the array module.

Array lets you define an array structure for elements with a fixed type. Compare the following:

import array

# high memory usage
example1 = range(1000000)

# much lower memory usage
example2 = array.array('i', xrange(1000000))

Scientific usage - NumPy

Python does not contain any linear algebra functions. So when you need to work with vectors or matrices you have to use an external library like NumPy (numerical python) – http://www.numpy.org/. This library is implemented as a very fast C and Fortran extension and there are even attempts to make it support CUDA.

All the standard matrix methods are of course supported. See http://docs.scipy.org/doc/numpy/reference/routines.linalg.html for the full list.
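
For example, a small sketch using the linalg routines:

>>> import numpy as np
>>> a = np.array([[1., 2.], [3., 4.]])
>>> np.dot(a, a)                 # matrix multiplication
array([[  7.,  10.],
       [ 15.,  22.]])
>>> np.linalg.inv(a)             # matrix inverse
array([[-2. ,  1. ],
       [ 1.5, -0.5]])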

NumPy also supports simple reading and writing of data files in CSV or TSV formats: http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt.

# This example is taken from the NumPy website
>>> from numpy import *
>>> def f(x,y):
...         return 10*x+y
...
>>> b = fromfunction(f,(5,4),dtype=int)
>>> b[2,3]
23
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])
>>> b*b
# test for yourself :)

NumPy might seem to add multidimensional array indexing to Python. But do not be confused, it is just a clever syntax trick: internally __getitem__ simply receives a tuple.

>>> class T(object):
...   def __getitem__(self, arg):
...     print arg
...
>>> a = T()
>>> a[5]
5
>>> a[5,5]
(5, 5)
>>> a[5,5::-2]
(5, slice(5, None, -2))

Tutorial can be found here: http://wiki.scipy.org/Tentative_NumPy_Tutorial or here: http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb

Scientific usage - SciPy

SciPy extends NumPy. It adds more science related functions for sparse matrices, optimizations, signal processing or statistics.

Scientific usage - Graphing

Once you have the results, it is time to plot them :). There is a great library for Python that does exactly that – matplotlib, with a tutorial at http://www.loria.fr/~rougier/teaching/matplotlib/.
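
A minimal sketch, plotting a quadratic curve and saving it to a file:

import matplotlib.pyplot as plt

xs = range(10)
ys = [x * x for x in xs]

plt.plot(xs, ys, "o-", label="x^2")
plt.xlabel("x")
plt.ylabel("x squared")
plt.legend()
plt.savefig("plot.png")   # or plt.show() to open an interactive window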