Generators and Coroutines
Edited version of the slides by David Beazley
http://www.dabeaz.com
Monday May 16, 2011
Part 1: Iterators
Copyright (C) 2008, http://www.dabeaz.com 1-
Part One
Introduction to iterators and generators

Iteration
• As you know, Python has a "for" statement
• You use it to iterate over a collection of items
>>> for x in [1,4,5,10]:
...     print x,
...
1 4 5 10
>>>
• And, as you have probably noticed, you can iterate over many different kinds of objects (not just lists)

Iteration over a dict
• If you loop over a dictionary, you get its keys
>>> prices = { 'GOOG' : 490.10,
...            'AAPL' : 145.23,
...            'YHOO' : 21.71 }
...
>>> for key in prices:
...     print key
...
YHOO
GOOG
AAPL
>>>

Iteration over a string
• If you loop over a string, you get its characters
>>> s = "Yow!"
>>> for c in s:
...     print c
...
Y
o
w
!
>>>

Iterating over a file
• If you loop over a file, you get its lines
>>> for line in open("real.txt"):
...     print line,
...
Real Programmers write in FORTRAN
Maybe they do now,
in this decadent era of
Lite beer, hand calculators, and "user-friendly" software
but back in the Good Old Days,
when the term "software" sounded funny
and Real Computers were made out of drums and vacuum tubes,
Real Programmers wrote in machine code.
Not FORTRAN. Not RATFOR. Not, even, assembly language.
Machine Code.
Raw, unadorned, inscrutable hexadecimal numbers.
Directly.

Consuming iterables
• Many functions consume an "iterable" object
• Reductions: sum(s), min(s), max(s)
• Constructors: list(s), tuple(s), set(s), dict(s)
• in operator: item in s
• Many others in the library
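The consumers listed above can be tried on any iterable. The following is a minimal sketch in Python 3 syntax (the slides use Python 2); the `prices` values come from the dictionary example, and the result names are chosen here for illustration only.

```python
# Sample data in the style of the dictionary slide.
prices = {'GOOG': 490.10, 'AAPL': 145.23, 'YHOO': 21.71}

# Reductions consume the iterable and return a single value.
highest = max(prices.values())
lowest = min(prices.values())
count = sum(1 for _ in prices)      # iterating a dict yields its keys

# Constructors consume the iterable and build a new container.
names = sorted(list(prices))        # sorted so the order is predictable
letters = set("Yow!")

# The in operator tests membership in the iterable.
has_goog = 'GOOG' in prices
```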

Iteration protocol
• The reason you can iterate over different objects is that there is a specific protocol
>>> items = [1, 4, 5]
>>> it = iter(items)
>>> it.next()
1
>>> it.next()
4
>>> it.next()
5
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Iteration protocol
• An inside look at the for statement
    for x in obj:
        # statements
• Underneath the covers
    _iter = iter(obj)            # Get iterator object
    while 1:
        try:
            x = _iter.next()     # Get next item
        except StopIteration:    # No more items
            break
        # statements
        ...
• Any object that supports iter() and next() is said to be "iterable".
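The expansion of the for statement can be written as an ordinary function. This is a sketch in Python 3 syntax, where the built-in `next(it)` replaces the `it.next()` method used on the slides; `manual_for` is a name made up for this illustration.

```python
def manual_for(obj):
    """Collect the items of obj the way a for statement would visit them."""
    results = []
    _iter = iter(obj)              # get the iterator object
    while True:
        try:
            x = next(_iter)        # Python 3 spelling of _iter.next()
        except StopIteration:      # no more items
            break
        results.append(x)          # stands in for the loop body
    return results

print(manual_for([1, 4, 5]))       # visits the same items as: for x in [1, 4, 5]
```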

Supporting iteration
• User-defined objects can support iteration
• Example: counting down...
>>> for x in countdown(10):
...     print x,
...
10 9 8 7 6 5 4 3 2 1
>>>
• To do this, you just have to make the object implement __iter__() and next()

Supporting iteration
    class countdown(object):
        def __init__(self,start):
            self.count = start
        def __iter__(self):
            return self
        def next(self):
            if self.count <= 0:
                raise StopIteration
            r = self.count
            self.count -= 1
            return r

Iteration example
• Example use:
>>> c = countdown(5)
>>> for i in c:
...     print i,
...
5 4 3 2 1
>>>
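For readers on Python 3, the same iterator can be sketched with `__next__()` in place of `next()`. The capitalized class name and the variable names are choices made for this sketch, not part of the slides.

```python
class Countdown:
    """Iterator that counts down from start to 1 (Python 3 protocol)."""

    def __init__(self, start):
        self.count = start

    def __iter__(self):
        return self                 # the object is its own iterator

    def __next__(self):             # this method is named next() in Python 2
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r

c = Countdown(5)
first_pass = list(c)
second_pass = list(c)               # the same object is already exhausted
```

Because `__iter__()` returns the object itself, a second loop over the same instance finds nothing left, which is worth keeping in mind when designing iterable classes.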

Iteration commentary
• There are many subtle details involved in designing iterators for various objects
• However, we're not going to cover that
• This isn't a tutorial on "iterators"
• We're talking about generators...
Part 2: Generators
Generators
• A generator is a function that produces a sequence of results instead of a single value
    def countdown(n):
        while n > 0:
            yield n
            n -= 1
>>> for i in countdown(5):
...     print i,
...
5 4 3 2 1
>>>
• Instead of returning a value, you generate a series of values (using the yield statement)

Generators
• Behavior is quite different from a normal function
• Calling a generator function creates a generator object. However, it does not start running the function.
    def countdown(n):
        print "Counting down from", n
        while n > 0:
            yield n
            n -= 1
>>> x = countdown(10)
>>> x
<generator object at 0x...>
>>>
Notice that no output was produced

Generator functions
• The function only executes on next()
>>> x = countdown(10)
>>> x
<generator object at 0x...>
>>> x.next()
Counting down from 10
10
>>>
The function starts executing here, on the first call to next()
• yield produces a value, but suspends the function
• The function resumes on the next call to next()
>>> x.next()
9
>>> x.next()
8
>>>

Generator functions
• When the generator returns, iteration stops
>>> x.next()
1
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
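The life cycle of a generator (created without running, advanced by next(), finished with StopIteration) can be checked with a small sketch. It is written in Python 3 syntax, and the `events` list is a stand-in for the print statement on the slides so that the order of execution can be inspected.

```python
events = []          # records what runs, in place of print

def countdown(n):
    events.append("started")       # runs only on the first next()
    while n > 0:
        yield n
        n -= 1
    events.append("finished")      # runs when the loop ends

x = countdown(2)
created_without_running = (events == [])   # calling did not start the body

first = next(x)      # body runs up to the first yield
second = next(x)     # resumes just after the yield
try:
    next(x)          # the function returns, so StopIteration is raised
    stopped = False
except StopIteration:
    stopped = True
```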

Generator functions
• A generator function is mainly a more convenient way of writing an iterator
• You don't have to worry about the iterator protocol (.next, .__iter__, etc.)
• It just works

Generators vs. iterators
• A generator function is slightly different from an object that supports iteration
• A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.
• This is different from a list (which you can iterate over as many times as you want)
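The one-time nature of a generator is easy to observe. A short sketch in Python 3 syntax, reusing the countdown generator from the earlier slides; the variable names are illustrative.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first_pass = list(gen)            # consumes the generator
second_pass = list(gen)           # nothing is left to produce

fresh_pass = list(countdown(3))   # calling the function again starts over

items = [3, 2, 1]
list_twice = (list(items), list(items))   # a list can be iterated repeatedly
```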

Generator expressions
• A generated version of a list comprehension
>>> a = [1,2,3,4]
>>> b = (2*x for x in a)
>>> b
<generator object at 0x...>
>>> for i in b: print i,
...
2 4 6 8
>>>
• This loops over a sequence of items and applies an operation to each item
• However, results are produced one at a time using a generator

Generator expressions
• Important differences from a list comprehension
• Does not construct a list
• Only useful purpose is iteration
• Once consumed, it can't be reused
• Example:
>>> a = [1,2,3,4]
>>> b = [2*x for x in a]
>>> b
[2, 4, 6, 8]
>>> c = (2*x for x in a)
>>>

Generator expressions
• General syntax
    (expression for i in s if cond1
                for j in t if cond2
                ...
                if condfinal)
• What it means
    for i in s:
        if cond1:
            for j in t:
                if cond2:
                    ...
                    if condfinal: yield expression
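The equivalence between the general syntax and nested loops can be checked on a concrete case. In this sketch the sequences `s` and `t` and both conditions are arbitrary choices made for illustration.

```python
s = [1, 2, 3, 4]
t = "ab"

# Generator expression with two for clauses, each with its own condition.
pairs_expr = ((i, j) for i in s if i % 2 == 0
                     for j in t if j != "b")

# The equivalent generator function, written as nested loops.
def pairs_func():
    for i in s:
        if i % 2 == 0:
            for j in t:
                if j != "b":
                    yield (i, j)

from_expr = list(pairs_expr)
from_func = list(pairs_func())
```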

A note on syntax
• The parentheses on a generator expression can be dropped if it is used as a single function argument
• Example:
    sum(x*x for x in s)    # the argument is a generator expression

Interlude
• We now have two basic building blocks
• Generator functions:
    def countdown(n):
        while n > 0:
            yield n
            n -= 1
• Generator expressions:
    squares = (x*x for x in s)
• In both cases, we get an object that generates values (which are typically consumed in a for loop)

Part 2
Processing Data Files
(Show me your web server logs)

Programming problem
Find out how many bytes of data were transferred by summing up the last column of data in this Apache web server log
    81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587
    81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133
    81.107.39.38 - ... "GET /ply/bookplug.gif HTTP/1.1" 200 23903
    81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
    81.107.39.38 - ... "GET /ply/example.html HTTP/1.1" 200 2359
    66.249.72.134 - ... "GET /index.html HTTP/1.1" 200 4447
Oh yeah, and the log file might be huge (gigabytes)

The log file
• Each line of the log looks like this:
    81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238
• The number of bytes is the last column
    bytestr = line.rsplit(None,1)[1]
• It is either a number or a missing value (-)
    81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -
• Converting the value
    if bytestr != '-':
        bytes = int(bytestr)
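The two steps above (take the last column, convert it unless it is "-") fit in one small function. This is a sketch in Python 3 syntax; the function name `bytes_sent` and the choice to return None for a missing value are assumptions made for the example.

```python
def bytes_sent(line):
    """Return the byte count from the last column, or None if it is '-'."""
    bytestr = line.rsplit(None, 1)[1]   # split once, from the right, on whitespace
    if bytestr != '-':
        return int(bytestr)
    return None

full = '81.107.39.38 - ... "GET /ply/ply.html HTTP/1.1" 200 97238'
missing = '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -'

print(bytes_sent(full))
print(bytes_sent(missing))
```

Splitting with None as the separator also ignores the trailing newline that lines read from a file carry, so no explicit strip() is needed.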

A non-generator solution
• Just do a simple for loop
    wwwlog = open("access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None,1)[1]
        if bytestr != '-':
            total += int(bytestr)
    print "Total", total
• We read line by line and just update a sum
• However, that's so 90s...

A generator solution
• Let's use some generator expressions
    wwwlog     = open("access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    bytes      = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)
• Whoa! That's different!
• Less code
• A completely different programming style
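The generator solution can be tried without a real log file. In this Python 3 sketch an `io.StringIO` object stands in for `open("access-log")`, the sample lines are modeled on the ones shown earlier, and the second stage is named `nbytes` (an assumption of this example) to avoid shadowing the built-in `bytes` type.

```python
import io

# Stand-in for open("access-log"): a file-like object over a few sample lines.
wwwlog = io.StringIO(
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 200 7587\n'
    '81.107.39.38 - ... "GET /favicon.ico HTTP/1.1" 404 133\n'
    '81.107.39.38 - ... "GET /ply/ HTTP/1.1" 304 -\n'
)

bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
nbytes = (int(x) for x in bytecolumn if x != '-')   # skips the missing value
total = sum(nbytes)
print("Total", total)
```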
Part 3: Piping

Generators as a pipeline
• To understand the solution, think of it as a data processing pipeline
    access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total
• Each step is defined by iteration/generation
    wwwlog     = open("access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    bytes      = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)

Being declarative
• At each step of the pipeline, we declare an operation that will be applied to the entire input stream
    access-log -> wwwlog -> bytecolumn -> bytes -> sum() -> total
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
This operation gets applied to every line of the log file

Being declarative
• Instead of focusing on the problem at a line-by-line level, you just break it down into big operations that operate on the whole file
• This is very much a "declarative" style
• The key: Think big...

Iteration is the glue
• The glue that holds the pipeline together is the iteration that occurs in each step
    wwwlog     = open("access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    bytes      = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)
• The calculation is driven by the last step
• The sum() function consumes values that are pulled through the pipeline (via .next() calls)
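That the last step drives the calculation can be made visible by recording when each stage handles an item. This Python 3 sketch uses two made-up stages, `source` and `to_int`, and a `trace` list introduced only for this illustration.

```python
trace = []   # records the order in which each stage handles an item

def source(lines):
    for line in lines:
        trace.append(("read", line))
        yield line

def to_int(items):
    for x in items:
        trace.append(("convert", x))
        yield int(x)

stage1 = source(["10", "20"])
stage2 = to_int(stage1)
nothing_ran_yet = (trace == [])   # building the pipeline does no work

total = sum(stage2)               # the consumer pulls items through, one at a time
```

The trace shows each item passing through both stages before the next item is read, which is why no large intermediate list is ever built.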

Performance
• Surely this generator approach has all sorts of fancy magic that makes it slow.
• Let's check it out on a 1.3 GB log file...
    % ls -l big-access-log
    -rw-r--r-- beazley 1303238000 Feb 29 08:06 big-access-log

Performance contest
    wwwlog = open("big-access-log")
    total = 0
    for line in wwwlog:
        bytestr = line.rsplit(None,1)[1]
        if bytestr != '-':
            total += int(bytestr)
    print "Total", total
Time: 27.20
    wwwlog     = open("big-access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    bytes      = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)
Time: 25.96

Commentary
• Not only was it not slow, it was 5% faster
• And it was less code
• And it was relatively easy to read
• And frankly, I like it a whole lot better...
"Back in the old days, we used AWK for this and we liked it. Oh, yeah, and get off my lawn!"

Performance contest
    wwwlog     = open("access-log")
    bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
    bytes      = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)
Time: 25.96
    % awk '{ total += $NF } END { print total }' big-access-log
Time: 37.33
Note: extracting the last column might not be awk's strong point

Food for thought
• At no point in our generator solution did we ever create large temporary lists
• Thus, not only is that solution faster, it can also be applied to enormous data files
• It is competitive with traditional tools

More thoughts
• The generator solution was based on the concept of pipelining data between different components
• What if you had more advanced kinds of components to work with?
• Perhaps you could perform different kinds of processing just by plugging various pipeline components together

This sounds familiar
• The Unix philosophy
• Have a collection of useful system utilities
• Hook them up to files or to each other
• Perform complex tasks by piping data

Part 3
Fun with Files and Directories

Programming problem
You have hundreds of web server logs scattered across various directories. In addition, some of the logs are compressed. Modify the last program so that you can easily read all of these logs
    foo/
        access-log-012007.gz
        access-log-022007.gz
        access-log-032007.gz
        ...
        access-log-012008
    bar/
        access-log-092007.bz2
        ...
        access-log-022008

os.walk()
• A very useful function for searching the file system
    import os
    for path, dirlist, filelist in os.walk(topdir):
        # path     : Current directory
        # dirlist  : List of subdirectories
        # filelist : List of files
        ...
• This uses generators to recursively walk through the file system
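What os.walk() produces is easiest to see on a small tree. This Python 3 sketch builds a throwaway directory with the standard `tempfile` module; the file and directory names are invented for the example.

```python
import os
import tempfile

# Build a small throwaway tree: top/a.txt and top/sub/b.txt
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(top, rel), "w") as f:
            f.write("x\n")

    visited = []
    for path, dirlist, filelist in os.walk(top):
        # path is the current directory; the two lists name its direct children
        visited.append((os.path.relpath(path, top), sorted(dirlist), sorted(filelist)))

print(visited)
```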

find
• Generate all filenames in a directory tree that match a given filename pattern
    import os
    import fnmatch
    def gen_find(filepat,top):
        for path, dirlist, filelist in os.walk(top):
            for name in fnmatch.filter(filelist,filepat):
                yield os.path.join(path,name)
• Examples
    pyfiles = gen_find("*.py","/")
    logs    = gen_find("access-log*","/usr/www/")
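A quick way to try gen_find() is to point it at a temporary directory rather than at "/". The sketch below is Python 3; the directory layout loosely follows the programming problem, and the extra file `notes.txt` is added here only to show that non-matching names are skipped.

```python
import fnmatch
import os
import tempfile

def gen_find(filepat, top):
    """Yield paths under top whose file name matches the shell-style pattern."""
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "foo"))
    for rel in ("access-log-012008",
                os.path.join("foo", "access-log-012007.gz"),
                "notes.txt"):
        open(os.path.join(top, rel), "w").close()   # create an empty file

    found = sorted(os.path.relpath(p, top) for p in gen_find("access-log*", top))

print(found)
```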

Performance contest
    pyfiles = gen_find("*.py","/")
    for name in pyfiles:
        print name
Wall clock time: 559s
    % find / -name '*.py'
Wall clock time: 468s
Performed on a 750 GB file system containing about 140,000 .py files

grep
• Generate a sequence of lines that contain a given regular expression
    import re
    def gen_grep(pat, lines):
        patc = re.compile(pat)
        for line in lines:
            if patc.search(line): yield line
• Example:
    lognames = gen_find("access-log*", "/usr/www")
    logfiles = gen_open(lognames)
    loglines = gen_cat(logfiles)
    patlines = gen_grep(pat, loglines)

Example
• Find out how many bytes were transferred for a specific pattern in a whole directory of logs
    pat    = r"some pattern"
    logdir = "/some/dir/"
    filenames  = gen_find("access-log*",logdir)
    logfiles   = gen_open(filenames)
    loglines   = gen_cat(logfiles)
    patlines   = gen_grep(pat,loglines)
    bytecolumn = (line.rsplit(None,1)[1] for line in patlines)
    bytes      = (int(x) for x in bytecolumn if x != '-')
    print "Total", sum(bytes)
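The whole pipeline can be exercised end to end on a temporary directory. This is a Python 3 sketch under stated assumptions: the definitions of `gen_open` and `gen_cat` do not appear in this text, so the versions below are hypothetical stand-ins (open each file by extension, then concatenate the lines), and the log contents and the pattern are invented for the test.

```python
import bz2
import fnmatch
import gzip
import os
import re
import tempfile

def gen_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

def gen_open(filenames):
    # Hypothetical helper: pick an opener from the file extension.
    for name in filenames:
        if name.endswith(".gz"):
            yield gzip.open(name, "rt")
        elif name.endswith(".bz2"):
            yield bz2.open(name, "rt")
        else:
            yield open(name)

def gen_cat(sources):
    # Hypothetical helper: chain the lines of every source, closing each when done.
    for s in sources:
        with s:
            for item in s:
                yield item

def gen_grep(pat, lines):
    patc = re.compile(pat)
    for line in lines:
        if patc.search(line):
            yield line

with tempfile.TemporaryDirectory() as logdir:
    with open(os.path.join(logdir, "access-log-012008"), "w") as f:
        f.write('1.2.3.4 - ... "GET /ply/ HTTP/1.1" 200 100\n')
        f.write('1.2.3.4 - ... "GET /other HTTP/1.1" 200 999\n')
    with gzip.open(os.path.join(logdir, "access-log-012007.gz"), "wt") as f:
        f.write('1.2.3.4 - ... "GET /ply/ply.html HTTP/1.1" 200 20\n')
        f.write('1.2.3.4 - ... "GET /ply/ HTTP/1.1" 304 -\n')

    filenames = gen_find("access-log*", logdir)
    logfiles = gen_open(filenames)
    loglines = gen_cat(logfiles)
    patlines = gen_grep(r"/ply/", loglines)
    bytecolumn = (line.rsplit(None, 1)[1] for line in patlines)
    nbytes = (int(x) for x in bytecolumn if x != '-')
    total = sum(nbytes)           # consumes everything while the files still exist

print("Total", total)
```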

Important concept
• Generators decouple iteration from the code that uses the results of the iteration
• In the last example, we perform a calculation on a sequence of lines
• It doesn't matter where or how those lines are generated
• Thus, we can plug any number of components together up front, as long as they eventually produce a sequence of lines

Part 4
Parsing and Processing Data
Part 4: Coroutines
Copyright (C) 2009, David Beazley, http://www.dabeaz.com
Yield as an expression
• In Python 2.5, a slight modification to the yield statement was introduced (PEP-342)
• You can now use yield as an expression
• For example, on the right side of an assignment
    def grep(pattern):
        print "Searching for %s" % pattern
        while True:
            line = (yield)
            if pattern in line:
                print line,
• Question: What is its value?
grep.py

Coroutines
• If you use yield more generally, you get a coroutine
• These do more than just generate values
• Instead, functions can consume values sent to them
>>> g = grep("python")
>>> g.next()    # Prime it (explained shortly)
Searching for python
>>> g.send("Yeah, but no, but yeah, but no")
>>> g.send("A series of tubes")
>>> g.send("python generators rock!")
python generators rock!
>>>
• Sent values are returned by (yield)
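The same coroutine can be sketched in Python 3, where `next(g)` replaces `g.next()`. In this version matching lines are appended to a `matches` list instead of being printed; that list is an addition made here so the result can be inspected.

```python
matches = []     # collects matching lines in place of print

def grep(pattern):
    while True:
        line = (yield)            # receives whatever send() passes in
        if pattern in line:
            matches.append(line)

g = grep("python")
next(g)                           # prime it: run to the first yield
g.send("Yeah, but no, but yeah, but no")
g.send("A series of tubes")
g.send("python generators rock!")
```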

Coroutine execution
• Execution is the same as for a generator
• When you call a coroutine, nothing happens
• They only run in response to the next() and send() methods
>>> g = grep("python")
>>> g.next()
Searching for python
>>>
Notice that no output was produced by the call to grep()
On the first operation, the coroutine starts running

Coroutine priming
• All coroutines must be "primed" by first calling .next() (or send(None))
• This advances execution to the location of the first yield expression
    def grep(pattern):
        print "Searching for %s" % pattern
        while True:
            line = (yield)    # .next() advances the coroutine to this first yield expression
            if pattern in line:
                print line,
• At this point, it is ready to receive a value

Using a decorator
• Remembering to call .next() is easy to forget
• Solved by wrapping coroutines with a decorator
    def coroutine(func):
        def start(*args,**kwargs):
            cr = func(*args,**kwargs)
            cr.next()
            return cr
        return start

    @coroutine
    def grep(pattern):
        ...
• I will use this in most of the future examples
coroutine.py
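A Python 3 sketch of the priming decorator follows. Two things here are additions rather than part of the slides: `functools.wraps`, which only preserves the wrapped function's name and docstring, and the `collector` coroutine, which is a made-up example used to show that no explicit priming call is needed.

```python
import functools

def coroutine(func):
    """Decorator that primes a coroutine by advancing it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)                  # Python 3 spelling of cr.next()
        return cr
    return start

@coroutine
def collector(store):
    while True:
        item = (yield)
        store.append(item)

received = []
c = collector(received)
c.send("ready immediately")       # works without a separate priming call
```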

Closing a coroutine
• A coroutine might run indefinitely
• Use .close() to shut it down
>>> g = grep("python")
>>> g.next()    # Prime it
Searching for python
>>> g.send("Yeah, but no, but yeah, but no")
>>> g.send("A series of tubes")
>>> g.send("python generators rock!")
python generators rock!
>>> g.close()
• Note: Garbage collection also calls close()

Catching close()
• close() can be caught (GeneratorExit)
    @coroutine
    def grep(pattern):
        print "Searching for %s" % pattern
        try:
            while True:
                line = (yield)
                if pattern in line:
                    print line,
        except GeneratorExit:
            print "Going away. Goodbye"
• You cannot ignore this exception
• The only legal action is to clean up and return
grepclose.py
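The effect of close() can be observed by recording the cleanup step. This is a Python 3 sketch; the `log` list replaces the print statements, and the final `send()` is added here to show that a closed coroutine cannot be resumed.

```python
log = []

def grep(pattern):
    try:
        while True:
            line = (yield)
            if pattern in line:
                log.append(line)
    except GeneratorExit:
        log.append("closing")     # clean up, then fall off the end

g = grep("python")
next(g)                           # prime it
g.send("python generators rock!")
g.close()                         # raises GeneratorExit at the yield

try:
    g.send("too late")            # a finished coroutine raises StopIteration
    resumed = True
except StopIteration:
    resumed = False
```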
Copyright (C) 2009, David Beazley, http://www.dabeaz.com
Throwing an exception
• Exceptions can be thrown inside a coroutine
    >>> g = grep("python")
    >>> g.next()      # Prime it
    Looking for python
    >>> g.send("python generators rock!")
    python generators rock!
    >>> g.throw(RuntimeError,"You are blocked")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 4, in grep
    RuntimeError: You are blocked
    >>>
• The exception originates at the yield expression
• It can be caught and handled in the usual ways
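To illustrate "caught and handled in the usual ways", here is a small Python 3 sketch in which the coroutine wraps its yield in try/except and keeps running after the exception. The log list and the choice to continue after a RuntimeError are assumptions made for the example.

```python
def grep(pattern, log):
    # 'log' is a hypothetical list used instead of printing
    while True:
        try:
            line = (yield)           # throw() raises the exception here
        except RuntimeError as exc:
            log.append("caught: %s" % exc)
            continue                 # keep the coroutine alive
        if pattern in line:
            log.append(line)

log = []
g = grep("python", log)
next(g)                              # prime it
g.send("python generators rock!")
g.throw(RuntimeError("You are blocked"))
g.send("more python")                # still usable after the throw
print(log)
```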
Interlude
• Despite some similarities, generators and coroutines are two fundamentally different concepts
• Generators produce values
• Coroutines tend to consume values
• It is easy to get sidetracked, because methods meant for coroutines are sometimes described as a way to tweak generators that are in the process of producing an iteration pattern (i.e., resetting their value). This is mostly wrong.
A wrong example
• A "generator" that produces and receives values
    def countdown(n):
        print "Countdown from", n
        while n >= 0:
            newvalue = (yield n)
            # If a new value is sent in, reset n with it
            if newvalue is not None:
                n = newvalue
            else:
                n -= 1
• It runs, but it is flaky and hard to understand
    c = countdown(5)
    for n in c:
        print n
        if n == 5:
            c.send(3)
• Output (notice how a value is "lost" in the iteration protocol):
    5
    2
    1
    0
bogus.py
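Where the "lost" value goes can be made visible with a short Python 3 sketch: send() itself returns the next yielded value, so the 3 is handed to the send() call rather than to the for loop. The names seen and swallowed are hypothetical, added only for inspection.

```python
def countdown(n):
    while n >= 0:
        newvalue = (yield n)
        # If a new value is sent in, reset n with it
        if newvalue is not None:
            n = newvalue
        else:
            n -= 1

seen = []            # values delivered to the for loop
swallowed = None     # value returned by send()
c = countdown(5)
for n in c:
    seen.append(n)
    if n == 5:
        swallowed = c.send(3)   # send() consumes the yielded 3
print(seen, swallowed)
```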
Keeping it straight
• Generators produce data for iteration
• Coroutines are consumers of data
• To keep your brain from exploding, don't mix the two concepts together
• Coroutines are not related to iteration
• Note: yield can be used to produce a value in a coroutine, but that use is not tied to iteration
Part 2
Coroutines, Pipelines and Data Flow
Part 5: General Pipelines
Monday May 16, 2011
Processing pipelines
• Coroutines can be used to set up pipelines
    [Diagram: coroutines chained together, each passing data to the next with send()]
• You simply chain coroutines together and push data through the pipe with send() operations
Pipeline sources
• The pipeline needs an initial source (a producer)
    [Diagram: source -> send() -> coroutine -> send()]
• The source drives the entire pipeline
    def source(target):
        while not done:
            item = produce_an_item()
            ...
            target.send(item)
            ...
        target.close()
• It is typically not a coroutine
Pipeline sinks
• The pipeline must have an end-point (sink)
    [Diagram: send() -> coroutine -> send() -> sink]
• It collects all data sent to it and processes it
    @coroutine
    def sink():
        try:
            while True:
                item = (yield)       # Receive an item
                ...
        except GeneratorExit:        # Handle .close()
            # Done
            ...
An example
• A source that mimics Unix 'tail -f'
    import time
    def follow(thefile, target):
        thefile.seek(0,2)      # Go to the end of the file
        while True:
            line = thefile.readline()
            if not line:
                time.sleep(0.1)    # Sleep briefly
                continue
            target.send(line)
• A sink that just prints the lines
    @coroutine
    def printer():
        while True:
            line = (yield)
            print line,
cofollow.py
An example
• Hooking it together
    f = open("access-log")
    follow(f, printer())
• A picture
    [Diagram: follow() -> send() -> printer()]
• Critical point: follow() drives the entire computation by reading lines and pushing them into the printer() coroutine
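follow() never returns, so it is awkward to experiment with. The Python 3 sketch below swaps in a finite source that reads from a list and a sink that collects into a list; source(), collector(), received and state are all hypothetical stand-ins, not part of the original files.

```python
import functools

def coroutine(func):
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)                      # prime the coroutine
        return cr
    return start

def source(lines, target):
    # Finite stand-in for follow(): the source drives the pipeline
    for line in lines:
        target.send(line)
    target.close()

@coroutine
def collector(received, state):
    # Stand-in for printer(): a sink that stores what it receives
    try:
        while True:
            line = (yield)
            received.append(line)
    except GeneratorExit:             # raised by target.close()
        state["closed"] = True

received = []
state = {"closed": False}
source(["first line\n", "second line\n"], collector(received, state))
print(received, state)
```

The sink does nothing on its own; every step happens because the source calls send() or close().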
Pipeline filters
• Intermediate stages both receive and send
    [Diagram: send() -> coroutine -> send()]
• They typically perform some kind of data transformation, filtering, routing, etc.
    @coroutine
    def filter(target):
        while True:
            item = (yield)       # Receive an item
            # Transform/filter item
            ...
            # Send it along to the next stage
            target.send(item)
An example filter
• A grep filter coroutine
    @coroutine
    def grep(pattern,target):
        while True:
            line = (yield)           # Receive a line
            if pattern in line:
                target.send(line)    # Send to the next stage
• Hooking it up
    f = open("access-log")
    follow(f,
           grep('python',
           printer()))
• A picture
    [Diagram: follow() -> send() -> grep() -> send() -> printer()]
copipe.py
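A self-contained Python 3 sketch of the same three-stage shape, driven by a plain loop instead of follow(). The collector() sink and the sample lines are assumptions made so the result can be checked.

```python
import functools

def coroutine(func):
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def grep(pattern, target):
    while True:
        line = (yield)               # Receive a line
        if pattern in line:
            target.send(line)        # Send to the next stage

@coroutine
def collector(out):
    # Hypothetical sink used in place of printer()
    while True:
        out.append((yield))

out = []
pipe = grep("python", collector(out))
for line in ["python rocks", "perl", "more python"]:
    pipe.send(line)                  # the loop plays the role of the source
print(out)
```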
Interlude
• Coroutines flip generators around
    [Diagram, generators/iteration: input sequence -> generator -> generator -> for x in s:]
    [Diagram, coroutines: source -> send() -> coroutine -> send() -> coroutine]
• Key difference: generators pull data through the pipe with iteration. Coroutines push data into the pipeline with send().
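The pull/push contrast can be sketched side by side in Python 3. Both versions filter the same data; the function names gen_grep and co_grep and the sample data are hypothetical.

```python
def gen_grep(pattern, lines):
    # Generator: the consumer pulls values out by iterating
    for line in lines:
        if pattern in line:
            yield line

def co_grep(pattern, out):
    # Coroutine: the producer pushes values in with send()
    while True:
        line = (yield)
        if pattern in line:
            out.append(line)

data = ["python rocks", "perl", "more python"]

# Pull: list() drives the generator
pulled = list(gen_grep("python", data))

# Push: the for loop drives the coroutine
pushed = []
c = co_grep("python", pushed)
next(c)                      # prime it
for line in data:
    c.send(line)
print(pulled, pushed)
```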
Being branchy
• With coroutines, you can send data to multiple destinations
    [Diagram: a source sends to a coroutine, which fans out through send() to several further coroutines]
• The source simply sends data. Further routing of that data can be arbitrarily complex.
Example: broadcasting
• Broadcast to multiple targets
    @coroutine
    def broadcast(targets):
        while True:
            item = (yield)
            for target in targets:
                target.send(item)
• This takes a sequence of coroutines (targets) and sends received items to all of them
cobroadcast.py
Example: broadcasting
• Example use:
    f = open("access-log")
    follow(f,
           broadcast([grep('python', printer()),
                      grep('ply', printer()),
                      grep('swig', printer())])
    )
    [Diagram: follow -> broadcast -> grep('python'), grep('ply'), grep('swig'), each feeding its own printer()]
Example: broadcasting
• A more disturbing variation...
    f = open("access-log")
    p = printer()
    follow(f,
           broadcast([grep('python', p),
                      grep('ply', p),
                      grep('swig', p)])
    )
    [Diagram: follow -> broadcast -> grep('python'), grep('ply'), grep('swig'), all feeding one shared printer()]
cobroadcast2.py
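One consequence of sharing a single sink is that a line matching two patterns arrives at the sink twice. The Python 3 sketch below shows this, assuming a list-collecting collector() in place of printer() and a plain loop in place of follow(); both substitutions are made only for the example.

```python
import functools

def coroutine(func):
    @functools.wraps(func)
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def broadcast(targets):
    while True:
        item = (yield)
        for target in targets:
            target.send(item)

@coroutine
def grep(pattern, target):
    while True:
        line = (yield)
        if pattern in line:
            target.send(line)

@coroutine
def collector(out):
    # Hypothetical sink used in place of printer()
    while True:
        out.append((yield))

shared = []
p = collector(shared)                # one sink shared by all branches
b = broadcast([grep("python", p), grep("ply", p), grep("swig", p)])
for line in ["python and swig", "ply only", "nothing"]:
    b.send(line)
print(shared)
```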
Interlude
• Coroutines provide more powerful data routing possibilities than simple iterators
• Once you have built a collection of simple data processing components, you can glue them together into complex arrangements of pipes, branches, merges, and so on
• There are some limitations, though (more on these later)
A digression
• In preparing this tutorial, I found myself wishing that variable assignment was an expression
    @coroutine
    def printer():
        while True:
            line = (yield)
            print line,
  vs.
    @coroutine
    def printer():
        while (line = yield):
            print line,
• However, I'm not holding my breath on that...
• Actually, I'm expecting to be flogged with a rubber chicken for even suggesting it
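For what it is worth, Python 3.8 and later do provide assignment expressions (the := operator, PEP 572), so the second form can now be approximated. A small sketch, assuming Python 3.8+ and a hypothetical out list in place of print; note that this loop ends as soon as a false value such as "" or None is sent.

```python
def printer(out):
    # Runs until a false value (e.g. "" or None) is sent in
    while (line := (yield)):
        out.append(line)

out = []
p = printer(out)
next(p)                      # prime it
p.send("a")
p.send("b")
try:
    p.send("")               # false value: the while loop exits
    finished = False
except StopIteration:        # the coroutine returned
    finished = True
print(out, finished)
```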
Part 6: Tasks
Monday May 16, 2011
The task concept
• In concurrent programming, problems are typically subdivided into "tasks"
• Tasks have a few essential features
    • Independent control flow
    • Internal state
    • Can be scheduled (suspended/resumed)
    • Can communicate with other tasks
• Claim: coroutines are tasks
Are coroutines tasks?
• Let's look at the essential parts
• Coroutines have their own control flow
    @coroutine
    def grep(pattern):
        print "Looking for %s" % pattern
        while True:
            line = (yield)
            if pattern in line:
                print line,
• A coroutine is just a sequence of statements like any other Python function
Are coroutines tasks?
• Coroutines have their own internal state
• For example: local variables (in grep(), pattern and line are locals)
• The locals live as long as the coroutine is active
• They establish an execution environment
Are coroutines tasks?
• Coroutines can communicate
• The .send() method sends data to a coroutine
• yield expressions receive input: in grep(), send(msg) delivers msg to line = (yield)
Are coroutines tasks?
• Coroutines can be suspended and resumed
    • yield suspends execution
    • send() resumes execution
    • close() terminates execution
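These task-like properties can be observed directly in Python 3 with the standard inspect module, which reports a generator's state and its live local variables. The sketch below assumes a grep() variant that appends to a hypothetical matches list rather than printing.

```python
import inspect

def grep(pattern, matches):
    while True:
        line = (yield)
        if pattern in line:
            matches.append(line)

matches = []
g = grep("python", matches)
states = [inspect.getgeneratorstate(g)]        # created, not yet started
next(g)
states.append(inspect.getgeneratorstate(g))    # suspended at the yield
g.send("python generators rock!")              # resumed, then suspended again
local_vars = inspect.getgeneratorlocals(g)     # internal state while suspended
states.append(inspect.getgeneratorstate(g))
g.close()                                      # terminated
states.append(inspect.getgeneratorstate(g))
print(states, local_vars)
```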
I'm convinced
• Very clearly, coroutines look like tasks
• But they are not tied to threads
• Or subprocesses
• A question: can you perform multitasking without using either of those concepts?
• Multitasking using nothing but coroutines?
Part 6
A crash course in operating systems
Program execution
• On a CPU, a program is a series of instructions
    _main:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
        movl $0, -12(%ebp)
        movl $0, -16(%ebp)
        jmp L2
    L3:
        movl -16(%ebp), %eax
        leal -12(%ebp), %edx
        addl %eax, (%edx)
        leal -16(%ebp), %eax
        incl (%eax)
    L2:
        cmpl $9, -16(%ebp)
        jle L3
        leave
        ret
    [Diagram: the C program below, compiled with cc, yields the instructions above]
    int main() {
        int i, total = 0;
        for (i = 0; i < 10; i++) {
            total += i;
        }
    }
• When running, there is no notion of doing more than one thing at a time (or any kind of task switching)
The multitasking problem
• CPUs don't know anything about multitasking
• Nor do application programs
• Well, surely something has to know about it!
• Hint: it's the operating system
Operating systems
• As you probably know, the operating system (e.g., Linux, Windows) is responsible for running programs on your machine
• And as you have observed, the operating system allows more than one process to execute at once (i.e., multitasking)
• It does this by rapidly switching between tasks
• Question: how does it do that?
A mystery
• When a CPU is running your program, it is not running the operating system
• Question: how does the operating system (which is not running) make an application (which is running) switch to another task?
• The "context-switching" problem...
Interrupts and traps
• There are usually only two mechanisms that an operating system uses to gain control
    • Interrupts: some kind of hardware-related signal (data received, timer, keypress, etc.)
    • Traps: a software-generated signal
• In both cases, the CPU briefly suspends what it is doing and runs code that is part of the operating system
• It is at this point that the operating system can switch tasks
Traps and system calls
• Low-level system calls are actually traps
• A trap is a special CPU instruction
    read(fd,buf,nbytes)

    read:
        push %ebx
        mov 0x10(%esp),%edx
        mov 0xc(%esp),%ecx
        mov 0x8(%esp),%ebx
        mov $0x3,%eax
        int $0x80          # <- trap
        pop %ebx
        ...
• When a trap instruction executes, the program suspends execution at that point
• And the operating system takes over
High level overview
• Traps are what make an operating system work
• The operating system drops your program onto the CPU
• It runs until it hits a trap (system call)
• The program suspends and the operating system runs
• Repeat
    [Diagram: run -> trap -> run -> trap -> run -> trap -> run; the operating system executes at each trap]
Task switching
• Here is what typically happens when an operating system runs multiple tasks
    [Diagram: Task A and Task B alternate; each runs until a trap, at which point a task switch occurs]
• On each trap, the system switches to a different task (cycling between them)
Task scheduling
• To run many tasks, add a bunch of queues
    [Diagram: a ready queue of tasks, the running task on the CPU, further queues of tasks, and traps moving tasks between them]
An insight
• The yield statement is a kind of "trap"
• No, really!
• When a generator function hits a "yield" statement, it immediately suspends execution
• Control is passed back to whatever code made the generator function run (unseen)
• If you treat yield as a trap, you can build a multitasking "operating system", all in Python!
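The trap analogy can be traced in a few lines of Python 3. In this sketch the module-level code plays the role of the operating system, and the trace list (a hypothetical addition) records who is running at each step.

```python
trace = []

def task():
    trace.append("task: step 1")
    yield                         # "trap": control returns to the caller
    trace.append("task: step 2")
    yield                         # "trap" again

t = task()
trace.append("os: starting task")
next(t)                           # run the task up to its first yield
trace.append("os: back in control")
next(t)                           # resume the task up to its next yield
trace.append("os: back in control again")
print(trace)
```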
Part 7
Let's build an operating system (you may want to put on your 5-point safety harness)
Part 7: The Mini OS
Monday May 16, 2011
Our challenge
• Build a multitasking "operating system"
• Use nothing but pure Python code
• No threads
• No subprocesses
• Use generators/coroutines
Some motivation
• There has been a lot of recent interest in alternatives to threads (especially due to the GIL)
• Non-blocking and asynchronous I/O
• Example: servers capable of supporting thousands of simultaneous client connections
• A lot of work has focused on event-driven systems or the "Reactor Model" (e.g., Twisted)
• Coroutines are a whole different twist...
Step 1: Define tasks
• A task object
    class Task(object):
        taskid = 0
        def __init__(self,target):
            Task.taskid += 1
            self.tid = Task.taskid     # Task ID
            self.target = target       # Target coroutine
            self.sendval = None        # Value to send
        def run(self):
            return self.target.send(self.sendval)
• A task is a wrapper around a coroutine
• There is only one operation: run()
pyos1.py
Task example
• Here is how this wrapper behaves
    # A very simple generator
    def foo():
        print "Part 1"
        yield
        print "Part 2"
        yield

    >>> t1 = Task(foo())      # Wrap in a Task
    >>> t1.run()
    Part 1
    >>> t1.run()
    Part 2
    >>>
• run() executes the task to the next yield (a trap)
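A Python 3 sketch of the same wrapper follows. It relies on the fact that send(None) is allowed on a generator that has not started yet, which is why run() works without priming. The log list is a hypothetical replacement for the print statements.

```python
class Task:
    taskid = 0

    def __init__(self, target):
        Task.taskid += 1
        self.tid = Task.taskid     # Task ID
        self.target = target       # Target coroutine
        self.sendval = None        # Value to send

    def run(self):
        # Runs the coroutine up to its next yield
        return self.target.send(self.sendval)

def foo(log):
    log.append("Part 1")
    yield
    log.append("Part 2")
    yield

log = []
t1 = Task(foo(log))
t1.run()
after_first_run = list(log)        # snapshot after one step
t1.run()
print(after_first_run, log, t1.tid)
```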
Step 2: The scheduler
    class Scheduler(object):
        def __init__(self):
            self.ready = Queue()
            self.taskmap = {}

        def new(self,target):
            newtask = Task(target)
            self.taskmap[newtask.tid] = newtask
            self.schedule(newtask)
            return newtask.tid

        def schedule(self,task):
            self.ready.put(task)

        def mainloop(self):
            while self.taskmap:
                task = self.ready.get()
                result = task.run()
                self.schedule(task)
pyos2.py
• self.ready: a queue of tasks that are ready to run
• new(): introduces a new task to the scheduler
• self.taskmap: a dictionary that keeps track of all active tasks (each task has a unique integer task ID); more on this later
• schedule(): puts a task onto the ready queue. This makes it available to run.
• mainloop(): the main scheduler loop. It pulls tasks off the queue and runs them to the next yield.
First multitasking
• Two tasks:
    def foo():
        while True:
            print "I'm foo"
            yield

    def bar():
        while True:
            print "I'm bar"
            yield
• Running them in the scheduler
    sched = Scheduler()
    sched.new(foo())
    sched.new(bar())
    sched.mainloop()
First multitasking
• Example output:
    I'm foo
    I'm bar
    I'm foo
    I'm bar
    I'm foo
    I'm bar
• Emphasize: yield is a trap
• Each task runs until it hits the yield
• At this point, the scheduler regains control and switches to the other task
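The round-robin behaviour can be reproduced in Python 3 with the sketch below. Since foo() and bar() never finish, the real mainloop() would run forever, so this version takes a hypothetical max_steps argument (not in the original) to stop after a fixed number of task switches. The log list stands in for print.

```python
from queue import Queue

class Task:
    taskid = 0
    def __init__(self, target):
        Task.taskid += 1
        self.tid = Task.taskid
        self.target = target
        self.sendval = None
    def run(self):
        return self.target.send(self.sendval)

class Scheduler:
    def __init__(self):
        self.ready = Queue()           # tasks that are ready to run
        self.taskmap = {}              # task ID -> task

    def new(self, target):
        newtask = Task(target)
        self.taskmap[newtask.tid] = newtask
        self.schedule(newtask)
        return newtask.tid

    def schedule(self, task):
        self.ready.put(task)

    def mainloop(self, max_steps):
        # max_steps is an assumption added so the demo terminates
        for _ in range(max_steps):
            task = self.ready.get()
            task.run()                 # run to the next yield (the "trap")
            self.schedule(task)        # put it back on the ready queue

log = []

def foo():
    while True:
        log.append("I'm foo")
        yield

def bar():
    while True:
        log.append("I'm bar")
        yield

sched = Scheduler()
sched.new(foo())
sched.new(bar())
sched.mainloop(max_steps=6)
print(log)
```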
Problem: Task Termination
123
• The scheduler crashes if a task returns

    def foo():
        for i in xrange(10):
            print "I'm foo"
            yield
    ...

    I'm foo
    I'm bar
    I'm foo
    I'm bar
    Traceback (most recent call last):
      File "crash.py", line 20, in <module>
        sched.mainloop()
      File "scheduler.py", line 26, in mainloop
        result = task.run()
      File "task.py", line 13, in run
        return self.target.send(self.sendval)
    StopIteration
taskcrash.py
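The crash comes from how generators signal completion: once the generator function returns, the next send() raises StopIteration. A short Python 3 sketch, with range in place of Python 2's xrange and a two-step loop chosen only to keep it brief:

```python
def foo():
    for i in range(2):
        yield


gen = foo()
gen.send(None)        # runs to the first yield
gen.send(None)        # runs to the second yield

try:
    gen.send(None)    # the loop is finished, so the generator returns
    finished = False
except StopIteration:
    finished = True

print(finished)
```

Because Task.run() simply forwards to send(), the exception travels up through mainloop() unless the scheduler catches it.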
Step 3: Task Exit
124
class Scheduler(object):
    ...
    def exit(self, task):
        print "Task %d terminated" % task.tid
        del self.taskmap[task.tid]
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)
pyos3.py
Step 3: Task Exit
125
class Scheduler(object):
    ...
    def exit(self, task):
        print "Task %d terminated" % task.tid
        del self.taskmap[task.tid]
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)

exit(): Remove the task from the scheduler's task map.
Step 3: Task Exit
126
class Scheduler(object):
    ...
    def exit(self, task):
        print "Task %d terminated" % task.tid
        del self.taskmap[task.tid]
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)

mainloop(): Catch task exit and clean up.
Second Multitasking
127
• Two tasks:

    def foo():
        for i in xrange(10):
            print "I'm foo"
            yield

    def bar():
        for i in xrange(5):
            print "I'm bar"
            yield

    sched = Scheduler()
    sched.new(foo())
    sched.new(bar())
    sched.mainloop()
Second Multitasking
128
• Sample output:

    I'm foo
    I'm bar
    I'm foo
    I'm bar
    I'm foo
    I'm bar
    I'm foo
    I'm bar
    I'm foo
    I'm bar
    I'm foo
    Task 2 terminated
    I'm foo
    I'm foo
    I'm foo
    I'm foo
    Task 1 terminated
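The run above can be reproduced with a self-contained Python 3 sketch of the Step 3 scheduler. The Task class is a stand-in built from the attribute names visible on these slides, and the emit() helper and output list are assumptions added so the result can be checked as well as printed.

```python
from queue import Queue

output = []


def emit(line):
    # Print the line and also record it so the run can be inspected.
    print(line)
    output.append(line)


class Task:
    taskid = 0

    def __init__(self, target):
        Task.taskid += 1
        self.tid = Task.taskid
        self.target = target
        self.sendval = None

    def run(self):
        return self.target.send(self.sendval)


class Scheduler:
    def __init__(self):
        self.ready = Queue()
        self.taskmap = {}

    def new(self, target):
        newtask = Task(target)
        self.taskmap[newtask.tid] = newtask
        self.schedule(newtask)
        return newtask.tid

    def exit(self, task):
        emit("Task %d terminated" % task.tid)
        del self.taskmap[task.tid]

    def schedule(self, task):
        self.ready.put(task)

    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                task.run()
            except StopIteration:
                self.exit(task)     # drop the finished task
                continue            # and do not reschedule it
            self.schedule(task)


def foo():
    for i in range(10):
        emit("I'm foo")
        yield


def bar():
    for i in range(5):
        emit("I'm bar")
        yield


sched = Scheduler()
sched.new(foo())
sched.new(bar())
sched.mainloop()
```

Task 2 (bar) terminates first because its loop is shorter. After that, foo is the only task left on the ready queue, so it runs on every pass until it finishes too and the task map becomes empty.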
System Calls
129
• In a real operating system, traps are how application programs request the services of the operating system (system calls).
• In our code, the scheduler is the operating system and the yield statement is a trap
• To request the service of the scheduler, tasks will use the yield statement with a value
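The idea of yielding a value to request a service can be sketched in a few lines of Python 3 before any scheduler machinery is involved. The request string "get_time" and the fake_clock value below are purely hypothetical. The point is only that the task yields a value, the code driving it inspects that value, and the answer is sent back in.

```python
def task(results):
    # Yield a value to ask the "operating system" for something.
    reply = yield "get_time"
    results.append(reply)
    yield


results = []
fake_clock = 1234                  # hypothetical service result

t = task(results)
request = t.send(None)             # run the task until it traps
if request == "get_time":
    t.send(fake_clock)             # resume the task with the answer

print(request, results)
```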
Step 4: System Calls
130
class SystemCall(object):
    def handle(self):
        pass

class Scheduler(object):
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
                if isinstance(result, SystemCall):
                    result.task = task
                    result.sched = self
                    result.handle()
                    continue
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)
pyos4.py
Step 4: System Calls
131
class SystemCall(object):
    def handle(self):
        pass

class Scheduler(object):
    ...
    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
                if isinstance(result, SystemCall):
                    result.task = task
                    result.sched = self
                    result.handle()
                    continue
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)

SystemCall: Base class for system calls. All system operations will be implemented by inheriting from this class.
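As a sketch of how such a subclass might plug into the Step 4 main loop, the Python 3 example below defines a hypothetical GetTid call that hands a task its own id. GetTid, the seen list, and the Task stand-in are assumptions made for illustration. Only SystemCall, handle(), and the mainloop logic come from the slides shown here. Note that because the main loop does a continue after handle(), the handler itself has to reschedule the task.

```python
from queue import Queue


class Task:
    taskid = 0

    def __init__(self, target):
        Task.taskid += 1
        self.tid = Task.taskid
        self.target = target
        self.sendval = None

    def run(self):
        return self.target.send(self.sendval)


class SystemCall:
    def handle(self):
        pass


class GetTid(SystemCall):
    """Hypothetical system call: return the calling task's id."""

    def handle(self):
        self.task.sendval = self.task.tid   # value delivered on resume
        self.sched.schedule(self.task)      # handler must reschedule


class Scheduler:
    def __init__(self):
        self.ready = Queue()
        self.taskmap = {}

    def new(self, target):
        newtask = Task(target)
        self.taskmap[newtask.tid] = newtask
        self.schedule(newtask)
        return newtask.tid

    def exit(self, task):
        del self.taskmap[task.tid]

    def schedule(self, task):
        self.ready.put(task)

    def mainloop(self):
        while self.taskmap:
            task = self.ready.get()
            try:
                result = task.run()
                if isinstance(result, SystemCall):
                    result.task = task      # tell the call who asked
                    result.sched = self     # and which scheduler to use
                    result.handle()
                    continue
            except StopIteration:
                self.exit(task)
                continue
            self.schedule(task)


seen = []


def worker():
    mytid = yield GetTid()     # trap into the scheduler with a request
    seen.append(mytid)


sched = Scheduler()
tid = sched.new(worker())
sched.mainloop()
print(seen)
```

The task never touches the scheduler directly. It only yields a request object, and the scheduler fills in the task and sched attributes before calling handle().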