From 03cd16eadc6d7225f8e0e98e661b4bb13eede8b7 Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Mon, 13 Apr 2009 22:46:47 -0400
Subject: [PATCH] several more sections in advanced-iterators chapter

---
 advanced-iterators.html | 118 +++++++++++++++++++++++++++++----------
 1 file changed, 86 insertions(+), 32 deletions(-)

diff --git a/advanced-iterators.html b/advanced-iterators.html
index 70885a0..3651184 100644
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -18,8 +18,6 @@ body{counter-reset:h1 6}

Diving In

FIXME -

original recipe by Raymond Hettinger, ported to Python 3 and used as the basis for this chapter with his permission. -

[download alphametics.py]

import re
 import itertools
@@ -70,7 +68,7 @@ if __name__ == '__main__':
 
 

Finding the unique items in a sequence

-

This section has nothing to do with iterators, but it's put to good use in the alphametics solver. Set comprehensions make it trivial to find the unique items in a sequence. +

This section has nothing to do with iterators, but it's put to good use in the alphametics solver. Set comprehensions make it trivial to find the unique items in a sequence. [FIXME-not sure if I'm going to cover set comprehensions in an earlier chapter; if not, this is certainly an abrupt and inadequate introduction to the topic.]

 >>> a_list = ['a', 'c', 'b', 'a', 'd', 'b']
@@ -207,37 +205,51 @@ StopIteration
 
  • Since the permutations() function always returns an iterator, an easy way to debug permutations is to pass that iterator to the built-in list() function to see all the permutations immediately. -

    Other Fun Stuff in the itertools Module

    +

    Other Fun Stuff in the itertools Module

     >>> import itertools
    ->>> list(itertools.product('ABC', '123'))
    +>>> list(itertools.product('ABC', '123'))   
     [('A', '1'), ('A', '2'), ('A', '3'), 
      ('B', '1'), ('B', '2'), ('B', '3'), 
      ('C', '1'), ('C', '2'), ('C', '3')]
    ->>> list(itertools.combinations('ABC', 2))
    +>>> list(itertools.combinations('ABC', 2))  
     [('A', 'B'), ('A', 'C'), ('B', 'C')]
    +
      +
    1. The itertools.product() function returns an iterator containing the Cartesian product of two sequences. +
    2. The itertools.combinations() function returns an iterator containing all the possible combinations of the given sequence of the given length. This is like the itertools.permutations() function, except combinations don't include items that are duplicates of other items in a different order. So itertools.permutations('ABC', 2) will return both ('A', 'B') and ('B', 'A') (among others), but itertools.combinations('ABC', 2) will not return ('B', 'A') because it is a duplicate of ('A', 'B') in a different order. +
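
The difference between the two functions can be checked directly. This is a minimal sketch; the variable names (`perms`, `combos`, `only_in_perms`, `pairs`) are illustrative and do not come from the chapter.

```python
import itertools

letters = 'ABC'

# Ordered pairs: ('A', 'B') and ('B', 'A') both appear.
perms = list(itertools.permutations(letters, 2))

# Unordered pairs: only one ordering of each pair is kept.
combos = list(itertools.combinations(letters, 2))

# Every combination is also a permutation, but not the other way around.
only_in_perms = [p for p in perms if p not in combos]

print(len(perms), len(combos))  # 6 3
print(only_in_perms)            # [('B', 'A'), ('C', 'A'), ('C', 'B')]

# product() pairs every item of the first sequence with every item of the second.
pairs = list(itertools.product('AB', '12'))
print(pairs)  # [('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')]
```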
    -

    FIXME - +

    [download favorite-people.txt]

    ->>> names = list(open('examples/favorite-people.txt'))
    +>>> names = list(open('examples/favorite-people.txt'))  
     >>> names
     ['Dora\n', 'Ethan\n', 'Wesley\n', 'John\n', 'Anne\n',
     'Mike\n', 'Chris\n', 'Sarah\n', 'Alex\n', 'Lizzie\n']
    ->>> names = [name.strip() for name in names]
    +>>> names = [name.rstrip() for name in names]           
     >>> names
     ['Dora', 'Ethan', 'Wesley', 'John', 'Anne',
     'Mike', 'Chris', 'Sarah', 'Alex', 'Lizzie']
    ->>> names = sorted(names)
    +>>> names = sorted(names)                               
     >>> names
     ['Alex', 'Anne', 'Chris', 'Dora', 'Ethan',
     'John', 'Lizzie', 'Mike', 'Sarah', 'Wesley']
    ->>> names = sorted(names, key=len)
    +>>> names = sorted(names, key=len)                      
     >>> names
     ['Alex', 'Anne', 'Dora', 'John', 'Mike',
    -'Chris', 'Ethan', 'Sarah', 'Lizzie', 'Wesley']
    +'Chris', 'Ethan', 'Sarah', 'Lizzie', 'Wesley']
    +
      +
    1. This idiom returns a list of the lines in a text file. +
    2. Unfortunately (for this example), the list(open(filename)) idiom also includes the newline characters at the end of each line. This list comprehension uses the rstrip() string method to strip trailing whitespace from each line. +
    3. The sorted() function takes a list and returns a new, sorted list. By default, it sorts alphabetically. +
    4. But the sorted() function can also take a function as the key parameter, and it sorts by that key. In this case, the key function is len(), so it sorts by len(each item). Shorter names come first, then longer, then longest. +
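
The names of equal length stay in alphabetical order in the last step because Python's sort is stable: items with equal keys keep their earlier relative order. A small sketch of that behaviour, using a shortened, made-up list rather than the contents of favorite-people.txt:

```python
raw_lines = ['Dora\n', 'Ethan\n', 'Wesley\n', 'John\n', 'Anne\n']

# rstrip() removes trailing whitespace, including the newline.
names = [line.rstrip() for line in raw_lines]

# Two passes: alphabetical first, then by length. Because sorted() is
# stable, names of equal length keep their alphabetical order.
two_pass = sorted(sorted(names), key=len)

# One pass: a key function that returns a (length, name) tuple.
one_pass = sorted(names, key=lambda name: (len(name), name))

print(two_pass)  # ['Anne', 'Dora', 'John', 'Ethan', 'Wesley']
print(one_pass == two_pass)  # True
```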
    + +

    What does this have to do with the itertools module? I'm glad you asked. + +

    +

    …continuing from the previous interactive shell…

     >>> import itertools
    ->>> groups = itertools.groupby(names, len)
    +>>> groups = itertools.groupby(names, len)
     >>> groups
     <itertools.groupby object at 0x00BB20C0>
     >>> list(groups)
    @@ -245,7 +257,7 @@ StopIteration
     (5, <itertools._grouper object at 0x00BB4050>),
     (6, <itertools._grouper object at 0x00BB4030>)]
     >>> groups = itertools.groupby(names, len)
    ->>> for name_length, name_iter in groups:
    +>>> for name_length, name_iter in groups:
     ...     print('Names with {0:d} letters:'.format(name_length))
     ...     for name in name_iter:
     ...         print(name)
    @@ -263,40 +275,60 @@ Sarah
     Names with 6 letters:
     Lizzie
     Wesley

    - -

    FIXME - -

    Combining Iterators

    - -

    FIXME - +

      +
    1. The itertools.groupby() function takes a sequence and a key function, and returns an iterator that generates pairs. Each pair contains the result of key_function(each item) and another iterator containing all the items that shared that key result. +
    2. In this example, given a list of names sorted by length, itertools.groupby(names, len) will put all the 4-letter names in one iterator, all the 5-letter names in another iterator, and so on. The groupby() function is completely generic; it could group strings by first letter, numbers by their number of factors, or any other key function you can think of. +
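
One detail worth spelling out: groupby() only groups consecutive items that share a key, which is why the list is sorted by length first. The sketch below uses a short, made-up list of names to show what happens with and without sorting; the variable names are illustrative.

```python
import itertools

names = ['Alex', 'Wesley', 'Anne', 'Chris', 'Dora']

# Unsorted input: groupby() starts a new group whenever the key changes,
# so the 4-letter names end up split across several groups.
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(names, len)]
print(unsorted_groups)
# [(4, ['Alex']), (6, ['Wesley']), (4, ['Anne']), (5, ['Chris']), (4, ['Dora'])]

# Sorted by the same key first: one group per length.
by_length = sorted(names, key=len)
sorted_groups = [(k, list(g)) for k, g in itertools.groupby(by_length, len)]
print(sorted_groups)
# [(4, ['Alex', 'Anne', 'Dora']), (5, ['Chris']), (6, ['Wesley'])]

# Any key function works, e.g. grouping by first letter.
first_letter = lambda name: name[0]
by_initial = {k: list(g)
              for k, g in itertools.groupby(sorted(names), key=first_letter)}
print(by_initial)
# {'A': ['Alex', 'Anne'], 'C': ['Chris'], 'D': ['Dora'], 'W': ['Wesley']}
```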
    + +

    Are you watching closely?

     >>> list(range(0, 3))
     [0, 1, 2]
     >>> list(range(10, 13))
     [10, 11, 12]
    ->>> list(itertools.chain(range(0, 3), range(10, 13)))
    +>>> list(itertools.chain(range(0, 3), range(10, 13)))        
     [0, 1, 2, 10, 11, 12]
    ->>> list(zip(range(0, 3), range(10, 13)))
    +>>> list(zip(range(0, 3), range(10, 13)))                    
     [(0, 10), (1, 11), (2, 12)]
    ->>> list(zip(range(0, 3), range(10, 14)))
    +>>> list(zip(range(0, 3), range(10, 14)))                    
     [(0, 10), (1, 11), (2, 12)]
    ->>> list(itertools.zip_longest(range(0, 3), range(10, 14)))
    +>>> list(itertools.zip_longest(range(0, 3), range(10, 14)))  
     [(0, 10), (1, 11), (2, 12), (None, 13)]
    +
      +
    1. The itertools.chain() function takes two iterators and returns an iterator that contains all the items from the first iterator, followed by all the items from the second iterator. (Actually, it can take any number of iterators, and it chains them all in the order they were passed to the function.) +
    2. The zip() function does something prosaic that turns out to be extremely useful: it takes any number of sequences and returns an iterator with the first items of each sequence, then the second items of each, then the third, and so on. +
    3. The zip() function stops at the end of the shortest sequence. range(10, 14) has 4 items (10, 11, 12, and 13), but range(0, 3) only has 3, so the zip() function returns an iterator of 3 items. +
    4. On the other hand, the itertools.zip_longest() function stops at the end of the longest sequence, inserting None values for items past the end of the shorter sequences. +
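
A few variations on the same calls, as a sketch: chain() and zip() with three arguments, and zip_longest() with its fillvalue parameter in place of the default None. The inputs are made up for illustration.

```python
import itertools

# chain() accepts any number of iterables, of any kind.
chained = list(itertools.chain(range(0, 3), range(10, 13), 'ab'))
print(chained)  # [0, 1, 2, 10, 11, 12, 'a', 'b']

# zip() with three inputs stops at the shortest one (two items here).
short = list(zip('abc', range(10, 14), [True, False]))
print(short)  # [('a', 10, True), ('b', 11, False)]

# zip_longest() pads the shorter input; fillvalue replaces the default None.
padded = list(itertools.zip_longest(range(0, 3), range(10, 14), fillvalue=-1))
print(padded)  # [(0, 10), (1, 11), (2, 12), (-1, 13)]
```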
    -

    FIXME +

    OK, that was all very interesting, but how does it relate to the alphametics solver? Here's how:

     >>> characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
     >>> guess = ('1', '2', '0', '3', '4', '5', '6', '7')
    ->>> tuple(zip(characters, guess))
    +>>> tuple(zip(characters, guess))  
     (('S', '1'), ('M', '2'), ('E', '0'), ('D', '3'),
      ('O', '4'), ('N', '5'), ('R', '6'), ('Y', '7'))
    ->>> dict(zip(characters, guess))
    +>>> dict(zip(characters, guess))   
     {'E': '0', 'D': '3', 'M': '2', 'O': '4',
      'N': '5', 'S': '1', 'R': '6', 'Y': '7'}
    +
      +
    1. Given a list of letters and a list of digits (each represented here as 1-character strings), the zip() function will create a pairing of letters and digits, in order. +
    2. Why is that cool? Because that data structure happens to be exactly the right structure to pass to the dict() function to create a dictionary that uses letters as keys and their associated digits as values. Although the printed representation of the dictionary lists the pairs in a different order (dictionaries have no "order" per se), you can see that each letter is associated with the digit, based on the ordering of the original characters and guess sequences. +
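
A short sketch of the same idea on a smaller, made-up pair of sequences. Note that on Python 3.7 and later, dictionaries preserve insertion order, so the printed dictionary there follows the order of the original sequences rather than the shuffled order shown above.

```python
characters = ('S', 'M', 'E')
guess = ('1', '2', '0')

# Each (letter, digit) pair from zip() becomes one key/value entry.
mapping = dict(zip(characters, guess))
print(mapping['M'])  # 2

# The same trick inverts the mapping: digits become keys, letters values.
inverse = dict(zip(mapping.values(), mapping.keys()))
print(inverse['0'])  # E

# zip() truncates silently, so an extra letter simply gets no entry.
partial = dict(zip(('S', 'M', 'E', 'D'), guess))
print('D' in partial)  # False
```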
    -

    A New Kind Of String Manipulation

    +

    The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution. + +

    characters = tuple(ord(c) for c in sorted_characters)
    +digits = tuple(ord(c) for c in '0123456789')
    +...
    +for guess in itertools.permutations(digits, len(characters)):
    +    ...
    +    equation = puzzle.translate(dict(zip(characters, guess)))
    + +

    But what is this translate() method? Ah, now you're getting to the really fun part. + +

    A New Kind Of String Manipulation

     >>> characters = tuple(ord(c) for c in 'SMEDONRY')
    @@ -329,11 +361,33 @@ Wesley

    Putting It All Together

    -

    FIXME +

    To recap: the solver solves alphametic puzzles by brute force, i.e. through an exhaustive search of all possible solutions. To do this, it… + +

      +
    1. Finds all the letters in the puzzle with the re.findall() function +
    2. Finds all the unique letters in the puzzle with set comprehensions +
    3. Checks if there are more than 10 unique letters (meaning the puzzle is definitely unsolvable) with an assert statement +
    4. FIXME sorts the letters with a set difference operation +
    5. Converts the letters to their ASCII equivalents with a generator expression +
    6. Calculates all the possible solutions with the itertools.permutations() function +
    7. Converts each possible solution to a Python expression with the translate() string method +
    8. Tests each possible solution by evaluating the Python expression with the eval() function +
    9. Returns the first solution that evaluates to True +
    + +

    …in 14 lines of code.
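
The steps above can be sketched as follows. This is not the chapter's alphametics.py, only an illustration written to follow the list; the function name `solve`, the variable names, and the leading-zero check are assumptions of this sketch. The leading-zero check is needed because a number such as 012 is a syntax error in Python 3.

```python
import itertools
import re


def solve(puzzle):
    """Brute-force an alphametic puzzle such as 'I + BB == ILL'."""
    puzzle = puzzle.upper()
    # 1. Find all the words (runs of capital letters) in the puzzle.
    words = re.findall('[A-Z]+', puzzle)
    # 2. Collect the unique letters.
    unique_letters = set(''.join(words))
    # 3. More than 10 letters cannot map to 10 distinct digits.
    assert len(unique_letters) <= 10, 'Too many letters'
    # 4. Put the first letters of each word at the front, using a set
    #    difference for the rest, so leading zeros are easy to detect.
    first_letters = {word[0] for word in words}
    n = len(first_letters)
    ordered = (''.join(sorted(first_letters)) +
               ''.join(sorted(unique_letters - first_letters)))
    # 5. Convert letters and digits to their ordinals for translate().
    characters = tuple(ord(c) for c in ordered)
    digits = tuple(ord(c) for c in '0123456789')
    zero = digits[0]
    # 6. Try every assignment of distinct digits to the letters.
    for guess in itertools.permutations(digits, len(characters)):
        if zero in guess[:n]:
            continue  # a word would start with 0
        # 7. Substitute digits for letters to get a Python expression.
        equation = puzzle.translate(dict(zip(characters, guess)))
        # 8. Evaluate it; 9. return the first one that is true.
        if eval(equation):
            return equation
    return None


print(solve('I + BB == ILL'))  # 1 + 99 == 100
```

Because the sketch passes its input to eval(), it should only be run on puzzles you trust.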

    Further Reading

    -

    FIXME +

    + +

    Many, many thanks to Raymond Hettinger for agreeing to relicense his code so I could port it to Python 3 and use it as the basis for this chapter.

    © 2001–9 Mark Pilgrim