From 8e831ca2ebd52acd3529732ae836c4d5c6a767f3 Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Wed, 6 May 2009 19:26:43 -0400
Subject: [PATCH] several more sections of advanced-iterators

---
 advanced-iterators.html | 184 ++++++++++++++++++++++++++++++++++------
 1 file changed, 160 insertions(+), 24 deletions(-)

diff --git a/advanced-iterators.html b/advanced-iterators.html
index 3d5027d..b7149e4 100644
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -330,7 +330,7 @@ Wesley
  • On the other hand, the itertools.zip_longest() function stops at the end of the longest sequence, inserting None values for items past the end of the shorter sequences. -

    OK, that was all very interesting, but how does it relate to the alphametics solver? Here’s how: +

    OK, that was all very interesting, but how does it relate to the alphametics solver? Here’s how:

     >>> characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
    @@ -346,7 +346,7 @@ Wesley
  • Why is that cool? Because that data structure happens to be exactly the right structure to pass to the dict() function to create a dictionary that uses letters as keys and their associated digits as values. Although the printed representation of the dictionary lists the pairs in a different order (dictionaries have no “order” per se), you can see that each letter is associated with the digit, based on the ordering of the original characters and guess sequences. -

    The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution. +

    The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution.

    characters = tuple(ord(c) for c in sorted_characters)
     digits = tuple(ord(c) for c in '0123456789')
    @@ -359,36 +359,173 @@ for guess in itertools.permutations(digits, len(characters)):
     
     

    A New Kind Of String Manipulation

    -

    FIXME +

    Python strings have many methods. You learned about some of those methods in the Strings chapter: lower(), count(), and format(). Now I want to introduce you to a powerful but little-known string manipulation technique: the translate() method.

    ->>> characters = tuple(ord(c) for c in 'SMEDONRY')
    +>>> translation_table = {ord("A"): ord("O")}  
    +>>> translation_table                         
    +{65: 79}
    +>>> 'MARK'.translate(translation_table)       
    +'MORK'
    +
      +
    1. String translation starts with a translation table, which is just a dictionary that maps one character to another. Actually, “character” is imprecise: the table really maps one Unicode code point (an integer) to another. +
    2. The ord() function returns the code point of a character, which, in the case of A–Z, is always an integer from 65 to 90 (the same values as in ASCII). +
    3. The translate() method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, “translating” MARK to MORK. +
    + +
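As a minimal sketch of the points above (the variable names are illustrative, not taken from the alphametics solver), the following shows that a translation table is an ordinary dictionary keyed by integer code points. It also uses str.maketrans(), a standard helper for building such tables, and shows that mapping a code point to None deletes the character.

```python
# A translation table maps Unicode code points (integers) to replacements.
table = {ord("A"): ord("O")}
replaced = "MARK".translate(table)
print(replaced)  # MORK

# str.maketrans() builds the same kind of table from two equal-length strings.
built = str.maketrans("AK", "OE")
print(built == {65: 79, 75: 69})  # True
print("MARK".translate(built))    # MORE

# Mapping a code point to None deletes that character.
deleted = "MARK".translate({ord("A"): None})
print(deleted)  # MRK
```

Characters that do not appear as keys in the table pass through unchanged, which is why M, R, and K survive in the first call.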

    What does this have to do with solving alphametic puzzles? As it turns out, everything. + +

    +>>> characters = tuple(ord(c) for c in 'SMEDONRY')       
     >>> characters
     (83, 77, 69, 68, 79, 78, 82, 89)
    ->>> digits = tuple(ord(c) for c in '0123456789')
    ->>> digits
    -(48, 49, 50, 51, 52, 53, 54, 55, 56, 57)
    ->>> guess = (49, 50, 48, 51, 52, 53, 54, 55)
    ->>> translation_table = dict(zip(characters, guess))
    +>>> guess = tuple(ord(c) for c in '91570682')            
    +>>> guess
    +(57, 49, 53, 55, 48, 54, 56, 50)
    +>>> translation_table = dict(zip(characters, guess))     
     >>> translation_table
    -{68: 51, 69: 48, 77: 50, 78: 53, 79: 52, 82: 54, 83: 49, 89: 55}
    ->>> "SEND + MORE == MONEY".translate(translation_table)
    -'1053 + 2460 == 24507'
    +{68: 55, 69: 53, 77: 49, 78: 54, 79: 48, 82: 56, 83: 57, 89: 50}
    +>>> "SEND + MORE == MONEY".translate(translation_table)
    +'9567 + 1085 == 10652'
    +
      +
    1. Using a generator expression, we quickly compute the code point values for each character in a string. characters is an example of the value of sorted_characters in the alphametics.solve() function. +
    2. Using another generator expression, we quickly compute the code point values for each digit in this string. The result, guess, is of the form returned by the itertools.permutations() function in the alphametics.solve() function. +
    3. This translation table is generated by zipping characters and guess together and building a dictionary from the resulting sequence of pairs. This is exactly what the alphametics.solve() function does inside the for loop. +
    4. Finally, we pass this translation table to the translate() method of the original puzzle string. This converts each letter in the string to the corresponding digit (based on the letters in characters and the digits in guess). The result is a valid Python expression, as a string. +
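The four steps above can be collected into one small function. This is only a sketch: translate_puzzle and its parameter names are hypothetical and do not appear in alphametics.solve(), which builds the table inside its own for loop.

```python
def translate_puzzle(puzzle, letters, guess_digits):
    """Replace each letter of `puzzle` with the digit at the same position in `guess_digits`."""
    characters = tuple(ord(c) for c in letters)    # code points of the letters
    guess = tuple(ord(c) for c in guess_digits)    # code points of the digits
    translation_table = dict(zip(characters, guess))
    return puzzle.translate(translation_table)

equation = translate_puzzle("SEND + MORE == MONEY", "SMEDONRY", "91570682")
print(equation)  # 9567 + 1085 == 10652
```

Spaces, the plus sign, and the equals signs are not keys in the table, so they are left alone and the result is still a well-formed expression.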
    -

    FIXME - -

    ->>> translation_table = {ord("A"): ord("O")}
    ->>> translation_table
    -{65: 79}
    ->>> 'MARK'.translate(translation_table)
    -'MORK'
    - -

    FIXME +

    That’s pretty impressive. But what can you do with a string that happens to be a valid Python expression?

    Evaluating Arbitrary Strings As Python Expressions

    -

    FIXME +

    This is the final piece of the puzzle (or rather, the final piece of the puzzle solver). After all that fancy string manipulation, we’re left with a string like '9567 + 1085 == 10652'. But that’s a string, and what good is a string? Enter eval(), the universal Python evaluation tool. + +

    +>>> eval('1 + 1 == 2')
    +True
    +>>> eval('1 + 1 == 3')
    +False
    +>>> eval('9567 + 1085 == 10652')
    +True
    + +

    But wait, there’s more! The eval() function isn’t limited to boolean expressions. It can handle any Python expression and returns any datatype. + +

    +>>> eval('"A" + "B"')
    +'AB'
    +>>> eval('"MARK".translate({65: 79})')
    +'MORK'
    +>>> eval('"AAAAA".count("A")')
    +5
    +>>> eval('["*"] * 5')
    +['*', '*', '*', '*', '*']
    + +

    But wait, that’s not all! + +

    +>>> x = 5
    +>>> eval("x * 5")         
    +25
    +>>> eval("pow(x, 2)")     
    +25
    +>>> import math
    +>>> eval("math.sqrt(x)")  
    +2.2360679774997898
    +
      +
    1. The expression that eval() takes can reference global variables defined outside the eval(). If called within a function, it can reference local variables too. +
    2. And functions. +
    3. And modules. +
    + +
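To illustrate the remark about local variables, here is a small sketch (the names scale and scaled_root are made up for this example). When eval() is called inside a function without explicit namespaces, the expression can see that function's locals as well as the module's globals.

```python
import math

scale = 10  # a module-level (global) name

def scaled_root(value):
    # `value` is local; `scale` and `math` are global. eval() can see all three.
    return eval("scale * math.sqrt(value)")

result = scaled_root(16)
print(result)  # 40.0
```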

    Hey, wait a minute… + +

    +>>> import subprocess
    +>>> eval("subprocess.getoutput('ls ~')")      
    +'Desktop         Library         Pictures \
    + Documents       Movies          Public   \
    + Music           Sites'
    +>>> eval("subprocess.getoutput('rm -rf /')")  
    +
      +
    1. The subprocess module allows you to run arbitrary shell commands and get the result as a Python string. +
    2. Don’t do this. +
    + +

    It’s even worse than that, because there’s a global __import__() function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of eval(), you can construct a single expression that will wipe out all your files: + +

    +>>> eval("__import__('subprocess').getoutput('rm -rf /')")  
    +
      +
    1. Don’t do this either. +
    + +

    eval() is EVIL + +

    Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use eval() on trusted input. Of course, the trick is figuring out what’s “trusted.” But here’s something I know for certain: you should NOT take this alphametics solver and put it on the internet as a fun little web service. Don’t make the mistake of thinking, “Gosh, the function does a lot of string manipulation before getting a string to evaluate; I can’t imagine how someone could exploit that.” Someone WILL figure out how to sneak nasty executable code past all that string manipulation (stranger things have happened), and then you can kiss your server goodbye. + +

    But surely there’s some way to evaluate expressions safely? To put eval() in a sandbox where it can’t access or harm the outside world? Well, yeah, but it’s tricky. + +

    +>>> x = 5
    +>>> eval("x * 5", {}, {})               
    +Traceback (most recent call last):
    +  File "<stdin>", line 1, in <module>
    +  File "<string>", line 1, in <module>
    +NameError: name 'x' is not defined
    +>>> eval("x * 5", {"x": x}, {})         
    +25
    +>>> import math
    +>>> eval("math.sqrt(x)", {"x": x}, {})  
    +Traceback (most recent call last):
    +  File "<stdin>", line 1, in <module>
    +  File "<string>", line 1, in <module>
    +NameError: name 'math' is not defined
    +
      +
    1. The second and third parameters passed to the eval() function act as the global and local namespaces for evaluating the expression. In this case, they are both empty, which means that when the string "x * 5" is evaluated, there is no reference to x in either the global or local namespace, so eval() raises an exception. +
    2. You can selectively include specific values in the global namespace by listing them individually. Then those — and only those — variables will be available during evaluation. +
    3. Even though you just imported the math module, you didn’t include it in the namespace passed to the eval() function, so the evaluation failed. +
    + +
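The namespace parameters described above can be wrapped in a helper. This is a sketch under stated assumptions: evaluate_with is a hypothetical name, and the helper only controls which names the expression can look up. It is not, by itself, a security boundary.

```python
import math

def evaluate_with(expression, **names):
    """Evaluate `expression` with only `names` placed in the global namespace."""
    # A fresh dict is passed so that eval() does not modify the caller's data.
    return eval(expression, dict(names), {})

print(evaluate_with("x * 5", x=5))                     # 25
print(evaluate_with("math.sqrt(x)", x=16, math=math))  # 4.0

try:
    evaluate_with("math.sqrt(x)", x=16)  # math was not passed in
except NameError as error:
    missing = str(error)
    print(missing)  # reports that the name math is not defined
```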

    Gee, that was easy. Lemme make an alphametics web service now! + +

    +>>> eval("pow(5, 2)", {}, {})                   
    +25
    +>>> eval("__import__('math').sqrt(5)", {}, {})  
    +2.2360679774997898
    +
      +
    1. Even though you’ve passed empty dictionaries for the global and local namespaces, all of Python’s built-in functions are still available during evaluation. So pow(5, 2) works, because 5 and 2 are literals, and pow() is a built-in function. +
    2. Unfortunately (and if you don’t see why it’s unfortunate, read on), the __import__() function is also a built-in function, so it works too. +
    + +

    Yeah, that means you can still do nasty things, even if you explicitly set the global and local namespaces to empty dictionaries when calling eval(): + +

    +>>> eval("__import__('subprocess').getoutput('rm -rf /')", {}, {})  
    +
      +
    1. Please don’t do this. +
    + +

    Oops. I’m glad I didn’t make that alphametics web service. Is there any way to use eval() safely? + +

    +>>> eval("__import__('math').sqrt(5)",
    +...     {"__builtins__":None}, {})          
    +Traceback (most recent call last):
    +  File "<stdin>", line 1, in <module>
    +  File "<string>", line 1, in <module>
    +NameError: name '__import__' is not defined
    +>>> eval("__import__('subprocess').getoutput('rm -rf /')",
    +...     {"__builtins__":None}, {})          
    +Traceback (most recent call last):
    +  File "<stdin>", line 1, in <module>
    +  File "<string>", line 1, in <module>
    +NameError: name '__import__' is not defined
    +
      +
    1. To keep evaluated expressions away from __import__() and the other built-in functions, you can define a global namespace dictionary that maps "__builtins__" to None, the Python null value. Internally, the “built-in” functions are contained within a pseudo-module called "__builtins__". This pseudo-module (i.e. the set of built-in functions) is made available to evaluated expressions unless you explicitly override it. +
    2. You may do this, but be very, very careful not to make any typos. In particular, be sure you’ve overridden __builtins__ and not __builtin__ or __built-ins__ or some other variation. +
    + +

    So, in the end, you can block the __import__() trick shown above. Passing {"__builtins__": None} as the second parameter to the eval() function is non-intuitive (and not the default behavior), but it does remove the built-in functions. It is not a complete sandbox, though: an expression can still reach other objects through the attributes of literals, or tie up the CPU and memory with a very large computation such as 2 ** 2147483647, so untrusted input remains risky. If you understand why overriding __builtins__ works, you’re less likely to use eval() incorrectly, in a way that works with trusted input but has potentially devastating consequences with untrusted input.
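A small sketch of overriding the built-ins follows. Two assumptions are worth stating: restricted_eval is a hypothetical helper, and it maps "__builtins__" to an empty dictionary rather than None, a variation chosen here because the failure then shows up as a NameError consistently. It demonstrates only that name lookup fails, and should not be read as a guarantee that the expression is harmless.

```python
def restricted_eval(expression):
    """Evaluate `expression` with no built-in names available (illustration only)."""
    # With an empty __builtins__ mapping, names such as __import__ and pow are undefined.
    return eval(expression, {"__builtins__": {}}, {})

print(restricted_eval("9567 + 1085 == 10652"))  # True

try:
    restricted_eval("__import__('math').sqrt(5)")
except NameError as error:
    blocked = str(error)
    print(blocked)  # reports that __import__ is not defined
```

Arithmetic on literals still works because it needs no names at all, which is all the puzzle expressions require.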

    Putting It All Together

    @@ -398,7 +535,6 @@ for guess in itertools.permutations(digits, len(characters)):
  • Finds all the letters in the puzzle with the re.findall() function
  • Finds all the unique letters in the puzzle with set comprehensions
  • Checks if there are more than 10 unique letters (meaning the puzzle is definitely unsolvable) with an assert statement -
  • FIXME sorts the letters with a set difference operation
  • Converts the letters to their ASCII equivalents with a generator object
  • Calculates all the possible solutions with the itertools.permutations() function
  • Converts each possible solution to a Python expression with the translate() string method