diff --git a/advanced-iterators.html b/advanced-iterators.html
index 3d5027d..b7149e4 100644
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -330,7 +330,7 @@ Wesley
itertools.zip_longest() function stops at the end of the longest sequence, inserting None values for items past the end of the shorter sequences.
-OK, that was all very interesting, but how does it relate to the alphametics solver? Here’s how: +
OK, that was all very interesting, but how does it relate to the alphametics solver? Here’s how:
>>> characters = ('S', 'M', 'E', 'D', 'O', 'N', 'R', 'Y')
@@ -346,7 +346,7 @@ Wesley
dict() function to create a dictionary that uses letters as keys and their associated digits as values. Although the printed representation of the dictionary lists the pairs in a different order (dictionaries have no “order” per se), you can see that each letter is associated with the digit, based on the ordering of the original characters and guess sequences.
-The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution. +
The alphametics solver uses this technique to create a dictionary that maps letters in the puzzle to digits in the solution, for each possible solution.
characters = tuple(ord(c) for c in sorted_characters)
digits = tuple(ord(c) for c in '0123456789')
@@ -359,36 +359,173 @@ for guess in itertools.permutations(digits, len(characters)):
A New Kind Of String Manipulation
-FIXME
+
Python strings have many methods. You learned about some of those methods in the Strings chapter: lower(), count(), and format(). Now I want to introduce you to a powerful but little-known string manipulation technique: the translate() method.
->>> characters = tuple(ord(c) for c in 'SMEDONRY')
+>>> translation_table = {ord("A"): ord("O")} ①
+>>> translation_table ②
+{65: 79}
+>>> 'MARK'.translate(translation_table) ③
+'MORK'
+
+- String translation starts with a translation table, which is just a dictionary that maps one character to another. Actually, “character” is incorrect: the translation table really maps one Unicode code point (an integer) to another.
+
- Remember, the keys and values of the table are integers, not one-character strings. The
ord() function returns the Unicode code point of a character, which, in the case of A–Z, is always an integer from 65 to 90 (the same as its ASCII value).
+ - The
translate() method on a string takes a translation table and runs the string through it. That is, it replaces all occurrences of the keys of the translation table with the corresponding values. In this case, “translating” MARK to MORK.
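As a small sketch of the same idea (not part of the original text): the built-in str.maketrans() builds an equivalent table from two equal-length strings, and mapping a key to None deletes that character. The variable names are illustrative only.

```python
# Build the same table by hand and with str.maketrans().
manual_table = {ord("A"): ord("O")}
built_table = str.maketrans("A", "O")

# Both tables map code point 65 ('A') to code point 79 ('O').
translated = "MARK".translate(built_table)

# Mapping a code point to None removes that character entirely.
without_a = "MARK".translate({ord("A"): None})

print(manual_table == built_table, translated, without_a)
```

Building the dictionary by hand, as the chapter does, makes it clearer what translate() actually consumes; maketrans() is only a convenience.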
+
+
+What does this have to do with solving alphametic puzzles? As it turns out, everything.
+
+
+>>> characters = tuple(ord(c) for c in 'SMEDONRY') ①
>>> characters
(83, 77, 69, 68, 79, 78, 82, 89)
->>> digits = tuple(ord(c) for c in '0123456789')
->>> digits
-(48, 49, 50, 51, 52, 53, 54, 55, 56, 57)
->>> guess = (49, 50, 48, 51, 52, 53, 54, 55)
->>> translation_table = dict(zip(characters, guess))
+>>> guess = tuple(ord(c) for c in '91570682') ②
+>>> guess
+(57, 49, 53, 55, 48, 54, 56, 50)
+>>> translation_table = dict(zip(characters, guess)) ③
>>> translation_table
-{68: 51, 69: 48, 77: 50, 78: 53, 79: 52, 82: 54, 83: 49, 89: 55}
->>> "SEND + MORE == MONEY".translate(translation_table)
-'1053 + 2460 == 24507'
+{68: 55, 69: 53, 77: 49, 78: 54, 79: 48, 82: 56, 83: 57, 89: 50}
+>>> "SEND + MORE == MONEY".translate(translation_table) ④
+'9567 + 1085 == 10652'
+alphametics.solve() function.
+itertools.permutations() function in the alphametics.solve() function.
+alphametics.solve() function does inside the for loop.
+translate() method of the original puzzle string. This converts each letter in the string to the corresponding digit (based on the letters in characters and the digits in guess). The result is a valid Python expression, as a string.
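To see how the guesses arrive one at a time, here is a rough sketch that translates only the first three permutations. The names puzzle and candidates are assumptions made for this illustration; the real solver loops over every permutation.

```python
import itertools

puzzle = "SEND + MORE == MONEY"
characters = tuple(ord(c) for c in "SMEDONRY")
digits = tuple(ord(c) for c in "0123456789")

candidates = []
# islice() stops after three guesses so the sketch stays small.
guesses = itertools.permutations(digits, len(characters))
for guess in itertools.islice(guesses, 3):
    translation_table = dict(zip(characters, guess))
    candidates.append(puzzle.translate(translation_table))

for candidate in candidates:
    print(candidate)
```

The first candidate maps S to 0, which produces a number with a leading zero. A complete solver has to reject guesses like that before evaluating them.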
+FIXME - -
->>> translation_table = {ord("A"): ord("O")}
->>> translation_table
-{65: 79}
->>> 'MARK'.translate(translation_table)
-'MORK'
-
-FIXME +
That’s pretty impressive. But what can you do with a string that happens to be a valid Python expression?
FIXME +
This is the final piece of the puzzle (or rather, the final piece of the puzzle solver). After all that fancy string manipulation, we’re left with a string like '9567 + 1085 == 10652'. But that’s a string, and what good is a string? Enter eval(), the universal Python evaluation tool.
+
+
+>>> eval('1 + 1 == 2')
+True
+>>> eval('1 + 1 == 3')
+False
+>>> eval('9567 + 1085 == 10652')
+True
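Putting the pieces together, a brute-force solver might look roughly like the sketch below. This solve() is a simplified stand-in written for illustration, not the actual alphametics.solve() function, and it is only called with a hard-coded, trusted puzzle string. A tiny puzzle keeps the search to 720 permutations.

```python
import itertools
import re


def solve(puzzle):
    """Try every digit assignment; return the first equation that is true."""
    words = re.findall(r"[A-Z]+", puzzle)
    letters = sorted(set("".join(words)))
    first_letters = {word[0] for word in words}
    characters = tuple(ord(c) for c in letters)
    digits = tuple(ord(c) for c in "0123456789")
    zero = ord("0")
    for guess in itertools.permutations(digits, len(characters)):
        table = dict(zip(characters, guess))
        # Skip guesses that would give any number a leading zero.
        if any(table[ord(c)] == zero for c in first_letters):
            continue
        equation = puzzle.translate(table)
        if eval(equation):  # trusted input only
            return equation
    return None


solution = solve("I + BB == ILL")
print(solution)
```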
+
+But wait, there’s more! The eval() function isn’t limited to boolean expressions. It can handle any Python expression and returns any datatype.
+
+
+>>> eval('"A" + "B"')
+'AB'
+>>> eval('"MARK".translate({65: 79})')
+'MORK'
+>>> eval('"AAAAA".count("A")')
+5
+>>> eval('["*"] * 5')
+['*', '*', '*', '*', '*']
+
+But wait, that’s not all! + +
+>>> x = 5
+>>> eval("x * 5") ①
+25
+>>> eval("pow(x, 2)") ②
+25
+>>> import math
+>>> eval("math.sqrt(x)") ③
+2.2360679774997898
+
The expression that eval() takes can reference global variables defined outside the eval(). If called within a function, it can reference local variables too.
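The interactive session only shows a global variable. As a hedged illustration of the local case, the hypothetical function below evaluates an expression that refers to its own local variables.

```python
def area(expression):
    # width and height are local variables of this function.
    width = 4
    height = 5
    # With no namespace arguments, eval() can see the caller's locals.
    return eval(expression)


result = area("width * height")
print(result)
```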
+Hey, wait a minute… + +
+>>> import subprocess
+>>> eval("subprocess.getoutput('ls ~')") ①
+'Desktop Library Pictures \
+ Documents Movies Public \
+ Music Sites'
+>>> eval("subprocess.getoutput('rm -rf /')") ②
+
subprocess module allows you to run arbitrary shell commands and get the result as a Python string.
+It’s even worse than that, because there’s a global __import__() function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of eval(), you can construct a single expression that will wipe out all your files:
+
+
+>>> eval("__import__('subprocess').getoutput('rm -rf /')") ①+
eval() is EVIL + +
Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use eval() on trusted input. Of course, the trick is figuring out what’s “trusted.” But here’s something I know for certain: you should NOT take this alphametics solver and put it on the internet as a fun little web service. Don’t make the mistake of thinking, “Gosh, the function does a lot of string manipulation before getting a string to evaluate; I can’t imagine how someone could exploit that.” Someone WILL figure out how to sneak nasty executable code past all that string manipulation (stranger things have happened), and then you can kiss your server goodbye.
+
+
But surely there’s some way to evaluate expressions safely? To put eval() in a sandbox where it can’t access or harm the outside world? Well, yeah, but it’s tricky.
+
+
+>>> x = 5
+>>> eval("x * 5", {}, {}) ①
+Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "<string>", line 1, in <module>
+NameError: name 'x' is not defined
+>>> eval("x * 5", {"x": x}, {}) ②
+25
+>>> import math
+>>> eval("math.sqrt(x)", {"x": x}, {}) ③
+Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "<string>", line 1, in <module>
+NameError: name 'math' is not defined
+
eval() function act as the global and local namespaces for evaluating the expression. In this case, they are both empty, which means that when the string "x * 5" is evaluated, there is no reference to x in either the global or local namespace, so eval() throws an exception.
+math module, you didn’t include it in the namespace passed to the eval() function, so the evaluation failed.
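For completeness, a minimal sketch of the fix for that last failure: put the module into the namespace you pass in. The dictionary names are assumptions for this example.

```python
import math

# Names the expression is allowed to see, split across both namespaces.
global_names = {"math": math}
local_names = {"x": 5}

root = eval("math.sqrt(x)", global_names, local_names)
print(root)
```

Only names present in one of the two dictionaries (or among the built-ins) can be resolved by the expression.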
+Gee, that was easy. Lemme make an alphametics web service now! + +
+>>> eval("pow(5, 2)", {}, {}) ①
+25
+>>> eval("__import__('math').sqrt(5)", {}, {}) ②
+2.2360679774997898
+
pow(5, 2) works, because 5 and 2 are literals, and pow() is a built-in function.
+__import__() function is also a built-in function, so it works too.
+Yeah, that means you can still do nasty things, even if you explicitly set the global and local namespaces to empty dictionaries when calling eval():
+
+
+>>> eval("__import__('subprocess').getoutput('rm -rf /')", {}, {}) ①+
Oops. I’m glad I didn’t make that alphametics web service. Is there any way to use eval() safely?
+
+
+>>> eval("__import__('math').sqrt(5)",
+... {"__builtins__":None}, {}) ①
+Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "<string>", line 1, in <module>
+NameError: name '__import__' is not defined
+>>> eval("__import__('subprocess').getoutput('rm -rf /')",
+... {"__builtins__":None}, {}) ②
+Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ File "<string>", line 1, in <module>
+NameError: name '__import__' is not defined
+"__builtins__" to None, the Python null value. Internally, the “built-in” functions are contained within a pseudo-module called "__builtins__". This pseudo-module (i.e. the set of built-in functions) is made available to evaluated expressions unless you explicitly override it.
+__builtins__ and not __builtin__ or __built-ins__ or some other variation.
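A short sketch of what the restricted namespace does and does not affect. The exception raised for a blocked built-in has differed between Python versions, so this example deliberately accepts either type rather than asserting one.

```python
restricted = {"__builtins__": None}

# Plain arithmetic on literals needs no built-ins, so it still works.
literal_result = eval("5 * 5", restricted, {})

# Looking up a built-in name now fails. The exact exception type has
# varied between Python versions, so this sketch accepts either one.
try:
    eval("pow(5, 2)", restricted, {})
    builtin_blocked = False
except (NameError, TypeError):
    builtin_blocked = True

# Attribute access on a literal does not go through the built-ins.
type_name = eval("().__class__.__name__", restricted, {})

print(literal_result, builtin_blocked, type_name)
```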
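+So, in the end, removing the built-ins does block the __import__() attacks shown above. Passing {"__builtins__": None} as the second parameter to the eval() function is non-intuitive (and not the default behavior), but it closes that particular hole. It is not a complete sandbox, though: an evaluated expression can still reach other objects through attribute access on literals, such as ().__class__, and can still consume unbounded CPU or memory. If you understand why the namespace trick works and where it stops, you’re less likely to use eval() incorrectly, in a way that works with trusted input but has potentially devastating consequences with untrusted input.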
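If the puzzles only ever use + and ==, one way to sidestep eval() entirely is to check the arithmetic directly. The helper below is a hypothetical alternative offered as a sketch under that assumption; it is not how the chapter's solver works.

```python
def check_equation(equation):
    """Return True if a string like '9567 + 1085 == 10652' holds."""
    left, right = equation.split("==")
    # int() tolerates surrounding whitespace and rejects anything
    # that is not a plain integer by raising ValueError.
    total = sum(int(term) for term in left.split("+"))
    return total == int(right)


print(check_equation("9567 + 1085 == 10652"))
print(check_equation("1053 + 2460 == 24507"))
```

Because every term goes through int(), a string containing anything other than integers raises ValueError instead of being executed.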
re.findall() function
assert statement
-itertools.permutations() function
translate() string method