From 29240baeeed29d2a3f30e569a9e31b3b4708359c Mon Sep 17 00:00:00 2001
From: Mark Pilgrim
Date: Wed, 15 Jul 2009 10:36:02 -0400
Subject: [PATCH] various clarifications about generator expressions

---
 advanced-iterators.html | 20 +++++++++++++-------
 dip3.css | 9 ++++++++-
 strings.html | 2 +-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/advanced-iterators.html b/advanced-iterators.html
index 990afb3..1f9751c 100755
--- a/advanced-iterators.html
+++ b/advanced-iterators.html
@@ -208,9 +208,13 @@ AssertionError: Only for very large values of 2
  • A generator expression is like an anonymous function that yields values. The expression itself looks like a list comprehension [FIXME have we introduced this yet?], but it’s wrapped in parentheses instead of square brackets.
  • The generator expression returns… an iterator.
  • Calling next(gen) returns the next value from the iterator. -
  • If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to tuple(), list(), or set(). +
  • If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to tuple(), list(), or set(). In these cases, you don’t need an extra set of parentheses — just pass the “bare” expression ord(c) for c in unique_characters to the tuple() function, and Python figures out that it’s a generator expression. +
    +

    Using a generator expression instead of a list comprehension can save both CPU and RAM. If you’re building a list just to throw it away (e.g. passing it to tuple() or set()), use a generator expression instead! +
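A minimal sketch of both points: passing a "bare" generator expression straight to tuple() or set(), and comparing how much memory a fully built list occupies against a generator that has not produced anything yet. The sample string "SENDMORY" and the variable names are illustrative assumptions, not taken from the chapter's code.

```python
import sys

unique_characters = "SENDMORY"  # hypothetical sample input; any string works

# A generator expression that is the only argument needs no extra parentheses.
codes_tuple = tuple(ord(c) for c in unique_characters)
codes_set = set(ord(c) for c in unique_characters)

# Footprint comparison. sys.getsizeof() measures only the container object,
# not the ints it refers to, yet the list is still far larger: it stores a
# reference to every element, while the generator stores only its paused state.
n = 100_000
as_list = [i * i for i in range(n)]
as_gen = (i * i for i in range(n))
print(sys.getsizeof(as_list), sys.getsizeof(as_gen))
```

The parenthesis-free form only works when the generator expression is the sole argument. A call such as `sorted(c for c in s, reverse=True)` is a SyntaxError; write `sorted((c for c in s), reverse=True)` instead.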

    +

    Here’s another way to accomplish the same thing, using a generator function:

    def ord_map(a_string):
    @@ -219,6 +223,8 @@ AssertionError: Only for very large values of 2
    gen = ord_map(unique_characters) +

    The generator expression is more compact but functionally equivalent. +
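The body of ord_map() falls outside this hunk, so the sketch below assumes the obvious loop-and-yield body. It checks the "functionally equivalent" claim: both forms are lazy iterators and produce the same values in the same order.

```python
unique_characters = "SENDMORY"  # hypothetical sample input


def ord_map(a_string):
    # Assumed body: yield the code point of each character, one at a time.
    for c in a_string:
        yield ord(c)


gen_from_function = ord_map(unique_characters)
gen_from_expression = (ord(c) for c in unique_characters)

# Nothing is computed until next() asks for a value.
first_f = next(gen_from_function)
first_e = next(gen_from_expression)

# Draining the rest gives identical sequences.
rest_f = list(gen_from_function)
rest_e = list(gen_from_expression)
print(first_f, first_e, rest_f == rest_e)
```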

    Calculating Permutations… The Lazy Way!

    @@ -498,25 +504,25 @@ for guess in itertools.permutations(digits, len(characters)):
     >>> import subprocess
    ->>> eval("subprocess.getoutput('ls ~')")      
    +>>> eval("subprocess.getoutput('ls ~')")                  
     'Desktop         Library         Pictures \
      Documents       Movies          Public   \
      Music           Sites'
    ->>> eval("subprocess.getoutput('rm -rf /')")  
    +>>> eval("subprocess.getoutput('rm /some/random/file')")
    1. The subprocess module allows you to run arbitrary shell commands and get the result as a Python string. -
    2. Don’t do this. +
    2. Arbitrary shell commands can have permanent consequences.
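To see the "result as a Python string" part without any risk, here is a sketch that hands subprocess.getoutput() a harmless command. It assumes a shell is available to run `echo`.

```python
import subprocess

# getoutput() runs the command through the shell and returns its combined
# output as a str; a single trailing newline is stripped.
greeting = subprocess.getoutput("echo hello")
print(repr(greeting))
```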

    It’s even worse than that, because there’s a global __import__() function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of eval(), you can construct a single expression that will wipe out all your files:

    ->>> eval("__import__('subprocess').getoutput('rm -rf /')")  
    +>>> eval("__import__('subprocess').getoutput('rm /some/random/file')")
      -
    1. Don’t do this either. +
    1. Now imagine the output of 'rm -rf ~'. Actually there wouldn’t be any output, but you wouldn’t have any files left either.
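A harmless way to watch __import__() turn a module name string into a module reference, using the standard math module as a stand-in for subprocess (an assumption made purely for safety):

```python
# 'math' is deliberately never imported with an import statement here;
# eval() reaches it anyway through the built-in __import__().
root = eval("__import__('math').sqrt(16)")

# __import__ takes a module name as a string and returns the module object.
m = __import__("math")
print(m.__name__, root)
```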
    -

    eval() is EVIL +

    eval() is EVIL

    Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use eval() on trusted input. Of course, the trick is figuring out what’s “trusted.” But here’s something I know for certain: you should NOT take this alphametics solver and put it on the internet as a fun little web service. Don’t make the mistake of thinking, “Gosh, the function does a lot of string manipulation before getting a string to evaluate; I can’t imagine how someone could exploit that.” Someone WILL figure out how to sneak nasty executable code past all that string manipulation (stranger things have happened), and then you can kiss your server goodbye.

diff --git a/dip3.css b/dip3.css
index 6c70a30..7c039a5 100755
--- a/dip3.css
+++ b/dip3.css
@@ -48,6 +48,7 @@ Classname Legend
 .note = "note/caution/important" = indented block for tips/gotchas/language comparisons
 .baa = "best available ampersand" = wrapper block for ampersands
 .ots = "on the side" = an aside that is set in normal type (as opposed to a big blue pullquote)
+.xxxl = "ridiculously large" = text sized 1000% larger than normal type
 Acknowledgements & Inspirations
@@ -126,7 +127,7 @@ html {
 body {
 margin: 1.75em 28px;
 }
-.c, .a {
+.c, .a, .xxxl {
 clear: both;
 text-align: center;
 }
@@ -153,6 +154,12 @@ form div, #level {
 .pf {
 padding: 0 1.75em;
 }
+.xxxl {
+ font-size:1000%;
+ font-weight:bold;
+ line-height:1;
+ margin:0.7em 0;
+}
 /* links */
diff --git a/strings.html b/strings.html
index b1a256f..f854b32 100755
--- a/strings.html
+++ b/strings.html
@@ -69,7 +69,7 @@ My alphabet starts where your alphabet ends!

    Other people pondered these questions, and they came up with a solution: -

    UTF-8 +

    UTF-8

    UTF-8 is a variable-length encoding system for Unicode. That is, different characters take up a different number of bytes. For ASCII characters (A-Z, &c.) UTF-8 uses just one byte per character. In fact, it uses the exact same bytes; the first 128 characters (0–127) in UTF-8 are indistinguishable from ASCII. “Extended Latin” characters like ñ and ö end up taking two bytes. (The bytes are not simply the Unicode code point like they would be in UTF-16; there is some serious bit-twiddling involved.) Chinese characters like 中 end up taking three bytes. The rarely-used “astral plane” characters take four bytes.
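The byte counts in that paragraph can be checked with Python’s built-in codec. To make the “bit-twiddling” concrete, the sketch also includes a hypothetical hand-rolled helper, `utf8_encode_char` (not a standard library function): the code point’s bits are spread across a leading byte that announces the length and continuation bytes that each start with the bits `10`.

```python
def utf8_encode_char(ch: str) -> bytes:
    """Hand-rolled UTF-8 encoder for one character (illustrative sketch; unlike
    str.encode it does not reject lone surrogates)."""
    cp = ord(ch)
    if cp < 0x80:  # 1 byte: 0xxxxxxx, identical to ASCII
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes (astral planes): 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


samples = ["A", "z", "ñ", "ö", "中", "\U0001F600"]
for ch in samples:
    builtin = ch.encode("utf-8")
    assert utf8_encode_char(ch) == builtin  # hand-rolled version matches the codec
    print(f"U+{ord(ch):04X} -> {len(builtin)} byte(s): {builtin.hex(' ')}")

# Pure ASCII text encodes to exactly the same bytes in UTF-8 and ASCII.
assert "A-Z".encode("utf-8") == "A-Z".encode("ascii")
```

Note that ñ (U+00F1) comes out as the two bytes `c3 b1`, not the single code-point value `f1`.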