various clarifications about generator expressions

2026-06-05 23:10:17 +00:00 · 2009-07-15 10:36:02 -04:00
parent 8843481267
commit 29240baeee
3 changed files with 22 additions and 9 deletions
@@ -208,9 +208,13 @@ AssertionError: Only for very large values of 2</samp></pre>
 <li>A generator expression is like an anonymous function that yields values. The expression itself looks like a list comprehension [FIXME have we introduced this yet?], but it&#8217;s wrapped in parentheses instead of square brackets.
 <li>The generator expression returns&hellip; an iterator.
 <li>Calling <code>next(<var>gen</var>)</code> returns the next value from the iterator.
-<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>.
+<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>. In these cases, you don&#8217;t need an extra set of parentheses&nbsp;&mdash;&nbsp;just pass the &#8220;bare&#8221; expression <code>ord(c) for c in unique_characters</code> to the <code>tuple()</code> function, and Python figures out that it&#8217;s a generator expression.
 </ol>

+<blockquote class=note>
+<p><span class=u>&#x261E;</span>Using a generator expression instead of a list comprehension can save both <abbr>CPU</abbr> and <abbr>RAM</abbr>. If you&#8217;re building a list just to throw it away (<i>e.g.</i> passing it to <code>tuple()</code> or <code>set()</code>), use a generator expression instead!
+</blockquote>
+
 <p>Here&#8217;s another way to accomplish the same thing, using a <a href=generators.html>generator function</a>:

 <pre class=nd><code class=pp>def ord_map(a_string):
@@ -219,6 +223,8 @@ AssertionError: Only for very large values of 2</samp></pre>

 gen = ord_map(unique_characters)</code></pre>

+<p>The generator expression is more compact but functionally equivalent.
+
 <p class=a>&#x2042;

 <h2 id=permutations>Calculating Permutations&hellip; The Lazy Way!</h2>
@@ -498,25 +504,25 @@ for guess in itertools.permutations(digits, len(characters)):

 <pre class=screen>
 <samp class=p>>>> </samp><kbd class=pp>import subprocess</kbd>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd>      <span class=u>&#x2460;</span></a>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd>                  <span class=u>&#x2460;</span></a>
 <samp class=pp>'Desktop         Library         Pictures \
 Documents       Movies          Public   \
 Music           Sites'</samp>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm -rf /')")</kbd>  <span class=u>&#x2461;</span></a></pre>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm /some/random/file')")</kbd>  <span class=u>&#x2461;</span></a></pre>
 <ol>
 <li>The <code>subprocess</code> module allows you to run arbitrary shell commands and get the result as a Python string.
-<li>Don&#8217;t do this.
+<li>Arbitrary shell commands can have permanent consequences.
 </ol>

 <p>It&#8217;s even worse than that, because there&#8217;s a global <code>__import__()</code> function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of <code>eval()</code>, you can construct a single expression that will wipe out all your files:

 <pre class=screen>
-<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm -rf /')")</kbd>  <span class=u>&#x2460;</span></a></pre>
+<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')")</kbd>  <span class=u>&#x2460;</span></a></pre>
 <ol>
-<li>Don&#8217;t do this either.
+<li>Now imagine the output of <code>'rm -rf ~'</code>. Actually there wouldn&#8217;t be any output, but you wouldn&#8217;t have any files left either.
 </ol>

-<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>eval() is EVIL
+<p class=xxxl>eval() is EVIL

 <p>Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use <code>eval()</code> on trusted input. Of course, the trick is figuring out what&#8217;s &#8220;trusted.&#8221; But here&#8217;s something I know for certain: you should <b>NOT</b> take this alphametics solver and put it on the internet as a fun little web service. Don&#8217;t make the mistake of thinking, &#8220;Gosh, the function does a lot of string manipulation before getting a string to evaluate; <em>I can&#8217;t imagine</em> how someone could exploit that.&#8221; Someone <b>WILL</b> figure out how to sneak nasty executable code past all that string manipulation (<a href=http://www.matasano.com/log/1032/this-new-vulnerability-dowds-inhuman-flash-exploit/>stranger things have happened</a>), and then you can kiss your server goodbye.

@@ -48,6 +48,7 @@ Classname Legend
 .note = "note/caution/important"   = indented block for tips/gotchas/language comparisons
 .baa  = "best available ampersand" = wrapper block for ampersands
 .ots  = "on the side"              = an aside that is set in normal type (as opposed to a big blue pullquote)
+.xxxl = "ridiculously large"       = text sized at 1000% of normal type

 Acknowledgements & Inspirations

@@ -126,7 +127,7 @@ html {
 body {
  margin: 1.75em 28px;
 }
-.c, .a {
+.c, .a, .xxxl {
  clear: both;
  text-align: center;
 }
@@ -153,6 +154,12 @@ form div, #level {
 .pf {
  padding: 0 1.75em;
 }
+.xxxl {
+  font-size:1000%;
+  font-weight:bold;
+  line-height:1;
+  margin:0.7em 0;
+}

 /* links */

@@ -69,7 +69,7 @@ My alphabet starts where your alphabet ends! <span class=u>&#x275E;</span><br>&m

 <p>Other people pondered these questions, and they came up with a solution:

-<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>UTF-8
+<p class=xxxl>UTF-8

 <p>UTF-8 is a <em>variable-length</em> encoding system for Unicode. That is, different characters take up a different number of bytes. For <abbr>ASCII</abbr> characters (A-Z, <i class=baa>&amp;</i>c.) UTF-8 uses just one byte per character. In fact, it uses the exact same bytes; the first 128 characters (0&ndash;127) in UTF-8 are indistinguishable from <abbr>ASCII</abbr>. &#8220;Extended Latin&#8221; characters like &ntilde; and &ouml; end up taking two bytes. (The bytes are not simply the Unicode code point like they would be in UTF-16; there is some serious bit-twiddling involved.) Chinese characters like &#x4E2D; end up taking three bytes. The rarely used &#8220;astral plane&#8221; characters take four bytes.