various clarifications about generator expressions

Mark Pilgrim
2009-07-15 10:36:02 -04:00
parent 8843481267
commit 29240baeee
3 changed files with 22 additions and 9 deletions
+13 -7
@@ -208,9 +208,13 @@ AssertionError: Only for very large values of 2</samp></pre>
<li>A generator expression is like an anonymous function that yields values. The expression itself looks like a list comprehension [FIXME have we introduced this yet?], but it&#8217;s wrapped in parentheses instead of square brackets.
<li>The generator expression returns&hellip; an iterator.
<li>Calling <code>next(<var>gen</var>)</code> returns the next value from the iterator.
<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>.
<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>. In these cases, you don&#8217;t need an extra set of parentheses&nbsp;&mdash;&nbsp;just pass the &#8220;bare&#8221; expression <code>ord(c) for c in unique_characters</code> to the <code>tuple()</code> function, and Python figures out that it&#8217;s a generator expression.
</ol>
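<p>The points above can be sketched in a few lines. This is only an illustration: the value assigned to <code>unique_characters</code> is an assumed stand-in, since the diff does not show how it is defined.

```python
# Assumed stand-in value; the diff does not show how unique_characters is built.
unique_characters = {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}

# Parentheses instead of square brackets: this builds an iterator, not a list.
gen = (ord(c) for c in unique_characters)

# next() pulls one value at a time (set order is arbitrary).
first = next(gen)

# tuple() consumes whatever the iterator has left: the other 7 values.
rest = tuple(gen)

# Passing the "bare" expression straight to tuple() needs no extra parentheses.
# sorted() is used here only to make the result order predictable.
codes = tuple(ord(c) for c in sorted(unique_characters))
print(codes)
```

<p>Note that <code>rest</code> holds only seven values, because the earlier <code>next()</code> call already consumed one; a generator can be walked through only once.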
<blockquote class=note>
<p><span class=u>&#x261E;</span>Using a generator expression instead of a list comprehension can save both <abbr>CPU</abbr> and <abbr>RAM</abbr>. If you&#8217;re building a list just to throw it away (<i>e.g.</i> passing it to <code>tuple()</code> or <code>set()</code>), use a generator expression instead!
</blockquote>
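<p>A rough way to see the memory side of that note is to compare the size of the two objects. This is a sketch under assumptions: the sample string is made up, and <code>sys.getsizeof()</code> reports only the size of the container object itself, so treat the numbers as indicative rather than as a benchmark.

```python
import sys

# Made-up sample input: 30,000 characters drawn from 'a', 'b', 'c'.
text = 'abc' * 10000

as_list = [ord(c) for c in text]   # every value is stored up front
as_gen = (ord(c) for c in text)    # values are produced on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size, gen_size)

# Both routes give the same set; only the list route builds a throwaway list.
unique_from_list = set(as_list)
unique_from_gen = set(as_gen)
```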
<p>Here&#8217;s another way to accomplish the same thing, using a <a href=generators.html>generator function</a>:
<pre class=nd><code class=pp>def ord_map(a_string):
@@ -219,6 +223,8 @@ AssertionError: Only for very large values of 2</samp></pre>
gen = ord_map(unique_characters)</code></pre>
<p>The generator expression is more compact but functionally equivalent.
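<p>To check the equivalence, the two forms can be run side by side. The body of <code>ord_map()</code> below is an assumption, because the diff elides the original function&#8217;s contents, and the sample string is illustrative.

```python
def ord_map(a_string):
    # Assumed body: the diff does not show the original function's contents.
    for c in a_string:
        yield ord(c)

# Illustrative input, not taken from the chapter.
unique_characters = 'DEMNORSY'

from_function = list(ord_map(unique_characters))
from_expression = list(ord(c) for c in unique_characters)

print(from_function == from_expression)
```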
<p class=a>&#x2042;
<h2 id=permutations>Calculating Permutations&hellip; The Lazy Way!</h2>
@@ -498,25 +504,25 @@ for guess in itertools.permutations(digits, len(characters)):
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>import subprocess</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd> <span class=u>&#x2460;</span></a>
<samp class=pp>'Desktop Library Pictures \
Documents Movies Public \
Music Sites'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm -rf /')")</kbd> <span class=u>&#x2461;</span></a></pre>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm /some/random/file')")</kbd> <span class=u>&#x2461;</span></a></pre>
<ol>
<li>The <code>subprocess</code> module allows you to run arbitrary shell commands and get the result as a Python string.
<li>Don&#8217;t do this.
<li>Arbitrary shell commands can have permanent consequences.
</ol>
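<p>The same mechanism can be seen with a command that has no side effects. This sketch assumes a shell where <code>echo</code> is available; the command itself is just an illustrative substitute for the ones in the transcript.

```python
import subprocess

# A harmless stand-in command: it only prints a word.
direct = subprocess.getoutput('echo hello')

# The same call hidden inside a string, the way eval() would receive it.
via_eval = eval("subprocess.getoutput('echo hello')")

# getoutput() returns the command's output as a string, trailing newline removed.
print(direct, via_eval)
```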
<p>It&#8217;s even worse than that, because there&#8217;s a global <code>__import__()</code> function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of <code>eval()</code>, you can construct a single expression that will wipe out all your files:
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm -rf /')")</kbd> <span class=u>&#x2460;</span></a></pre>
<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')")</kbd> <span class=u>&#x2460;</span></a></pre>
<ol>
<li>Don&#8217;t do this either.
<li>Now imagine the output of <code>'rm -rf ~'</code>. Actually there wouldn&#8217;t be any output, but you wouldn&#8217;t have any files left either.
</ol>
<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>eval() is EVIL
<p class=xxxl>eval() is EVIL
<p>Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use <code>eval()</code> on trusted input. Of course, the trick is figuring out what&#8217;s &#8220;trusted.&#8221; But here&#8217;s something I know for certain: you should <b>NOT</b> take this alphametics solver and put it on the internet as a fun little web service. Don&#8217;t make the mistake of thinking, &#8220;Gosh, the function does a lot of string manipulation before getting a string to evaluate; <em>I can&#8217;t imagine</em> how someone could exploit that.&#8221; Someone <b>WILL</b> figure out how to sneak nasty executable code past all that string manipulation (<a href=http://www.matasano.com/log/1032/this-new-vulnerability-dowds-inhuman-flash-exploit/>stranger things have happened</a>), and then you can kiss your server goodbye.
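<p>A harmless sketch of why this matters: the string below imports a module and calls a function from inside a single expression, and <code>eval()</code> runs it without complaint. For contrast, the standard library&#8217;s <code>ast.literal_eval()</code> accepts only literals and rejects the same string. That function is brought in here purely as an illustration; it is not something the text above proposes, and it cannot evaluate arithmetic puzzles like the alphametics solver does.

```python
import ast

# A harmless expression that still imports a module and calls into it.
expression = "__import__('math').sqrt(16)"

# eval() executes it: the import and the call both happen.
result = eval(expression)

# ast.literal_eval() only accepts literals (numbers, strings, lists, ...),
# so a function call is refused with ValueError.
try:
    ast.literal_eval(expression)
    rejected = False
except ValueError:
    rejected = True

# Plain literals still parse fine.
parsed = ast.literal_eval("[1, 2, 3]")
print(result, rejected, parsed)
```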
+8 -1
@@ -48,6 +48,7 @@ Classname Legend
.note = "note/caution/important" = indented block for tips/gotchas/language comparisons
.baa = "best available ampersand" = wrapper block for ampersands
.ots = "on the side" = an aside that is set in normal type (as opposed to a big blue pullquote)
.xxxl = "ridiculously large" = text sized at 1000% of normal type (ten times the normal size)
Acknowledgements & Inspirations
@@ -126,7 +127,7 @@ html {
body {
margin: 1.75em 28px;
}
.c, .a {
.c, .a, .xxxl {
clear: both;
text-align: center;
}
@@ -153,6 +154,12 @@ form div, #level {
.pf {
padding: 0 1.75em;
}
.xxxl {
font-size:1000%;
font-weight:bold;
line-height:1;
margin:0.7em 0;
}
/* links */
+1 -1
@@ -69,7 +69,7 @@ My alphabet starts where your alphabet ends! <span class=u>&#x275E;</span><br>&m
<p>Other people pondered these questions, and they came up with a solution:
<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>UTF-8
<p class=xxxl>UTF-8
<p>UTF-8 is a <em>variable-length</em> encoding system for Unicode. That is, different characters take up a different number of bytes. For <abbr>ASCII</abbr> characters (A-Z, <i class=baa>&amp;</i>c.) UTF-8 uses just one byte per character. In fact, it uses the exact same bytes; the first 128 characters (0&ndash;127) in UTF-8 are indistinguishable from <abbr>ASCII</abbr>. &#8220;Extended Latin&#8221; characters like &ntilde; and &ouml; end up taking two bytes. (The bytes are not simply the Unicode code point like they would be in UTF-16; there is some serious bit-twiddling involved.) Chinese characters like &#x4E2D; end up taking three bytes. The rarely-used &#8220;astral plane&#8221; characters take four bytes.
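<p>The byte counts described above can be checked directly. This is a small sketch; the sample characters are illustrative, and U+1D11E (a musical symbol) stands in for an &#8220;astral plane&#8221; character.

```python
# Sample characters, written as escapes so the source stays pure ASCII.
samples = {
    'A': 'A',                        # ASCII
    'n-tilde': '\u00f1',             # Extended Latin
    'o-umlaut': '\u00f6',            # Extended Latin
    'zhong': '\u4e2d',               # Chinese
    'g-clef': '\U0001d11e',          # astral plane (beyond U+FFFF)
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {name: len(ch.encode('utf-8')) for name, ch in samples.items()}
print(byte_lengths)

# ASCII characters use the exact same byte in UTF-8.
same_as_ascii = 'A'.encode('utf-8') == 'A'.encode('ascii')

# In UTF-16 the bytes for n-tilde are simply its code point (U+00F1);
# in UTF-8 they are not.
utf16_bytes = '\u00f1'.encode('utf-16-be')
utf8_bytes = '\u00f1'.encode('utf-8')
```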