mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
various clarifications about generator expressions
+13
-7
@@ -208,9 +208,13 @@ AssertionError: Only for very large values of 2</samp></pre>
<li>A generator expression is like an anonymous function that yields values. The expression itself looks like a list comprehension [FIXME have we introduced this yet?], but it’s wrapped in parentheses instead of square brackets.
<li>The generator expression returns… an iterator.
<li>Calling <code>next(<var>gen</var>)</code> returns the next value from the iterator.
<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>.
<li>If you like, you can iterate through all the possible values and return a tuple, list, or set, by passing the generator expression to <code>tuple()</code>, <code>list()</code>, or <code>set()</code>. In these cases, you don’t need an extra set of parentheses — just pass the “bare” expression <code>ord(c) for c in unique_characters</code> to the <code>tuple()</code> function, and Python figures out that it’s a generator expression.
</ol>
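The list items above can be tried out with a minimal sketch like the one below. The value of `unique_characters` here is a hypothetical sample chosen for illustration, not necessarily the value used in the chapter.

```python
# Hypothetical sample value; the chapter derives unique_characters from a puzzle string.
unique_characters = {'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}

# Parentheses (instead of square brackets) make this a generator expression.
gen = (ord(c) for c in unique_characters)

first = next(gen)   # pulls exactly one value from the iterator
rest = list(gen)    # consumes whatever values remain

# As the sole argument to a function, the expression needs no extra parentheses.
ordinals = tuple(ord(c) for c in unique_characters)
```

Because a set is unordered, which value `next(gen)` returns first is not predictable; only the overall collection of values is.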
<blockquote class=note>
<p><span class=u>☞</span>Using a generator expression instead of a list comprehension can save both <abbr>CPU</abbr> and <abbr>RAM</abbr>. If you’re building a list just to throw it away (<i>e.g.</i> passing it to <code>tuple()</code> or <code>set()</code>), use a generator expression instead!
</blockquote>
<p>Here’s another way to accomplish the same thing, using a <a href=generators.html>generator function</a>:
<pre class=nd><code class=pp>def ord_map(a_string):
@@ -219,6 +223,8 @@ AssertionError: Only for very large values of 2</samp></pre>
gen = ord_map(unique_characters)</code></pre>
<p>The generator expression is more compact but functionally equivalent.
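The equivalence can be checked with a small sketch. The diff hunk omits the body of `ord_map()`, so the loop below is an assumption about what it contains, and the sample string is hypothetical.

```python
def ord_map(a_string):
    # Assumed body: the hunk above does not show it.
    for c in a_string:
        yield ord(c)

# Hypothetical sample; a string keeps its order, so the two results compare directly.
unique_characters = 'SENDMORY'

from_function = list(ord_map(unique_characters))
from_expression = list(ord(c) for c in unique_characters)
```

Both calls produce an iterator over the same values; the function form only becomes worthwhile when the body needs more than a single expression.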
<p class=a>⁂
<h2 id=permutations>Calculating Permutations… The Lazy Way!</h2>
@@ -498,25 +504,25 @@ for guess in itertools.permutations(digits, len(characters)):
<pre class=screen>
<samp class=p>>>> </samp><kbd class=pp>import subprocess</kbd>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd> <span class=u>①</span></a>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('ls ~')")</kbd> <span class=u>①</span></a>
<samp class=pp>'Desktop Library Pictures \
Documents Movies Public \
Music Sites'</samp>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm -rf /')")</kbd> <span class=u>②</span></a></pre>
<a><samp class=p>>>> </samp><kbd class=pp>eval("subprocess.getoutput('rm /some/random/file')")</kbd> <span class=u>②</span></a></pre>
<ol>
<li>The <code>subprocess</code> module allows you to run arbitrary shell commands and get the result as a Python string.
<li>Don’t do this.
<li>Arbitrary shell commands can have permanent consequences.
</ol>
<p>It’s even worse than that, because there’s a global <code>__import__()</code> function that takes a module name as a string, imports the module, and returns a reference to it. Combined with the power of <code>eval()</code>, you can construct a single expression that will wipe out all your files:
<pre class=screen>
<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm -rf /')")</kbd> <span class=u>①</span></a></pre>
<a><samp class=p>>>> </samp><kbd class=pp>eval("__import__('subprocess').getoutput('rm /some/random/file')")</kbd> <span class=u>①</span></a></pre>
<ol>
<li>Don’t do this either.
<li>Now imagine the output of <code>'rm -rf ~'</code>. Actually there wouldn’t be any output, but you wouldn’t have any files left either.
</ol>
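The mechanism can be seen safely by importing a harmless module instead. This is a minimal sketch that swaps `subprocess` for `math` purely for illustration; the string passed to `eval()` never names the module outside of the `__import__()` call.

```python
# Harmless stand-in: the expression imports math, not subprocess.
# No import statement appears anywhere; __import__() does the work inside eval().
result = eval("__import__('math').sqrt(16)")
```

Anything that can be written as one expression, including a module import followed by a function call, can be smuggled through `eval()` this way.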
<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>eval() is EVIL
<p class=xxxl>eval() is EVIL
<p>Well, the evil part is evaluating arbitrary expressions from untrusted sources. You should only use <code>eval()</code> on trusted input. Of course, the trick is figuring out what’s “trusted.” But here’s something I know for certain: you should <b>NOT</b> take this alphametics solver and put it on the internet as a fun little web service. Don’t make the mistake of thinking, “Gosh, the function does a lot of string manipulation before getting a string to evaluate; <em>I can’t imagine</em> how someone could exploit that.” Someone <b>WILL</b> figure out how to sneak nasty executable code past all that string manipulation (<a href=http://www.matasano.com/log/1032/this-new-vulnerability-dowds-inhuman-flash-exploit/>stranger things have happened</a>), and then you can kiss your server goodbye.
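One way to see what "untrusted" means in practice is to compare `eval()` with the standard library's `ast.literal_eval()`, which accepts only Python literals. This is a swapped-in technique that the text above does not mention, and the helper name `is_plain_literal` is hypothetical.

```python
import ast

def is_plain_literal(text):
    """Return True if text parses as a Python literal (hypothetical helper)."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Function calls, attribute access, and names are all rejected.
        return False
    return True

accepted = is_plain_literal("[1, 2, 3]")
rejected = is_plain_literal("__import__('subprocess').getoutput('echo hi')")
```

This is not a drop-in replacement for the alphametics solver, which needs to evaluate arithmetic comparisons rather than bare literals; it only illustrates that a stricter evaluator refuses the kind of expression shown earlier.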
@@ -48,6 +48,7 @@ Classname Legend
.note = "note/caution/important" = indented block for tips/gotchas/language comparisons
.baa = "best available ampersand" = wrapper block for ampersands
.ots = "on the side" = an aside that is set in normal type (as opposed to a big blue pullquote)
.xxxl = "ridiculously large" = text sized at 1000% of normal type
Acknowledgements & Inspirations
@@ -126,7 +127,7 @@ html {
body {
margin: 1.75em 28px;
}
.c, .a {
.c, .a, .xxxl {
clear: both;
text-align: center;
}
@@ -153,6 +154,12 @@ form div, #level {
.pf {
padding: 0 1.75em;
}
.xxxl {
font-size:1000%;
font-weight:bold;
line-height:1;
margin:0.7em 0;
}
/* links */
+1
-1
@@ -69,7 +69,7 @@ My alphabet starts where your alphabet ends! <span class=u>❞</span><br>&m
<p>Other people pondered these questions, and they came up with a solution:
<p class=c style='font-size:1000%;font-weight:bold;line-height:1;margin:0.7em 0'>UTF-8
<p class=xxxl>UTF-8
<p>UTF-8 is a <em>variable-length</em> encoding system for Unicode. That is, different characters take up a different number of bytes. For <abbr>ASCII</abbr> characters (A-Z, <i class=baa>&</i>c.) UTF-8 uses just one byte per character. In fact, it uses the exact same bytes; the first 128 characters (0–127) in UTF-8 are indistinguishable from <abbr>ASCII</abbr>. “Extended Latin” characters like ñ and ö end up taking two bytes. (The bytes are not simply the Unicode code point like they would be in UTF-16; there is some serious bit-twiddling involved.) Chinese characters like 中 end up taking three bytes. The rarely-used “astral plane” characters take four bytes.
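The byte counts described above can be confirmed with a short sketch. The sample characters are taken from the paragraph, plus one astral-plane character (U+1F40D) chosen here as an illustration.

```python
# One character from each size class; U+1F40D is an illustrative astral-plane pick.
samples = ['A', 'ñ', 'ö', '中', '\U0001F40D']

# Map each character to the number of bytes UTF-8 needs to encode it.
byte_lengths = {ch: len(ch.encode('utf-8')) for ch in samples}

# For ASCII characters, the UTF-8 bytes are identical to the ASCII bytes.
same_as_ascii = 'A'.encode('utf-8') == 'A'.encode('ascii')
```

Note that `len()` on the string itself would report one character in every case; only the encoded `bytes` object reveals the variable length.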