mirror of
https://github.com/kennethreitz/dive-into-python3.git
synced 2026-06-05 23:10:17 +00:00
constrained writing ;-)
@@ -34,11 +34,12 @@ body{counter-reset:h1 4}
<li><a href=#summary>Summary</a>
</ol>
<h2 id=divingin>Diving in</h2>
<p class=fancy>Regular expressions are a powerful and standardized way of searching, replacing, and parsing text with complex patterns of
characters. If you’ve used regular expressions in other languages (like Perl), the syntax will be very familiar, and you get by just reading the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
<p>Strings have methods for searching and replacing: <code>index()</code>, <code>find()</code>, <code>split()</code>, <code>count()</code>, <code>replace()</code>, <i class=baa>&</i>c. But these methods are limited to the simplest of cases. For example, the <code>index()</code> method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string <var>s</var>, you must call <code>s.lower()</code> or <code>s.upper()</code> and make sure your search strings are the appropriate case to match. The <code>replace()</code> and <code>split()</code> methods have the same limitations.
<p>If your goal can be accomplished with string functions, you should use them. They’re fast and simple and easy to read, and there’s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with <code>if</code> statements to handle special cases, or if you’re combining them with <code>split()</code> and <code>join()</code> and list comprehensions in weird unreadable ways, you may need to move up to regular expressions.
<p>Although the regular expression syntax is tight and unlike normal code, the result can end up being <em>more</em> readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
<p class=fancy>Every modern programming language has built-in functions for working with strings. In Python, strings have methods for searching and replacing: <code>index()</code>, <code>find()</code>, <code>split()</code>, <code>count()</code>, <code>replace()</code>, <i class=baa>&</i>c. But these methods are limited to the simplest of cases. For example, the <code>index()</code> method looks for a single, hard-coded substring, and the search is always case-sensitive. To do case-insensitive searches of a string <var>s</var>, you must call <code>s.lower()</code> or <code>s.upper()</code> and make sure your search strings are the appropriate case to match. The <code>replace()</code> and <code>split()</code> methods have the same limitations.
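<p>A minimal sketch of the limitation described above. The sample address and the variable names are hypothetical, chosen only for illustration: <code>index()</code> and <code>find()</code> miss a substring that differs in case, so both the string and the search term have to be normalized first.

```python
# Hypothetical sample string, not taken from the original text.
s = "100 North Main ROAD"

# index() is case-sensitive and raises ValueError when nothing matches.
try:
    s.index("road")
    found_exact = True
except ValueError:
    found_exact = False

# find() is just as case-sensitive, but returns -1 instead of raising.
pos_exact = s.find("road")

# Workaround: lower-case both the string and the search term before searching.
pos_folded = s.lower().find("road".lower())

print(found_exact, pos_exact, pos_folded)
```

<p>The lower-cased search reports where the match starts, but the normalization has to be repeated at every call site, which is the kind of special-casing the paragraph warns about.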
<p>If your goal can be accomplished with string methods, you should use them. They’re fast and simple and easy to read, and there’s a lot to be said for fast, simple, readable code. But if you find yourself using a lot of different string functions with <code>if</code> statements to handle special cases, or if you’re chaining calls to <code>split()</code> and <code>join()</code> to slice-and-dice your strings, you may need to move up to regular expressions.
<p>Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing text with complex patterns of characters. Although the regular expression syntax is tight and unlike normal code, the result can end up being <em>more</em> readable than a hand-rolled solution that uses a long chain of string functions. There are even ways of embedding comments within regular expressions, so you can include fine-grained documentation within them.
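<p>As a rough illustration of embedded comments, here is a sketch that uses the <code>re</code> module's <code>re.VERBOSE</code> flag, which ignores whitespace in the pattern and treats <code>#</code> as the start of a comment. The pattern, the sample address, and the replacement text are assumptions made up for this example.

```python
import re

# re.VERBOSE lets the pattern span several lines and carry its own comments;
# re.IGNORECASE makes the match case-insensitive without calling lower().
pattern = re.compile(r"""
    \b      # start at a word boundary, so "Broad" is not matched
    road    # the literal word we are looking for
    $       # which must sit at the very end of the string
    """, re.VERBOSE | re.IGNORECASE)

# Hypothetical sample string, not taken from the original text.
s = "100 North Broad ROAD"

match = pattern.search(s)
result = pattern.sub("RD.", s)

print(match.start(), result)
```

<p>A single commented pattern replaces what would otherwise be a chain of <code>lower()</code>, <code>endswith()</code>, and slicing calls.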
<blockquote class="note compare perl5">
<p><span>☞</span>If you’ve used regular expressions in other languages (like Perl 5), Python’s syntax will be very familiar. Read the summary of the <a href=http://docs.python.org/dev/library/re.html#module-contents><code>re</code> module</a> to get an overview of the available functions and their arguments.
</blockquote>
<h2 id=streetaddresses>Case study: street addresses</h2>
<p>This series of examples was inspired by a real-life problem I had in my day job several years ago, when I needed to scrub and standardize street addresses exported from a legacy system before importing them into a newer system. (See, I don’t just make this stuff up; it’s actually useful.) This example shows how I approached the problem.
<pre class=screen>