diff --git a/regular-expressions.html b/regular-expressions.html index 9886796..8375780 100755 --- a/regular-expressions.html +++ b/regular-expressions.html @@ -66,7 +66,7 @@ body{counter-reset:h1 4} '100 BROAD RD. APT 3'
'ROAD' when it was at the end of the string and it was its own word (and not a part of some larger word). To express this in a regular expression, you use \b, which means “a word boundary must occur right here.” In Python, this is complicated by the fact that the '\' character in a string must itself be escaped. This is sometimes referred to as the backslash plague, and it is one reason why regular expressions are easier in Perl than in Python. On the down side, Perl mixes regular expressions with other syntax, so if you have a bug, it may be hard to tell whether it’s a bug in syntax or a bug in your regular expression.
-r. This tells Python that nothing in this string should be escaped; '\t' is a tab character, but r'\t' is really the backslash character \ followed by the letter t. I recommend always using raw strings when dealing with regular expressions; otherwise, things get too confusing too quickly (and regular expressions are confusing enough already).
+r. This tells Python that nothing in this string should be escaped; '\t' is a tab character, but r'\t' is really the backslash character \ followed by the letter t. I recommend always using raw strings when dealing with regular expressions; otherwise, things get too confusing too quickly (and regular expressions are confusing enough already).
'ROAD' as a whole word by itself, but it wasn’t at the end, because the address had an apartment number after the street designation. Because 'ROAD' isn’t at the very end of the string, it doesn’t match, so the entire call to re.sub() ends up replacing nothing at all, and you get the original string back, which is not what you want.
$ character and added another \b. Now the regular expression reads “match 'ROAD' when it’s a whole word by itself anywhere in the string,” whether at the end, the beginning, or somewhere in the middle.