<!DOCTYPE html>
<html lang="en">
<head>
<title>Python + Regular Expressions</title>
<meta charset="utf-8" />
<link rel="stylesheet" href="./theme/css/main.css" type="text/css" />
<link href="./feeds/all.atom.xml" type="application/atom+xml" rel="alternate" title="Kenneth's log ATOM Feed" />
<!--[if IE]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script><![endif]-->
<!--[if lte IE 7]>
<link rel="stylesheet" type="text/css" media="all" href="./css/ie.css"/>
<script src="./js/IE8.js" type="text/javascript"></script><![endif]-->
<!--[if lt IE 7]>
<link rel="stylesheet" type="text/css" media="all" href="./css/ie6.css"/><![endif]-->
</head>
<body id="index" class="home">
<a href="http://github.com/kennethreitz/">
<img style="position: absolute; top: 0; right: 0; border: 0;" src="http://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png" alt="Fork me on GitHub" />
</a>
<header id="banner" class="body">
<h1>
<a href=".">Kenneth's log </a>
</h1>
<nav><ul>
<li >
<a href="./category/Life.html">Life</a>
</li>
<li class="active">
<a href="./category/Code.html">Code</a>
</li>
<li >
<a href="./category/projects.html">projects</a>
</li>
</ul></nav>
</header><!-- /#banner -->
<section id="content" class="body">
<article>
<header> <h1 class="entry-title"><a href="python-regular-expressions.html"
rel="bookmark" title="Permalink to Python + Regular Expressions">Python + Regular Expressions</a></h1> </header>
<div class="entry-content">
<footer class="post-info">
<abbr class="published" title="2009-03-17T05:30:00">
Tue 17 March 2009
</abbr>
<p>In <a href="./category/Code.html">Code</a>.
</p>
</footer><!-- /.post-info -->
<p>Have you ever needed to parse through large amounts of text looking
for a specific pattern, such as “one capital letter followed by three
digits” or “dd/mm/yyyy”? This is known as pattern matching. Regular
expressions provide a compact syntax for describing such patterns, and
they are an invaluable addition to anyone’s toolkit, whatever your area
of expertise. Whether you’re writing a compiler, a form validator, a
text editor, a Django project, or a language translator, regular
expressions will prove useful. Here is a very basic overview of the
syntax: <code>\d</code> matches a digit, <code>\s</code> matches a
whitespace character, and <code>.</code> matches any character except a
newline. If you have worked with Python for very long, you are probably
already familiar with the idea of a small formatting mini-language. Take
a look at the following code:</p>
<pre class="literal-block">
print("Rounded = %05d" % 42)
</pre>
<p>The <code>%05d</code> format makes sure the number is printed with at
least five digits, padding it with leading zeros as needed, so this
prints <code>Rounded = 00042</code>. If you understand this concept, you
shouldn’t have a problem with regular expressions, which are another
mini-language of the same kind. Perl-style regular expressions are a
very widely adopted dialect, and Python has built-in support for them.
Getting started is easy: the standard library’s <code>re</code> module
gives us everything we need, and a single <code>import re</code> makes
it available.</p>
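<p>To make the syntax overview concrete, here is a small sketch that turns the two example patterns from the opening paragraph into Python. It assumes one piece of syntax this post does not cover, the character class <code>[A-Z]</code> (one uppercase ASCII letter), and the sample strings are invented for illustration.</p>

```python
import re

# \d matches one digit, \s one whitespace character, and . any single
# character except a newline.
print(bool(re.match(r'\d\s.', '7 x')))  # True
print(bool(re.match(r'\d\s.', 'x 7')))  # False

# "One capital letter followed by three digits".
# [A-Z] (a character class for one uppercase ASCII letter) is extra syntax
# assumed here; it is not part of the overview above.
part_number = r'[A-Z]\d\d\d'
print(re.findall(part_number, 'Order A123, B45 and C678 today'))  # ['A123', 'C678']

# "dd/mm/yyyy": digits separated by literal slashes.
date = r'\d\d/\d\d/\d\d\d\d'
print(re.findall(date, 'Due 17/03/2009, not 3/17/09'))  # ['17/03/2009']
```

<p><code>B45</code> is skipped because it has only two digits, and <code>3/17/09</code> is skipped because every field in the date pattern requires a fixed number of digits.</p>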
<p>Let’s give our new module a try! It covers just about everything you
could want to do with regular expressions. Here’s a quick example of
some basic use:</p>
<pre class="literal-block">
import re

string0 = 'Kenneth Reitz is a cool guy!'
# "kenneth", then an optional hyphen or space, then "reitz"
regExp = r'kenneth[- ]?reitz'

# re.IGNORECASE makes the comparison case-insensitive
if re.match(regExp, string0, re.IGNORECASE):
    print("True")
else:
    print("False")
</pre>
<p>This script takes the string ‘Kenneth Reitz is a cool guy!’ and checks
whether it begins with ‘kenneth reitz’ (the <code>[- ]?</code> part also
accepts ‘kenneth-reitz’ and ‘kennethreitz’). Note that
<code>re.match</code> only looks for a match at the <em>beginning</em>
of the string; to find a pattern anywhere inside a string, use
<code>re.search</code> instead. If the pattern matches, the script prints
“True”; if not, it prints “False”. Additional flags can be passed to
<code>re.match</code> when needed. Note the <code>re.IGNORECASE</code>
flag used here, which tells the function to be case-insensitive. Once you
master the regular expression syntax, you’ll realize how much work these
short patterns can do for you. Here’s another example, this time
validating a date:</p>
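<p>Because the difference between <code>re.match</code> and <code>re.search</code> trips up a lot of people, here is a minimal sketch using the same pattern as the first example. The second sample sentence is invented for illustration.</p>

```python
import re

# The same pattern as in the first example.
pattern = r'kenneth[- ]?reitz'

at_start = 'Kenneth Reitz is a cool guy!'
in_middle = 'I think Kenneth Reitz is a cool guy!'  # illustrative input

# re.match only tries to match at the very start of the string.
starts = re.match(pattern, at_start, re.IGNORECASE)
misses = re.match(pattern, in_middle, re.IGNORECASE)
print(bool(starts), bool(misses))  # True False

# re.search scans forward through the string for the first match.
found = re.search(pattern, in_middle, re.IGNORECASE)
print(found.group())  # Kenneth Reitz

# [- ]? allows a hyphen, a space, or nothing between the two names.
for name in ['kenneth reitz', 'kenneth-reitz', 'kennethreitz', 'kenneth_reitz']:
    print(name, bool(re.match(pattern, name)))
```

<p>The loop prints True for the first three spellings and False for the underscore version, since <code>_</code> is not in the <code>[- ]</code> character class.</p>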
<pre class="literal-block">
import re

string0 = '10.03.1988'
# two digits, '.' or '/', two digits, '.' or '/', four digits
regExp = r'^\d\d[./]\d\d[./]\d\d\d\d$'

if re.match(regExp, string0):
    print('True')
else:
    print('False')
</pre>
<p>When run, this script prints “True”. If we were to change string0 to
‘10.03.88’, it would print “False”, because the pattern does not accept a
two-digit year. Simple, isn’t it? Now, while a True/False answer is
useful in certain applications (e.g. form validation), most of the time
we’ll want a bit more information. We can tell Python to show us the
data that matched our query. To do this, we break our expression up into
groups. The date we have defined splits naturally into three groups: the
day, month, and year. When writing a regular expression, you define a
group by wrapping part of it in parentheses <code>()</code>:</p>
<pre class="literal-block">
regExp = r'^(\d\d)[./](\d\d)[./](\d\d\d\d)$'
</pre>
<p>This splits our expression into three groups. Python can also turn a
regular expression string into a compiled pattern object with the
<code>re.compile()</code> function. Once you have a compiled pattern,
you can call its methods (such as <code>match()</code> and
<code>search()</code>) directly and reuse the same object wherever you
need it. Now we can ask Python what is in those groups:</p>
<pre class="literal-block">
import re

string0 = '10.03.1988'
regExp = re.compile(r'^(\d\d)[./](\d\d)[./](\d\d\d\d)$')
regExpMatches = regExp.match(string0)

# match() returns None when the string does not fit the pattern
if regExpMatches:
    print("Day: %s\nMonth: %s\nYear: %s" % (regExpMatches.group(1),
          regExpMatches.group(2), regExpMatches.group(3)))
else:
    print("Invalid Date.")
</pre>
<p>When executed, this script validates the date, breaks it down into
groups, and prints the following:</p>
<pre class="literal-block">
Day: 10
Month: 03
Year: 1988
</pre>
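<p>The opening paragraph promised help with large amounts of text, so here is a minimal sketch of the same grouped pattern used to pull every date out of a longer string. It assumes the two-digit day, two-digit month, four-digit year format used above, and the sample text is invented. Dropping the <code>^</code> and <code>$</code> anchors lets the pattern match anywhere; <code>findall()</code> and <code>finditer()</code> are standard methods on a compiled pattern.</p>

```python
import re

# Same three groups as the date pattern above, minus the ^ and $ anchors,
# so the pattern can be found anywhere inside a longer string.
date_pattern = re.compile(r'(\d\d)[./](\d\d)[./](\d\d\d\d)')

# Illustrative input text (invented for this sketch).
text = 'Born 10.03.1988, moved 01/09/2005, bad date 1.2.3.'

# With groups in the pattern, findall() returns one tuple per match.
dates = date_pattern.findall(text)
print(dates)  # [('10', '03', '1988'), ('01', '09', '2005')]

# finditer() yields match objects, so groups() and start() are available.
for m in date_pattern.finditer(text):
    day, month, year = m.groups()
    print('Day: %s, Month: %s, Year: %s (at index %d)' % (day, month, year, m.start()))
```

<p>Use <code>findall()</code> when the group values are all you need, and <code>finditer()</code> when you also want each match’s position or the full match object.</p>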
<p>The possibilities are endless! Here’s a quick run-down of the
<code>re</code> module’s functions, straight from the Python
documentation, for reference:</p>
<ul>
<li><code>match</code>: Match a regular expression pattern to the beginning of a string.</li>
<li><code>search</code>: Search a string for the presence of a pattern.</li>
<li><code>sub</code>: Substitute occurrences of a pattern found in a string.</li>
<li><code>subn</code>: Same as sub, but also return the number of substitutions made.</li>
<li><code>split</code>: Split a string by the occurrences of a pattern.</li>
<li><code>findall</code>: Find all occurrences of a pattern in a string.</li>
<li><code>compile</code>: Compile a pattern into a RegexObject (called a Pattern object in current Python 3 versions).</li>
<li><code>purge</code>: Clear the regular expression cache.</li>
<li><code>escape</code>: Backslash all non-alphanumerics in a string.</li>
</ul>
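<p>Several of these functions are easier to understand side by side. The sketch below runs them on the same made-up string; the comments show the values each call returns. The choice of sample data and patterns is purely illustrative.</p>

```python
import re

text = 'apples=3, pears=12, plums=7'  # illustrative input
pair = re.compile(r'(\w+)=(\d+)')     # compile: a reusable pattern object

# search: the first occurrence anywhere in the string (a match object or None)
print(re.search(r'\d+', text).group())  # 3

# findall: every occurrence; with groups, a list of tuples
print(pair.findall(text))  # [('apples', '3'), ('pears', '12'), ('plums', '7')]

# sub: replace every run of digits with '#'
print(re.sub(r'\d+', '#', text))  # apples=#, pears=#, plums=#

# subn: the same replacement, plus the number of substitutions made
print(re.subn(r'\d+', '#', text))  # ('apples=#, pears=#, plums=#', 3)

# split: break the string at each comma and any whitespace after it
print(re.split(r',\s*', text))  # ['apples=3', 'pears=12', 'plums=7']

# escape: make literal text safe to embed in a pattern
literal = re.escape('1+1')
print(bool(re.search(literal, 'Is 1+1 equal to 2?')))  # True
print(bool(re.search('1+1', 'Is 1+1 equal to 2?')))    # False
```

<p>The last two lines show why <code>escape</code> matters: unescaped, <code>1+1</code> means “one or more 1s followed by a 1”, so it never matches the literal text <code>1+1</code>.</p>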
<p>Remember, you can always type <code>help(re)</code> into the Python
interpreter (after importing the <code>re</code> module) to take a quick
look at the module’s built-in documentation. Good luck and happy coding!</p>
</div><!-- /.entry-content -->
<div class="comments">
<div id="disqus_thread"></div>
<script type="text/javascript">
var disqus_identifier = "python-regular-expressions.html";
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = 'http://kennethreitz.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
</div>
</article>
</section>
<section id="extras" class="body">
<div class="blogroll">
<h2>Links</h2>
<ul>
<li><a href="http://github.com/kennethreitz">GitHub Repos</a></li>
<li><a href="http://flickr.com/kennethreitz">Photography (Flickr)</a></li>
<li><a href="http://twitter.com/kennethreitz">Latest Tweets</a></li>
<li><a href="http://www.linkedin.com/in/kennethreitz">Résumé</a></li>
<li><a href="http://pick.im/kenneth-reitz">Design Portfolio</a></li>
<li><a href="http://laterstars.com/kennethreitz">Later Stars</a></li>
</ul>
</div><!-- /.blogroll -->
<div class="social">
<ul>
<li><a href="./feeds/all.atom.xml" rel="alternate">atom feed</a></li>
<li><a href="http://facebook.com/kennethreitz">Facebook</a></li>
</ul>
</div><!-- /.social -->
</section><!-- /#extras -->
<footer id="contentinfo" class="body">
<address id="about" class="vcard body">
© 2011 Kenneth Reitz & co. All Rights Reserved.
</address><!-- /#about -->
</footer><!-- /#contentinfo -->
<script type="text/javascript">
var disqus_shortname = 'kennethreitz';
(function () {
var s = document.createElement('script'); s.async = true;
s.type = 'text/javascript';
s.src = 'http://' + disqus_shortname + '.disqus.com/count.js';
(document.getElementsByTagName('HEAD')[0] || document.getElementsByTagName('BODY')[0]).appendChild(s);
}());
</script>
<script type="text/javascript" charset="utf-8">
var disqus_developer = 1;
</script>
</body>
</html>