Merge pull request #257 from rgbkrk/editing_on_the_plane

Editing on the plane
Kenneth Reitz
2013-03-22 11:00:27 -07:00
11 changed files with 145 additions and 132 deletions
+1 -1
@@ -104,7 +104,7 @@ The following command lists all available minions running CentOS using the grain
Salt also provides a state system. States can be used to configure the minion hosts.
For example, when a minion host is ordered to read the following state file, it will install
and start the Apache server:
.. code-block:: yaml
+1 -1
@@ -6,4 +6,4 @@ Command Line Applications
Clint
-----
.. todo:: Write about Clint
+6
@@ -41,3 +41,9 @@ messaging library aimed at use in scalable distributed or concurrent
applications. It provides a message queue, but unlike message-oriented
middleware, a ØMQ system can run without a dedicated message broker. The
library is designed to have a familiar socket-style API.
RabbitMQ
--------
.. todo:: Write about RabbitMQ
+1 -2
@@ -30,7 +30,6 @@ Django ORM
The Django ORM is the interface used by `Django <http://www.djangoproject.com>`_
to provide database access.
It's based on the idea of `models <https://docs.djangoproject.com/en/1.3/#the-model-layer>`_, an abstraction that makes it easier to
manipulate data in Python.
Documentation can be found `here <https://docs.djangoproject.com/en/1.3/#the-model-layer>`_.
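
To illustrate the model idea, here is a minimal sketch. It assumes a recent Django (1.7 or later, which is newer than the 1.3 docs linked above) and an in-memory SQLite database. The ``Book`` model, the ``library`` app label and the sample rows are hypothetical. The script-style ``settings.configure()`` call stands in for a real project's ``settings.py``:

.. code-block:: python

    import django
    from django.conf import settings

    # Script-style configuration (assumption); a real project keeps this in settings.py
    settings.configure(
        DATABASES={"default": {"ENGINE": "django.db.backends.sqlite3",
                               "NAME": ":memory:"}},
        INSTALLED_APPS=[],
    )
    django.setup()

    from django.db import connection, models  # noqa: E402  (needs configured settings)


    class Book(models.Model):
        """Hypothetical model: each attribute maps to a column of a table."""
        title = models.CharField(max_length=200)
        year = models.IntegerField()

        class Meta:
            app_label = "library"  # hypothetical app label


    # Create the table directly; in a project, migrations would do this
    with connection.schema_editor() as editor:
        editor.create_model(Book)

    Book.objects.create(title="Old Notes", year=2005)
    Book.objects.create(title="New Notes", year=2015)

    # The query is plain Python; Django generates the SQL behind the scenes
    recent = list(Book.objects.filter(year__gte=2010)
                  .values_list("title", flat=True))
    print(recent)  # ['New Notes']
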
+3 -3
@@ -41,7 +41,7 @@ Gtk
PyGTK provides Python bindings for the GTK+ toolkit. Like the GTK+ library
itself, it is currently licensed under the GNU LGPL. It is worth noting that
PyGTK currently supports only the Gtk-2.X API (NOT Gtk-3.0). It is currently
recommended that PyGTK not be used for new projects and that existing applications
be ported from PyGTK to PyGObject.
Tk
@@ -60,10 +60,10 @@ available on the `Python Wiki <http://wiki.python.org/moin/TkInter>`_.
Kivy
----
`Kivy <http://kivy.org>`_ is a Python library for developing multi-touch enabled, media-rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping while making your code reusable and deployable.
Kivy is written in Python, is based on OpenGL, and supports different input devices such as mouse, dual mouse, TUIO, WiiMote, WM_TOUCH, HIDtouch, Apple's products and so on.
Kivy is actively being developed by a community and is free to use. It operates on all major platforms (Linux, OS X, Windows, Android).
The main resource for information is the website: http://kivy.org
+2 -2
@@ -12,7 +12,7 @@ The `Python Imaging Library <http://www.pythonware.com/products/pil/>`_, or PIL
for short, is *the* library for image manipulation in Python.
It works with Python 1.5.2 and above, including 2.5, 2.6 and 2.7. Unfortunately,
it doesn't work with 3.0+ yet.
Installation
~~~~~~~~~~~~
@@ -20,7 +20,7 @@ Installation
PIL has a reputation of not being very straightforward to install. Listed below
are installation notes on various systems.
Also, there's a fork named `Pillow <http://pypi.python.org/pypi/Pillow>`_ which is easier
to install. It has good setup instructions for all platforms.
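
Pillow keeps PIL's ``PIL`` package name, so basic PIL code generally carries over without changes. Here is a minimal sketch, assuming Pillow is installed. It builds an image in memory rather than opening a file, so no sample image is needed:

.. code-block:: python

    from io import BytesIO

    from PIL import Image

    # A blank 400x300 RGB image stands in for one opened with Image.open(path)
    img = Image.new("RGB", (400, 300), color="white")

    # thumbnail() shrinks the image in place and preserves the aspect ratio
    img.thumbnail((100, 100))
    print(img.size)  # (100, 75)

    # Save as PNG into memory; pass a filename instead to write to disk
    buffer = BytesIO()
    img.save(buffer, format="PNG")
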
Installing on Linux
+3 -3
@@ -6,7 +6,7 @@ Twisted
`Twisted <http://twistedmatrix.com/trac/>`_ is an event-driven networking engine. It can be
used to build applications around many different networking protocols, including HTTP servers
and clients, applications using the SMTP, POP3, IMAP or SSH protocols, instant messaging and
`many more <http://twistedmatrix.com/trac/wiki/Documentation>`_.
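
To give a feel for the event-driven style, here is a minimal echo protocol, sketched on the assumption that Twisted is installed. Twisted calls ``dataReceived`` whenever bytes arrive, and the protocol answers through its ``transport``. The ``FakeTransport`` class is a hypothetical stand-in that lets the sketch run without opening a socket. To serve the protocol for real, you would pass the factory to ``reactor.listenTCP(8000, EchoFactory())`` and then call ``reactor.run()``:

.. code-block:: python

    from twisted.internet import protocol


    class Echo(protocol.Protocol):
        """Send every chunk of received bytes straight back to the peer."""

        def dataReceived(self, data):
            self.transport.write(data)


    class EchoFactory(protocol.Factory):
        protocol = Echo  # one Echo instance is built per connection


    class FakeTransport:
        """Hypothetical stand-in for a TCP transport: records what is written."""

        def __init__(self):
            self.written = []

        def write(self, data):
            self.written.append(data)


    # Drive the protocol by hand instead of running the reactor
    echo = EchoFactory().buildProtocol(None)
    transport = FakeTransport()
    echo.makeConnection(transport)
    echo.dataReceived(b"hello")
    print(transport.written)  # [b'hello']
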
PyZMQ
@@ -14,11 +14,11 @@ PyZMQ
`PyZMQ <http://zeromq.github.com/pyzmq/>`_ is the Python binding for `ZeroMQ <http://www.zeromq.org/>`_,
which is a high-performance asynchronous messaging library. One great advantage is that ZeroMQ
can be used for message queuing without a message broker. The basic patterns for this are:

- request-reply: connects a set of clients to a set of services. This is a remote procedure call
  and task distribution pattern.
- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data
  distribution pattern.
- push-pull (or pipeline): connects nodes in a fan-out / fan-in pattern that can have multiple
  steps and loops. This is a parallel task distribution and collection pattern.
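
Here is a minimal request-reply sketch, assuming PyZMQ is installed. To keep the example self-contained, both sockets live in one process and communicate over the ``inproc`` transport. The endpoint name ``inproc://demo`` is arbitrary. No broker sits between the ``REQ`` and ``REP`` sockets:

.. code-block:: python

    import zmq

    context = zmq.Context()

    # Service side: a REP socket answers one request at a time
    service = context.socket(zmq.REP)
    service.bind("inproc://demo")  # binding first works with every libzmq version

    # Client side: a REQ socket connects straight to the service, no broker
    client = context.socket(zmq.REQ)
    client.connect("inproc://demo")

    client.send_string("ping")            # client sends a request
    request = service.recv_string()       # service receives it...
    service.send_string(request.upper())  # ...and replies
    reply = client.recv_string()          # client gets the reply
    print(reply)  # PING

    client.close()
    service.close()
    context.term()
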
+9 -4
@@ -35,6 +35,10 @@ people who only need the basic requirements can just use NumPy.
NumPy is compatible with Python versions 2.4 through to 2.7.2 and 3.1+.
Numba
-----
.. todo:: Write about Numba
SciPy
-----
@@ -60,8 +64,9 @@ Resources
Installation of scientific Python packages can be troublesome. Many of these
packages are implemented as Python C extensions which need to be compiled.
This section lists various so-called scientific Python distributions which
provide precompiled and easy-to-install collections of scientific Python
packages.
Unofficial Windows Binaries for Python Extension Packages
---------------------------------------------------------
@@ -91,6 +96,6 @@ Anaconda
Python Distribution <https://store.continuum.io/cshop/anaconda>`_ which
includes all the common scientific Python packages and additionally many
packages related to data analytics and big data. Anaconda comes in two
flavors: a paid-for version and a completely free and open source community
edition, Anaconda CE, which contains a slightly reduced feature set. Free
licenses for the paid-for version are available for academics and researchers.
+101 -99
@@ -1,99 +1,101 @@
HTML Scraping
=============
Web Scraping
------------
Web sites are written using HTML, which means that each web page is a
structured document. Sometimes it would be great to obtain some data from
them and preserve the structure while we're at it. Web sites don't always
provide their data in convenient formats such as ``csv`` or ``json``.
This is where web scraping comes in. Web scraping is the practice of using a
computer program to sift through a web page and gather the data that you need
in a format most useful to you while at the same time preserving the structure
of the data.
lxml and Requests
-----------------
`lxml <http://lxml.de/>`_ is an extensive library for parsing XML and HTML
documents very quickly. It even handles malformed markup. We will also be
using the `Requests <http://docs.python-requests.org/en/latest/>`_ module
instead of the built-in urllib2 (``urllib.request`` in Python 3) because its
API is simpler and more readable. You can easily install both using
``pip install lxml`` and ``pip install requests``.
Let's start with the imports:

.. code-block:: python

    from lxml import html
    import requests

Next we will use ``requests.get`` to retrieve the web page with our data,
parse it using the ``html`` module, and save the results in ``tree``:

.. code-block:: python

    page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
    # Pass the raw bytes (page.content) so that lxml handles the decoding itself
    tree = html.fromstring(page.content)

``tree`` now contains the whole HTML file in a nice tree structure, which
we can query in two different ways: XPath and CSSSelect. In this example, we
will focus on the former.
XPath is a way of locating information in structured documents such as
HTML or XML. A good introduction to XPath is on
`W3Schools <http://www.w3schools.com/xpath/default.asp>`_.
There are also various tools for obtaining the XPath of elements, such as
Firebug for Firefox or the Chrome Inspector. If you're using Chrome, you
can right-click an element, choose 'Inspect element', highlight the code,
right-click again and choose 'Copy XPath'.
After a quick analysis, we see that in our page the data is contained in
two elements: a div with title 'buyer-name' and a span with class
'item-price':

::

    <div title="buyer-name">Carson Busses</div>
    <span class="item-price">$29.95</span>

Knowing this, we can create the correct XPath queries and use lxml's
``xpath`` method like this:

.. code-block:: python

    # This will create a list of buyers:
    buyers = tree.xpath('//div[@title="buyer-name"]/text()')
    # This will create a list of prices:
    prices = tree.xpath('//span[@class="item-price"]/text()')

Let's see exactly what we got:

.. code-block:: python

    print('Buyers:', buyers)
    print('Prices:', prices)

::

    Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes',
    'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff',
    'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup',
    'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire',
    'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']

    Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25',
    '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11',
    '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68',
    '$15.00', '$114.07', '$10.09']

Congratulations! We have successfully scraped all the data we wanted from
a web page using lxml and Requests. We have it stored in memory as two
lists. Now we can do all sorts of cool stuff with it: we can analyze it
using Python or we can save it to a file and share it with the world.
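
As one example of saving the results, here is a sketch that uses only the standard library. It pairs each buyer with a numeric price and writes the rows as CSV. It assumes the two lists line up one-to-one, as they do in the output above, and it uses just the first three values. The column names are arbitrary:

.. code-block:: python

    import csv
    import io

    # First three values from the output above; in the script they come from tree.xpath()
    buyers = ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes']
    prices = ['$29.95', '$8.37', '$15.26']

    # Pair buyers with prices and turn '$29.95' into the float 29.95
    rows = [(buyer, float(price.lstrip('$'))) for buyer, price in zip(buyers, prices)]

    # Write to an in-memory buffer; use open('prices.csv', 'w', newline='') for a real file
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['buyer', 'price'])
    writer.writerows(rows)

    print(buffer.getvalue())
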
A cool idea to think about is modifying this script to iterate through
the rest of the pages of this example dataset or rewriting this
application to use threads for improved speed.
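
Here is a sketch of the threaded idea, assuming lxml is installed. ``concurrent.futures.ThreadPoolExecutor`` (Python 3.2+) overlaps the downloads, which is where a scraper spends most of its time, because CPython releases the GIL while a thread waits on network I/O. The page list and the ``fetch`` function are placeholders. With Requests, ``fetch`` could be ``lambda url: requests.get(url, timeout=10).content``, and the URLs would be the other pages of the dataset. In this sketch, a dictionary of two made-up pages stands in for them so that it runs offline:

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    from lxml import html


    def parse_page(content):
        """Return (buyer, price) pairs from one page's raw HTML bytes."""
        tree = html.fromstring(content)
        buyers = tree.xpath('//div[@title="buyer-name"]/text()')
        prices = tree.xpath('//span[@class="item-price"]/text()')
        return list(zip(buyers, prices))


    def scrape_all(urls, fetch, max_workers=4):
        """Download pages concurrently with `fetch` and parse them in URL order."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            pages = pool.map(fetch, urls)  # map() yields results in input order
            return [pair for content in pages for pair in parse_page(content)]


    # Offline stand-in for the real pages (hypothetical keys instead of URLs)
    fake_pages = {
        'page-1': b'<html><body><div title="buyer-name">Carson Busses</div>'
                  b'<span class="item-price">$29.95</span></body></html>',
        'page-2': b'<html><body><div title="buyer-name">Earl E. Byrd</div>'
                  b'<span class="item-price">$8.37</span></body></html>',
    }
    results = scrape_all(list(fake_pages), fake_pages.get)
    print(results)
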
+7 -5
@@ -42,7 +42,7 @@ The GIL
`The GIL`_ (Global Interpreter Lock) is how CPython lets multiple threads
share a single interpreter. Python's memory management isn't entirely
thread-safe, so the GIL is required to prevent multiple threads from executing
Python bytecode at once.
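
Here is a small sketch of what this means in practice, using only the standard library. The GIL serializes the execution of Python bytecode, but CPython releases it while a thread is blocked, for example in ``time.sleep`` or a socket read. Five threads that each sleep for 0.2 seconds (a stand-in for waiting on I/O) therefore finish in roughly 0.2 seconds overall instead of 1 second. CPU-bound pure-Python code does not get this benefit:

.. code-block:: python

    import threading
    import time


    def wait_for_io(delay):
        # time.sleep stands in for a blocking call such as a network read;
        # CPython releases the GIL while this thread is blocked
        time.sleep(delay)


    threads = [threading.Thread(target=wait_for_io, args=(0.2,)) for _ in range(5)]

    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start

    # The sleeps overlap, so elapsed stays well below 5 * 0.2 seconds
    print(f"{elapsed:.2f} seconds")
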
David Beazley has a great `guide`_ on how the GIL operates. He also covers the
@@ -58,8 +58,8 @@ C Extensions
The GIL
-------
`Special care`_ must be taken when writing C extensions to make sure you
register your threads with the interpreter.
C Extensions
::::::::::::
@@ -76,7 +76,9 @@ Pyrex
Shedskin?
---------
Numba
-----
.. todo:: Write about Numba and the autojit compiler for NumPy
Threading
:::::::::
@@ -86,7 +88,7 @@ Threading
---------
Spawning Processes
------------------
+11 -12
@@ -98,12 +98,12 @@ framework like Django and the microframeworks: It comes with a lot of libraries
and functionality and thus cannot be considered lightweight. On the other
hand, it does not provide all the functionality Django does. Instead, Pyramid
brings basic support for most regular tasks and provides a great deal of
extensibility. Additionally, Pyramid has a huge focus on complete
`documentation <http://docs.pylonsproject.org/en/latest/docs/pyramid.html>`_. As
a little extra, it comes with the Werkzeug Debugger, which allows you to debug a
running web application in the browser.
**Support** can also be found in the
`documentation <http://docs.pylonsproject.org/en/latest/index.html#support-desc>`_.
@@ -140,8 +140,8 @@ Gunicorn
to serve Python applications. It is a Python port of the Ruby
`Unicorn <http://unicorn.bogomips.org/>`_ server. Unicorn is designed to be
lightweight, easy to use, and uses many UNIX idioms. Gunicorn is not designed
to face the internet; it was designed to run behind Nginx, which buffers
slow requests and takes care of other important considerations. A sample
setup for Nginx + Gunicorn can be found in the
`Gunicorn help <http://gunicorn.org/deploy.html>`_.
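
Gunicorn can serve any WSGI callable. Here is a minimal sketch of one, assuming it is saved as a hypothetical ``myapp.py``. You would start it with ``gunicorn --bind 127.0.0.1:8000 myapp:app``, and Nginx would proxy public traffic to that address. The last lines call the app directly with the standard library's ``wsgiref.util.setup_testing_defaults``, so it can be checked without any server:

.. code-block:: python

    from wsgiref.util import setup_testing_defaults


    def app(environ, start_response):
        """A minimal WSGI application: the server calls this once per request."""
        body = b"Hello from behind Nginx\n"
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


    # Call the app directly with a fake request environment (no server needed)
    environ = {}
    setup_testing_defaults(environ)
    statuses = []
    payload = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    print(statuses[0], payload)
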
@@ -189,7 +189,7 @@ support for Python 2.7 applications.
Heroku allows you to run as many Python web applications as you like, 24/7 and
free of charge. Heroku is best described as a horizontal scaling platform. They
start to charge you once you "scale" your application to run on more than one
Dyno (abstracted servers) at a time.
Heroku publishes `step-by-step instructions
@@ -202,10 +202,9 @@ DotCloud
~~~~~~~~
`DotCloud <http://www.dotcloud.com/>`_ supports WSGI applications and
background/worker tasks natively on their platform. Web applications run
Python version 2.6, use :ref:`nginx <nginx-ref>` and :ref:`uWSGI
<uwsgi-ref>`, and allow custom configuration of both for advanced users.
DotCloud uses a custom command-line API client which can work with
applications managed in git repositories or any other version control
@@ -222,7 +221,7 @@ getting started.
Gondor
~~~~~~
`Gondor <https://gondor.io/>`_ is a PaaS specialized for deploying Django
and Pinax applications. Gondor supports Django versions 1.2 and 1.3 on
Python version 2.7, and can automatically configure your Django site if you
use ``local_settings.py`` for site-specific configuration information.
@@ -238,7 +237,7 @@ Templating
Most WSGI applications respond to HTTP requests to serve
content in HTML or other markup languages. Instead of generating textual
content directly from Python, the concept of separation of concerns
advises us to use templates. A template engine manages a suite of
template files, with a system of hierarchy and inclusion to
avoid unnecessary repetition, and is in charge of rendering
(generating) the actual content, filling the static content
@@ -265,7 +264,7 @@ and to the templates themselves.
  templates. This convenience can lead to an uncontrolled
  increase in complexity, and often to harder-to-find bugs.
- It is often necessary to mix JavaScript templates with
  HTML templates. A sane approach to this design is to isolate
  the parts where the HTML template passes some variable content
  to the JavaScript code.
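
Here is one way to follow that advice, sketched with only the standard library. Instead of scattering template variables through the script, the sketch serializes all the variable content that the JavaScript needs into a single JSON object in one place of the HTML template. ``string.Template`` stands in here for a real template engine, and the ``pageData`` name is hypothetical. Replacing ``</`` with ``<\/`` prevents a value from closing the ``<script>`` element early:

.. code-block:: python

    import json
    from string import Template

    # The only place where the HTML template hands data to the JavaScript code
    PAGE = Template("""<script>
      var pageData = $page_data;
    </script>""")


    def render(context):
        """Render the page, passing `context` to the script as a JSON object."""
        # json.dumps produces a valid JavaScript literal; escaping "</" stops a
        # value such as "</script>" from ending the script element early
        page_data = json.dumps(context).replace("</", "<\\/")
        return PAGE.substitute(page_data=page_data)


    html_out = render({"user": "Ann", "note": "</script>"})
    print(html_out)
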