mirror of
https://github.com/kennethreitz/python-guide.git
synced 2026-06-05 23:00:18 +00:00
Merge pull request #257 from rgbkrk/editing_on_the_plane
Editing on the plane
@@ -104,7 +104,7 @@ The following command lists all available minions running CentOS using the grain
 
 Salt also provides a state system. States can be used to configure the minion hosts.
 
-For example, when a minion host is ordered to read the following state file, will install
+For example, when a minion host is ordered to read the following state file, it will install
 and start the Apache server:
 
 .. code-block:: yaml
@@ -6,4 +6,4 @@ Command Line Applications
 Clint
 -----
 
-.. todo:: Write about Clint
+.. todo:: Write about Clint
@@ -41,3 +41,9 @@ messaging library aimed at use in scalable distributed or concurrent
 applications. It provides a message queue, but unlike message-oriented
 middleware, a ØMQ system can run without a dedicated message broker. The
 library is designed to have a familiar socket-style API.
+
+RabbitMQ
+--------
+
+.. todo:: Write about RabbitMQ
+
@@ -30,7 +30,6 @@ Django ORM
 The Django ORM is the interface used by `Django <http://www.djangoproject.com>`_
 to provide database access.
 
-It's based on the idea of models, an abstraction that makes it easier to
+It's based on the idea of `models <https://docs.djangoproject.com/en/1.3/#the-model-layer>`_, an abstraction that makes it easier to
 manipulate data in Python.
 
-Documentation can be found `here <https://docs.djangoproject.com/en/1.3/#the-model-layer>`_
@@ -41,7 +41,7 @@ Gtk
 PyGTK provides Python bindings for the GTK+ toolkit. Like the GTK+ library
 itself, it is currently licensed under the GNU LGPL. It is worth noting that
 PyGTK only currently supports the Gtk-2.X API (NOT Gtk-3.0). It is currently
-recommended that PyGTK is not used for new projects and existing applications
+recommended that PyGTK not be used for new projects and existing applications
 be ported from PyGTK to PyGObject.
 
 Tk
@@ -60,10 +60,10 @@ available on the `Python Wiki <http://wiki.python.org/moin/TkInter>`_.
 
 Kivy
 ----
-Kivy is a Python library for development of multi-touch enabled media rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping, while making your code reusable and deployable.
+`Kivy <http://kivy.org>`_ is a Python library for development of multi-touch enabled media rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping, while making your code reusable and deployable.
 
 Kivy is written in Python, based on OpenGL and supports different input devices such as: Mouse, Dual Mouse, TUIO, WiiMote, WM_TOUCH, HIDtouch, Apple's products and so on.
 
 Kivy is actively being developed by a community and free to use. It operates on all major platforms (Linux, OSX, Windows, Android).
 
-The main resource for information is the website: http://kivy.org
+The main resource for information is the website: http://kivy.org
@@ -12,7 +12,7 @@ The `Python Imaging Library <http://www.pythonware.com/products/pil/>`_, or PIL
 for short, is *the* library for image manipulation in Python.
 
 It works with Python 1.5.2 and above, including 2.5, 2.6 and 2.7. Unfortunately,
-it doesn't work with 3.0+ yet.
+it doesn't work with 3.0+ yet.
 
 Installation
 ~~~~~~~~~~~~
@@ -20,7 +20,7 @@ Installation
 PIL has a reputation of not being very straightforward to install. Listed below
 are installation notes on various systems.
 
-Also, there's a fork named `Pillow <http://pypi.python.org/pypi/Pillow>`_ which is easier
+Also, there's a fork named `Pillow <http://pypi.python.org/pypi/Pillow>`_ which is easier
 to install. It has good setup instructions for all platforms.
 
 Installing on Linux
@@ -6,7 +6,7 @@ Twisted
 
 `Twisted <http://twistedmatrix.com/trac/>`_ is an event-driven networking engine. It can be
 used to build applications around many different networking protocols, including http servers
-and clients, applications using SMTP, POP3, IMAP or SSH protocols, instant messaging and
+and clients, applications using SMTP, POP3, IMAP or SSH protocols, instant messaging and
 `many more <http://twistedmatrix.com/trac/wiki/Documentation>`_.
 
 PyZMQ
@@ -14,11 +14,11 @@ PyZMQ
 
 `PyZMQ <http://zeromq.github.com/pyzmq/>`_ is the Python binding for `ZeroMQ <http://www.zeromq.org/>`_,
 which is a high-performance asynchronous messaging library. One great advantage is that ZeroMQ
-can be used for message queuing without message broker. The basic patterns for this are:
+can be used for message queuing without a message broker. The basic patterns for this are:
 
 - request-reply: connects a set of clients to a set of services. This is a remote procedure call
 and task distribution pattern.
-- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data
+- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data
 distribution pattern.
 - push-pull (or pipeline): connects nodes in a fan-out / fan-in pattern that can have multiple
 steps, and loops. This is a parallel task distribution and collection pattern.
@@ -35,6 +35,10 @@ people who only need the basic requirements can just use NumPy.
 
 NumPy is compatible with Python versions 2.4 through to 2.7.2 and 3.1+.
 
+Numba
+-----
+.. todo:: Write about Numba
+
 SciPy
 -----
 
@@ -60,8 +64,9 @@ Resources
 
 Installation of scientific Python packages can be troublesome. Many of these
 packages are implemented as Python C extensions which need to be compiled.
-This section lists various so-called Python distributions which provide precompiled and
-easy-to-install collections of scientific Python packages.
+This section lists various so-called scientific Python distributions which
+provide precompiled and easy-to-install collections of scientific Python
+packages.
 
 Unofficial Windows Binaries for Python Extension Packages
 ---------------------------------------------------------
@@ -91,6 +96,6 @@ Anaconda
 Python Distribution <https://store.continuum.io/cshop/anaconda>`_ which
 includes all the common scientific python packages and additionally many
 packages related to data analytics and big data. Anaconda comes in two
-flavours, a paid for version and a completely free and open source community
+flavors, a paid for version and a completely free and open source community
 edition, Anaconda CE, which contains a slightly reduced feature set. Free
-licences for the paid-for version are available for academics and researchers.
+licenses for the paid-for version are available for academics and researchers.
+101
-99
@@ -1,99 +1,101 @@
-HTML Scraping
-=============
-
-Web Scraping
-------------
-
-Web sites are written using HTML, which means that each web page is a
-structured document. Sometimes it would be great to obtain some data from
-them and preserve the structure while we're at it. Web sites provide
-don't always provide their data in comfortable formats such as ``.csv``.
-
-This is where web scraping comes in. Web scraping is the practice of using a
-computer program to sift through a web page and gather the data that you need
-in a format most useful to you while at the same time preserving the structure
-of the data.
-
-lxml and Requests
------------------
-
-`lxml <http://lxml.de/>`_ is a pretty extensive library written for parsing
-XML and HTML documents really fast. It even handles messed up tags. We will
-also be using the `Requests <http://docs.python-requests.org/en/latest/>`_ module instead of the already built-in urlib2
-due to improvements in speed and readability. You can easily install both
-using ``pip install lxml`` and ``pip install requests``.
-
-Lets start with the imports:
-
-.. code-block:: python
-
-from lxml import html
-import requests
-
-Next we will use ``requests.get`` to retrieve the web page with our data
-and parse it using the ``html`` module and save the results in ``tree``:
-
-.. code-block:: python
-
-page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
-tree = html.fromstring(page.text)
-
-``tree`` now contains the whole HTML file in a nice tree structure which
-we can go over two different ways: XPath and CSSSelect. In this example, I
-will focus on the former.
-
-XPath is a way of locating information in structured documents such as
-HTML or XML documents. A good introduction to XPath is on `W3Schools <http://www.w3schools.com/xpath/default.asp>`_ .
-
-There are also various tools for obtaining the XPath of elements such as
-FireBug for Firefox or if you're using Chrome you can right click an
-element, choose 'Inspect element', highlight the code and then right
-click again and choose 'Copy XPath'.
-
-After a quick analysis, we see that in our page the data is contained in
-two elements - one is a div with title 'buyer-name' and the other is a
-span with class 'item-price':
-
-::
-
-<div title="buyer-name">Carson Busses</div>
-<span class="item-price">$29.95</span>
-
-Knowing this we can create the correct XPath query and use the lxml
-``xpath`` function like this:
-
-.. code-block:: python
-
-#This will create a list of buyers:
-buyers = tree.xpath('//div[@title="buyer-name"]/text()')
-#This will create a list of prices
-prices = tree.xpath('//span[@class="item-price"]/text()')
-
-Lets see what we got exactly:
-
-.. code-block:: python
-
-print 'Buyers: ', buyers
-print 'Prices: ', prices
-
-::
-
-Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes',
-'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff',
-'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup',
-'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire',
-'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
-
-Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25',
-'$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11',
-'$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68',
-'$15.00', '$114.07', '$10.09']
-
-Congratulations! We have successfully scraped all the data we wanted from
-a web page using lxml and Requests. We have it stored in memory as two
-lists. Now we can do all sorts of cool stuff with it: we can analyze it
-using Python or we can save it a file and share it with the world.
-
-A cool idea to think about is modifying this script to iterate through
-the rest of the pages of this example dataset or rewriting this
-application to use threads for improved speed.
+HTML Scraping
+=============
+
+Web Scraping
+------------
+
+Web sites are written using HTML, which means that each web page is a
+structured document. Sometimes it would be great to obtain some data from
+them and preserve the structure while we're at it. Web sites don't always
+provide their data in comfortable formats such as ``csv`` or ``json``.
+
+This is where web scraping comes in. Web scraping is the practice of using a
+computer program to sift through a web page and gather the data that you need
+in a format most useful to you while at the same time preserving the structure
+of the data.
+
+lxml and Requests
+-----------------
+
+`lxml <http://lxml.de/>`_ is a pretty extensive library written for parsing
+XML and HTML documents really fast. It even handles messed up tags. We will
+also be using the `Requests <http://docs.python-requests.org/en/latest/>`_
+module instead of the already built-in urlib2 due to improvements in speed and
+readability. You can easily install both using ``pip install lxml`` and
+``pip install requests``.
+
+Lets start with the imports:
+
+.. code-block:: python
+
+from lxml import html
+import requests
+
+Next we will use ``requests.get`` to retrieve the web page with our data
+and parse it using the ``html`` module and save the results in ``tree``:
+
+.. code-block:: python
+
+page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
+tree = html.fromstring(page.text)
+
+``tree`` now contains the whole HTML file in a nice tree structure which
+we can go over two different ways: XPath and CSSSelect. In this example, I
+will focus on the former.
+
+XPath is a way of locating information in structured documents such as
+HTML or XML documents. A good introduction to XPath is on
+`W3Schools <http://www.w3schools.com/xpath/default.asp>`_ .
+
+There are also various tools for obtaining the XPath of elements such as
+FireBug for Firefox or the Chrome Inspector. If you're using Chrome, you
+can right click an element, choose 'Inspect element', highlight the code,
+right click again and choose 'Copy XPath'.
+
+After a quick analysis, we see that in our page the data is contained in
+two elements - one is a div with title 'buyer-name' and the other is a
+span with class 'item-price':
+
+::
+
+<div title="buyer-name">Carson Busses</div>
+<span class="item-price">$29.95</span>
+
+Knowing this we can create the correct XPath query and use the lxml
+``xpath`` function like this:
+
+.. code-block:: python
+
+#This will create a list of buyers:
+buyers = tree.xpath('//div[@title="buyer-name"]/text()')
+#This will create a list of prices
+prices = tree.xpath('//span[@class="item-price"]/text()')
+
+Lets see what we got exactly:
+
+.. code-block:: python
+
+print 'Buyers: ', buyers
+print 'Prices: ', prices
+
+::
+
+Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes',
+'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff',
+'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup',
+'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire',
+'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell']
+
+Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25',
+'$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11',
+'$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68',
+'$15.00', '$114.07', '$10.09']
+
+Congratulations! We have successfully scraped all the data we wanted from
+a web page using lxml and Requests. We have it stored in memory as two
+lists. Now we can do all sorts of cool stuff with it: we can analyze it
+using Python or we can save it to a file and share it with the world.
+
+A cool idea to think about is modifying this script to iterate through
+the rest of the pages of this example dataset or rewriting this
+application to use threads for improved speed.
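Note on the scraping hunk: the tutorial ends with two parallel lists and mentions saving them to a file. The sketch below is not part of this commit; it is a minimal offline illustration of that last step in Python 3. It assumes the standard library only, so `xml.etree.ElementTree` stands in for lxml (it supports just a subset of XPath, which is why `.text` replaces the `/text()` step). The inline `SAMPLE` markup and the helper names `scrape` and `to_csv` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample markup shaped like the tutorial's page (two rows only).
SAMPLE = """
<html><body>
  <div title="buyer-name">Carson Busses</div>
  <span class="item-price">$29.95</span>
  <div title="buyer-name">Earl E. Byrd</div>
  <span class="item-price">$8.37</span>
</body></html>
"""


def scrape(markup):
    """Return (buyers, prices) as two lists of strings."""
    tree = ET.fromstring(markup.strip())
    # ElementTree's XPath subset can match on attributes but has no text() step.
    buyers = [el.text for el in tree.findall('.//div[@title="buyer-name"]')]
    prices = [el.text for el in tree.findall('.//span[@class="item-price"]')]
    return buyers, prices


def to_csv(buyers, prices):
    """Pair each buyer with a price and return the rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["buyer", "price"])
    writer.writerows(zip(buyers, prices))
    return out.getvalue()


buyers, prices = scrape(SAMPLE)
csv_text = to_csv(buyers, prices)
print(csv_text)
```

Pairing with `zip` relies on both lists having the same length and order, which holds only if every buyer on the page has exactly one price.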
@@ -42,7 +42,7 @@ The GIL
 
 `The GIL`_ (Global Interpreter Lock) is how Python allows multiple threads to
 operate at the same time. Python's memory management isn't entirely thread-safe,
-so the GIL is required to prevents multiple threads from running the same
+so the GIL is required to prevent multiple threads from running the same
 Python code at once.
 
 David Beazley has a great `guide`_ on how the GIL operates. He also covers the
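Note on the GIL hunk: a small sketch, not part of this commit, of what the lock means in practice under CPython. Only one thread executes Python bytecode at a time, but blocking calls such as `time.sleep` release the GIL, so threads that mostly wait can still overlap. The helper names `wait_for_io` and `run_in_threads` are hypothetical, and the sleep merely stands in for real I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_io(seconds):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds


def run_in_threads(n_tasks, seconds):
    """Run n_tasks waits in parallel threads and report the wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(wait_for_io, [seconds] * n_tasks))
    return results, time.perf_counter() - start


results, elapsed = run_in_threads(4, 0.2)
print(len(results), "tasks finished in about", round(elapsed, 2), "seconds")
```

Run one after another, the four waits would take roughly 0.8 seconds; in threads they should finish in roughly the time of a single wait. CPU-bound pure-Python work would not see a comparable gain.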
@@ -58,8 +58,8 @@ C Extensions
 The GIL
 -------
 
-`Special care`_ must be taken when writing C extensions to make sure you r
-egister your threads with the interpreter.
+`Special care`_ must be taken when writing C extensions to make sure you
+register your threads with the interpreter.
 
 C Extensions
 ::::::::::::
@@ -76,7 +76,9 @@ Pyrex
 Shedskin?
 ---------
 
-
+Numba
+-----
+.. todo:: Write about Numba and the autojit compiler for NumPy
 
 Threading
 :::::::::
@@ -86,7 +88,7 @@ Threading
 ---------
 
 
-Spanwing Processes
+Spawning Processes
 ------------------
 
 
+11
-12
@@ -98,12 +98,12 @@ framework like Django and the microframeworks: It comes with a lot of libraries
 and functionality and can thus not be considered lightweight. On the other
 hand, it does not provide all the functionality Django does. Instead Pyramid
 brings basic support for most regular tasks and provides a great deal of
-extensibility. Additionally, Pyramid has a huge focus on complete
+extensibility. Additionally, Pyramid has a huge focus on complete
 `documentation <http://docs.pylonsproject.org/en/latest/docs/pyramid.html>`_. As
 a little extra it comes with the Werkzeug Debugger which allows you to debug a
 running web application in the browser.
 
-**Support** can also be found in the
+**Support** can also be found in the
 `documentation <http://docs.pylonsproject.org/en/latest/index.html#support-desc>`_.
 
 
@@ -140,8 +140,8 @@ Gunicorn
 to serve Python applications. It is a Python interpretation of the Ruby
 `Unicorn <http://unicorn.bogomips.org/>`_ server. Unicorn is designed to be
 lightweight, easy to use, and uses many UNIX idioms. Gunicorn is not designed
-to face the internet, in fact it was designed to run behind Nginx which buffers
-slow requests, and takes care of other important considerations. A sample
+to face the internet -- it was designed to run behind Nginx which buffers
+slow requests and takes care of other important considerations. A sample
 setup for Nginx + gUnicorn can be found in the
 `Gunicorn help <http://gunicorn.org/deploy.html>`_.
 
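Note on the Gunicorn hunk: for readers who have not seen what Gunicorn actually serves, here is a minimal sketch of a WSGI application, not part of this commit. The callable name `app` and the helper `fake_start_response` are hypothetical; the last lines call the application directly so the example runs without any server installed.

```python
def app(environ, start_response):
    """A minimal WSGI application: one plain-text response for every request."""
    body = b"Hello from behind the proxy\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Local check without a server: call the application the way a WSGI server would.
captured = {}


def fake_start_response(status, headers):
    """Record what the application passes to start_response."""
    captured["status"] = status
    captured["headers"] = headers


response = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, fake_start_response)
print(captured["status"], b"".join(response))
```

Assuming the code is saved as `hello.py` (a hypothetical file name), `gunicorn hello:app` would serve it, with Nginx proxying requests to it as the paragraph above describes.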
@@ -189,7 +189,7 @@ support for Python 2.7 applications.
 
 Heroku allows you to run as many Python web applications as you like, 24/7 and
 free of charge. Heroku is best described as a horizontal scaling platform. They
-start to charge you once you "scale" you application to run on more than one
+start to charge you once you "scale" your application to run on more than one
 Dyno (abstracted servers) at a time.
 
 Heroku publishes `step-by-step instructions
@@ -202,10 +202,9 @@ DotCloud
 ~~~~~~~~
 
 `DotCloud <http://www.dotcloud.com/>`_ supports WSGI applications and
-background/worker tasks natively on their platform. Web applications running
-Python version 2.6, and uses :ref:`nginx <nginx-ref>` and :ref:`uWSGI
-<uwsgi-ref>`, and allows custom configuration of both
-for advanced users.
+background/worker tasks natively on their platform. Web applications run
+Python version 2.6, use :ref:`nginx <nginx-ref>` and :ref:`uWSGI
+<uwsgi-ref>`, and allow custom configuration of both for advanced users.
 
 DotCloud uses a custom command-line API client which can work with
 applications managed in git repositories or any other version control
@@ -222,7 +221,7 @@ getting started.
 Gondor
 ~~~~~~
 
-`Gondor <https://gondor.io/>`_ is a PaaS specailized for deploying Django
+`Gondor <https://gondor.io/>`_ is a PaaS specialized for deploying Django
 and Pinax applications. Gondor supports Django versions 1.2 and 1.3 on
 Python version 2.7, and can automatically configure your Django site if you
 use ``local_settings.py`` for site-specific configuration information.
@@ -238,7 +237,7 @@ Templating
 Most WSGI applications are responding to HTTP requests to serve
 content in HTML or other markup languages. Instead of generating directly
 textual content from Python, the concept of separation of concerns
-advises us to use templates. A template engine manage a suite of
+advises us to use templates. A template engine manages a suite of
 template files, with a system of hierarchy and inclusion to
 avoid unnecessary repetition, and is in charge of rendering
 (generating) the actual content, filling the static content
@@ -265,7 +264,7 @@ and to the templates themselves.
 templates. This convenience can lead to uncontrolled
 increase in complexity, and often harder to find bugs.
 
-- It is often possible or necessary to mix javascript templates with
+- It is often necessary to mix javascript templates with
 HTML templates. A sane approach to this design is to isolate
 the parts where the HTML template passes some variable content
 to the javascript code.
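Note on the templating hunk: one way to read "isolate the parts where the HTML template passes some variable content to the javascript code" is to hand the data over in a single JSON assignment and keep the script itself static. The sketch below is not part of this commit and uses only the standard library's `string.Template` rather than a full template engine; the names `PAGE`, `render` and `CONFIG`, and the script URL, are hypothetical.

```python
import json
from string import Template

# The only place where Python data crosses into javascript is the CONFIG line.
PAGE = Template("""<ul id="items"></ul>
<script>
  var CONFIG = $config_json;
</script>
<script src="$script_url"></script>
""")


def render(items, script_url):
    """Fill the template, serializing the variable content as JSON."""
    # Escape "</" so a value containing "</script>" cannot end the script block.
    config_json = json.dumps({"items": items}).replace("</", "<\\/")
    return PAGE.substitute(config_json=config_json, script_url=script_url)


page = render(["spam", "eggs"], "/static/app.js")
print(page)
```

The static script can then read `CONFIG.items` without the HTML template having to generate any javascript logic.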