diff --git a/docs/scenarios/admin.rst b/docs/scenarios/admin.rst index 042f703..3397d66 100644 --- a/docs/scenarios/admin.rst +++ b/docs/scenarios/admin.rst @@ -104,7 +104,7 @@ The following command lists all available minions running CentOS using the grain Salt also provides a state system. States can be used to configure the minion hosts. -For example, when a minion host is ordered to read the following state file, will install +For example, when a minion host is ordered to read the following state file, it will install and start the Apache server: .. code-block:: yaml diff --git a/docs/scenarios/cli.rst b/docs/scenarios/cli.rst index cea7a3f..bc952c7 100644 --- a/docs/scenarios/cli.rst +++ b/docs/scenarios/cli.rst @@ -6,4 +6,4 @@ Command Line Applications Clint ----- -.. todo:: Write about Clint \ No newline at end of file +.. todo:: Write about Clint diff --git a/docs/scenarios/client.rst b/docs/scenarios/client.rst index e2d8202..b9cd402 100644 --- a/docs/scenarios/client.rst +++ b/docs/scenarios/client.rst @@ -41,3 +41,9 @@ messaging library aimed at use in scalable distributed or concurrent applications. It provides a message queue, but unlike message-oriented middleware, a ØMQ system can run without a dedicated message broker. The library is designed to have a familiar socket-style API. + +RabbitMQ +-------- + +.. todo:: Write about RabbitMQ + diff --git a/docs/scenarios/db.rst b/docs/scenarios/db.rst index 4f03058..d3c398f 100644 --- a/docs/scenarios/db.rst +++ b/docs/scenarios/db.rst @@ -30,7 +30,6 @@ Django ORM The Django ORM is the interface used by `Django `_ to provide database access. -It's based on the idea of models, an abstraction that makes it easier to +It's based on the idea of `models `_, an abstraction that makes it easier to manipulate data in Python. 
-Documentation can be found `here `_ \ No newline at end of file diff --git a/docs/scenarios/gui.rst b/docs/scenarios/gui.rst index d40ac7e..49dd0ad 100644 --- a/docs/scenarios/gui.rst +++ b/docs/scenarios/gui.rst @@ -41,7 +41,7 @@ Gtk PyGTK provides Python bindings for the GTK+ toolkit. Like the GTK+ library itself, it is currently licensed under the GNU LGPL. It is worth noting that PyGTK only currently supports the Gtk-2.X API (NOT Gtk-3.0). It is currently -recommended that PyGTK is not used for new projects and existing applications +recommended that PyGTK not be used for new projects and existing applications be ported from PyGTK to PyGObject. Tk @@ -60,10 +60,10 @@ available on the `Python Wiki `_. Kivy ---- -Kivy is a Python library for development of multi-touch enabled media rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping, while making your code reusable and deployable. +`Kivy `_ is a Python library for development of multi-touch enabled media rich applications. The aim is to allow for quick and easy interaction design and rapid prototyping, while making your code reusable and deployable. Kivy is written in Python, based on OpenGL and supports different input devices such as: Mouse, Dual Mouse, TUIO, WiiMote, WM_TOUCH, HIDtouch, Apple's products and so on. Kivy is actively being developed by a community and free to use. It operates on all major platforms (Linux, OSX, Windows, Android). -The main resource for information is the website: http://kivy.org \ No newline at end of file +The main resource for information is the website: http://kivy.org diff --git a/docs/scenarios/imaging.rst b/docs/scenarios/imaging.rst index 8a15972..8defa0b 100644 --- a/docs/scenarios/imaging.rst +++ b/docs/scenarios/imaging.rst @@ -12,7 +12,7 @@ The `Python Imaging Library `_, or PIL for short, is *the* library for image manipulation in Python. It works with Python 1.5.2 and above, including 2.5, 2.6 and 2.7. 
Unfortunately, -it doesn't work with 3.0+ yet. +it doesn't work with 3.0+ yet. Installation ~~~~~~~~~~~~ @@ -20,7 +20,7 @@ Installation PIL has a reputation of not being very straightforward to install. Listed below are installation notes on various systems. -Also, there's a fork named `Pillow `_ which is easier +Also, there's a fork named `Pillow `_ which is easier to install. It has good setup instructions for all platforms. Installing on Linux diff --git a/docs/scenarios/network.rst b/docs/scenarios/network.rst index 0521089..b01c0ef 100644 --- a/docs/scenarios/network.rst +++ b/docs/scenarios/network.rst @@ -6,7 +6,7 @@ Twisted `Twisted `_ is an event-driven networking engine. It can be used to build applications around many different networking protocols, including http servers -and clients, applications using SMTP, POP3, IMAP or SSH protocols, instant messaging and +and clients, applications using SMTP, POP3, IMAP or SSH protocols, instant messaging and `many more `_. PyZMQ @@ -14,11 +14,11 @@ PyZMQ `PyZMQ `_ is the Python binding for `ZeroMQ `_, which is a high-performance asynchronous messaging library. One great advantage is that ZeroMQ -can be used for message queuing without message broker. The basic patterns for this are: +can be used for message queuing without a message broker. The basic patterns for this are: - request-reply: connects a set of clients to a set of services. This is a remote procedure call and task distribution pattern. -- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data +- publish-subscribe: connects a set of publishers to a set of subscribers. This is a data distribution pattern. - push-pull (or pipeline): connects nodes in a fan-out / fan-in pattern that can have multiple steps, and loops. This is a parallel task distribution and collection pattern. 
diff --git a/docs/scenarios/scientific.rst b/docs/scenarios/scientific.rst index a8cddde..49d5c88 100644 --- a/docs/scenarios/scientific.rst +++ b/docs/scenarios/scientific.rst @@ -35,6 +35,10 @@ people who only need the basic requirements can just use NumPy. NumPy is compatible with Python versions 2.4 through to 2.7.2 and 3.1+. +Numba +----- +.. todo:: Write about Numba + SciPy ----- @@ -60,8 +64,9 @@ Resources Installation of scientific Python packages can be troublesome. Many of these packages are implemented as Python C extensions which need to be compiled. -This section lists various so-called Python distributions which provide precompiled and -easy-to-install collections of scientific Python packages. +This section lists various so-called scientific Python distributions which +provide precompiled and easy-to-install collections of scientific Python +packages. Unofficial Windows Binaries for Python Extension Packages --------------------------------------------------------- @@ -91,6 +96,6 @@ Anaconda Python Distribution `_ which includes all the common scientific python packages and additionally many packages related to data analytics and big data. Anaconda comes in two -flavours, a paid for version and a completely free and open source community +flavors, a paid for version and a completely free and open source community edition, Anaconda CE, which contains a slightly reduced feature set. Free -licences for the paid-for version are available for academics and researchers. +licenses for the paid-for version are available for academics and researchers. diff --git a/docs/scenarios/scrape.rst b/docs/scenarios/scrape.rst index 17a0281..b4f10b2 100644 --- a/docs/scenarios/scrape.rst +++ b/docs/scenarios/scrape.rst @@ -1,99 +1,101 @@ -HTML Scraping -============= - -Web Scraping ------------- - -Web sites are written using HTML, which means that each web page is a -structured document. 
Sometimes it would be great to obtain some data from -them and preserve the structure while we're at it. Web sites provide -don't always provide their data in comfortable formats such as ``.csv``. - -This is where web scraping comes in. Web scraping is the practice of using a -computer program to sift through a web page and gather the data that you need -in a format most useful to you while at the same time preserving the structure -of the data. - -lxml and Requests ------------------ - -`lxml `_ is a pretty extensive library written for parsing -XML and HTML documents really fast. It even handles messed up tags. We will -also be using the `Requests `_ module instead of the already built-in urlib2 -due to improvements in speed and readability. You can easily install both -using ``pip install lxml`` and ``pip install requests``. - -Lets start with the imports: - -.. code-block:: python - - from lxml import html - import requests - -Next we will use ``requests.get`` to retrieve the web page with our data -and parse it using the ``html`` module and save the results in ``tree``: - -.. code-block:: python - - page = requests.get('http://econpy.pythonanywhere.com/ex/001.html') - tree = html.fromstring(page.text) - -``tree`` now contains the whole HTML file in a nice tree structure which -we can go over two different ways: XPath and CSSSelect. In this example, I -will focus on the former. - -XPath is a way of locating information in structured documents such as -HTML or XML documents. A good introduction to XPath is on `W3Schools `_ . - -There are also various tools for obtaining the XPath of elements such as -FireBug for Firefox or if you're using Chrome you can right click an -element, choose 'Inspect element', highlight the code and then right -click again and choose 'Copy XPath'. - -After a quick analysis, we see that in our page the data is contained in -two elements - one is a div with title 'buyer-name' and the other is a -span with class 'item-price': - -:: - -
<div title="buyer-name">Carson Busses</div>
- <span class="item-price">$29.95</span> - -Knowing this we can create the correct XPath query and use the lxml -``xpath`` function like this: - -.. code-block:: python - - #This will create a list of buyers: - buyers = tree.xpath('//div[@title="buyer-name"]/text()') - #This will create a list of prices - prices = tree.xpath('//span[@class="item-price"]/text()') - -Lets see what we got exactly: - -.. code-block:: python - - print 'Buyers: ', buyers - print 'Prices: ', prices - -:: - - Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', - 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', - 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', - 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', - 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell'] - - Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', - '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', - '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', - '$15.00', '$114.07', '$10.09'] - -Congratulations! We have successfully scraped all the data we wanted from -a web page using lxml and Requests. We have it stored in memory as two -lists. Now we can do all sorts of cool stuff with it: we can analyze it -using Python or we can save it a file and share it with the world. - -A cool idea to think about is modifying this script to iterate through -the rest of the pages of this example dataset or rewriting this -application to use threads for improved speed. +HTML Scraping +============= + +Web Scraping +------------ + +Web sites are written using HTML, which means that each web page is a +structured document. Sometimes it would be great to obtain some data from +them and preserve the structure while we're at it. Web sites don't always +provide their data in comfortable formats such as ``csv`` or ``json``. + +This is where web scraping comes in. 
Web scraping is the practice of using a +computer program to sift through a web page and gather the data that you need +in a format most useful to you while at the same time preserving the structure +of the data. + +lxml and Requests +----------------- + +`lxml `_ is a pretty extensive library written for parsing +XML and HTML documents really fast. It even handles messed up tags. We will +also be using the `Requests `_ +module instead of the already built-in urlib2 due to improvements in speed and +readability. You can easily install both using ``pip install lxml`` and +``pip install requests``. + +Lets start with the imports: + +.. code-block:: python + + from lxml import html + import requests + +Next we will use ``requests.get`` to retrieve the web page with our data +and parse it using the ``html`` module and save the results in ``tree``: + +.. code-block:: python + + page = requests.get('http://econpy.pythonanywhere.com/ex/001.html') + tree = html.fromstring(page.text) + +``tree`` now contains the whole HTML file in a nice tree structure which +we can go over two different ways: XPath and CSSSelect. In this example, I +will focus on the former. + +XPath is a way of locating information in structured documents such as +HTML or XML documents. A good introduction to XPath is on +`W3Schools `_ . + +There are also various tools for obtaining the XPath of elements such as +FireBug for Firefox or the Chrome Inspector. If you're using Chrome, you +can right click an element, choose 'Inspect element', highlight the code, +right click again and choose 'Copy XPath'. + +After a quick analysis, we see that in our page the data is contained in +two elements - one is a div with title 'buyer-name' and the other is a +span with class 'item-price': + +:: + +
<div title="buyer-name">Carson Busses</div>
+ <span class="item-price">$29.95</span> + +Knowing this we can create the correct XPath query and use the lxml +``xpath`` function like this: + +.. code-block:: python + + #This will create a list of buyers: + buyers = tree.xpath('//div[@title="buyer-name"]/text()') + #This will create a list of prices + prices = tree.xpath('//span[@class="item-price"]/text()') + +Lets see what we got exactly: + +.. code-block:: python + + print 'Buyers: ', buyers + print 'Prices: ', prices + +:: + + Buyers: ['Carson Busses', 'Earl E. Byrd', 'Patty Cakes', + 'Derri Anne Connecticut', 'Moe Dess', 'Leda Doggslife', 'Dan Druff', + 'Al Fresco', 'Ido Hoe', 'Howie Kisses', 'Len Lease', 'Phil Meup', + 'Ira Pent', 'Ben D. Rules', 'Ave Sectomy', 'Gary Shattire', + 'Bobbi Soks', 'Sheila Takya', 'Rose Tattoo', 'Moe Tell'] + + Prices: ['$29.95', '$8.37', '$15.26', '$19.25', '$19.25', + '$13.99', '$31.57', '$8.49', '$14.47', '$15.86', '$11.11', + '$15.98', '$16.27', '$7.50', '$50.85', '$14.26', '$5.68', + '$15.00', '$114.07', '$10.09'] + +Congratulations! We have successfully scraped all the data we wanted from +a web page using lxml and Requests. We have it stored in memory as two +lists. Now we can do all sorts of cool stuff with it: we can analyze it +using Python or we can save it to a file and share it with the world. + +A cool idea to think about is modifying this script to iterate through +the rest of the pages of this example dataset or rewriting this +application to use threads for improved speed. diff --git a/docs/scenarios/speed.rst b/docs/scenarios/speed.rst index 87a1f66..6aaa96b 100644 --- a/docs/scenarios/speed.rst +++ b/docs/scenarios/speed.rst @@ -42,7 +42,7 @@ The GIL `The GIL`_ (Global Interpreter Lock) is how Python allows multiple threads to operate at the same time. Python's memory management isn't entirely thread-safe, -so the GIL is required to prevents multiple threads from running the same +so the GIL is required to prevent multiple threads from running the same Python code at once. 
David Beazley has a great `guide`_ on how the GIL operates. He also covers the @@ -58,8 +58,8 @@ C Extensions The GIL ------- -`Special care`_ must be taken when writing C extensions to make sure you r -egister your threads with the interpreter. +`Special care`_ must be taken when writing C extensions to make sure you +register your threads with the interpreter. C Extensions :::::::::::: @@ -76,7 +76,9 @@ Pyrex Shedskin? --------- - +Numba +----- +.. todo:: Write about Numba and the autojit compiler for NumPy Threading ::::::::: @@ -86,7 +88,7 @@ Threading --------- -Spanwing Processes +Spawning Processes ------------------ diff --git a/docs/scenarios/web.rst b/docs/scenarios/web.rst index 264d450..9b7f627 100644 --- a/docs/scenarios/web.rst +++ b/docs/scenarios/web.rst @@ -98,12 +98,12 @@ framework like Django and the microframeworks: It comes with a lot of libraries and functionality and can thus not be considered lightweight. On the other hand, it does not provide all the functionality Django does. Instead Pyramid brings basic support for most regular tasks and provides a great deal of -extensibility. Additionally, Pyramid has a huge focus on complete +extensibility. Additionally, Pyramid has a huge focus on complete `documentation `_. As a little extra it comes with the Werkzeug Debugger which allows you to debug a running web application in the browser. -**Support** can also be found in the +**Support** can also be found in the `documentation `_. @@ -140,8 +140,8 @@ Gunicorn to serve Python applications. It is a Python interpretation of the Ruby `Unicorn `_ server. Unicorn is designed to be lightweight, easy to use, and uses many UNIX idioms. Gunicorn is not designed -to face the internet, in fact it was designed to run behind Nginx which buffers -slow requests, and takes care of other important considerations. A sample +to face the internet -- it was designed to run behind Nginx which buffers +slow requests and takes care of other important considerations. 
A sample setup for Nginx + gUnicorn can be found in the `Gunicorn help `_. @@ -189,7 +189,7 @@ support for Python 2.7 applications. Heroku allows you to run as many Python web applications as you like, 24/7 and free of charge. Heroku is best described as a horizontal scaling platform. They -start to charge you once you "scale" you application to run on more than one +start to charge you once you "scale" your application to run on more than one Dyno (abstracted servers) at a time. Heroku publishes `step-by-step instructions @@ -202,10 +202,9 @@ DotCloud ~~~~~~~~ `DotCloud `_ supports WSGI applications and -background/worker tasks natively on their platform. Web applications running -Python version 2.6, and uses :ref:`nginx ` and :ref:`uWSGI -`, and allows custom configuration of both -for advanced users. +background/worker tasks natively on their platform. Web applications run +Python version 2.6, use :ref:`nginx ` and :ref:`uWSGI +`, and allow custom configuration of both for advanced users. DotCloud uses a custom command-line API client which can work with applications managed in git repositories or any other version control @@ -222,7 +221,7 @@ getting started. Gondor ~~~~~~ -`Gondor `_ is a PaaS specailized for deploying Django +`Gondor `_ is a PaaS specialized for deploying Django and Pinax applications. Gondor supports Django versions 1.2 and 1.3 on Python version 2.7, and can automatically configure your Django site if you use ``local_settings.py`` for site-specific configuration information. @@ -238,7 +237,7 @@ Templating Most WSGI applications are responding to HTTP requests to serve content in HTML or other markup languages. Instead of generating directly textual content from Python, the concept of separation of concerns -advises us to use templates. A template engine manage a suite of +advises us to use templates. 
A template engine manages a suite of template files, with a system of hierarchy and inclusion to avoid unnecessary repetition, and is in charge of rendering (generating) the actual content, filling the static content @@ -265,7 +264,7 @@ and to the templates themselves. templates. This convenience can lead to uncontrolled increase in complexity, and often harder to find bugs. -- It is often possible or necessary to mix javascript templates with +- It is often necessary to mix javascript templates with HTML templates. A sane approach to this design is to isolate the parts where the HTML template passes some variable content to the javascript code.