Monday, April 12, 2010

Project description

Currently IPython provides a command-line client that executes all code in a single process, and a set of tools for distributed and parallel computing that execute code in multiple processes (possibly, but not necessarily, on different hosts), using the Twisted asynchronous framework for communication between nodes. It is desirable to unify the architecture of local execution with that of distributed computing, since many of the underlying abstractions are similar and should be reused. In particular, we would like to:

- Have a two-process model even for a single user, so that the environment where code is entered runs in a different process from the one that executes it. This would prevent a crash of the Python interpreter executing the code (caused, for example, by a segmentation fault in a compiled extension or an improper access to a C library via ctypes) from destroying the user session.

- Make the same kernel that executes code locally available over the network for distributed computing. Currently the Twisted-based IPython engines for distributed computing do not share any code with the command-line client, which means that many of the additional features of IPython (tab completion, object introspection, magic functions, etc.) are not available when using the distributed computing system. Once the regular command-line environment is ported to such a two-process model, the newly decoupled kernel could form the core of a distributed computing IPython engine, and all capabilities would be available throughout the system.

- Have a route to Python 3 support. Twisted is a large and complex library that does not currently support Python 3, and as indicated by the Twisted developers it may take a while before it is ported (http://stackoverflow.com/questions/172306/how-are-you-planning-on-handling-the-migration-to-python-3). For IPython, this means that while we could port the command-line environment, a large swath of IPython would be left 2.x-only, a highly undesirable situation. For this reason, the search for an alternative to Twisted has been active for a while, and recently we have identified the ZeroMQ library (http://www.zeromq.org, zmq for short) as a viable candidate. Zmq is a fast, simple messaging library written in C++, for which one of the IPython developers has written Python bindings using Cython (http://www.zeromq.org/bindings:python). Since Cython can already generate Python 3-compatible bindings with a simple command-line switch, zmq can be used with Python 3 when needed.
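The crash-isolation argument in the first item can be illustrated with a minimal sketch that uses only the standard library. It is not IPython code: the names `run_in_child` and `frontend_session` are hypothetical, and each snippet is run in a fresh child interpreter rather than in a persistent kernel, purely to keep the example short.

```python
import subprocess
import sys

# Snippet that reads from a NULL pointer through ctypes. On most platforms
# this kills the interpreter that runs it with a segmentation fault.
CRASHING_CODE = "import ctypes; ctypes.string_at(0)"


def run_in_child(code):
    """Run `code` in a separate interpreter; return (exit code, stdout)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()


def frontend_session():
    """Hypothetical frontend loop: it keeps running when the child dies."""
    history = []
    for code in ["print(6 * 7)", CRASHING_CODE, "print('still here')"]:
        history.append(run_in_child(code))
    return history


if __name__ == "__main__":
    for returncode, output in frontend_session():
        print(returncode, repr(output))
```

The second snippet takes down only the child process; the frontend records a non-zero exit code and carries on with the third snippet. In a single-process design the same snippet would end the whole session.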

As part of the zmq Python bindings, the IPython developers have already developed a simple prototype of such a two-process kernel/frontend system (details below). I propose to start from this example and port today's IPython code to operate in a similar manner. IPython's command-line program (the main 'ipython' script) currently handles both user interaction and execution of the user's code in the same process. This project will therefore require splitting IPython into the parts that belong to the kernel and the parts that interact with the user, and making these two components communicate over the network using zmq instead of accessing local attributes and methods of a single global object.
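To make the kernel/frontend split concrete, here is a minimal sketch of a request/reply exchange with pyzmq. It is not the prototype mentioned above, and the message fields (`msg_type`, `code`, `status`, `result`, `ename`) and the functions `execute`, `start_kernel` and `run_demo` are assumptions made for illustration only. The kernel runs in a thread so the example fits in one script; in the proposed design it would be a separate process.

```python
import threading

import zmq


def execute(code, user_ns):
    """Evaluate `code` as an expression, or execute it as statements."""
    try:
        try:
            compiled = compile(code, "<kernel>", "eval")
        except SyntaxError:
            # Not an expression: run it as statements, no result to report.
            exec(compile(code, "<kernel>", "exec"), user_ns)
            result = None
        else:
            result = repr(eval(compiled, user_ns))
    except Exception as exc:
        return {"msg_type": "execute_reply", "status": "error",
                "ename": type(exc).__name__}
    return {"msg_type": "execute_reply", "status": "ok", "result": result}


def kernel_loop(sock):
    """Hypothetical kernel: owns the user namespace, answers requests."""
    user_ns = {}
    while True:
        msg = sock.recv_json()
        if msg["msg_type"] == "shutdown_request":
            sock.send_json({"msg_type": "shutdown_reply"})
            break
        sock.send_json(execute(msg["code"], user_ns))
    sock.close(linger=0)


def start_kernel(ctx):
    """Bind a REP socket on a free local port and serve it from a thread."""
    sock = ctx.socket(zmq.REP)
    port = sock.bind_to_random_port("tcp://127.0.0.1")
    thread = threading.Thread(target=kernel_loop, args=(sock,))
    thread.start()
    return thread, port


def run_demo():
    """Hypothetical frontend: it only sends messages and reads replies."""
    ctx = zmq.Context()
    thread, port = start_kernel(ctx)
    req = ctx.socket(zmq.REQ)
    req.connect(f"tcp://127.0.0.1:{port}")
    replies = []
    for code in ["x = 6 * 7", "x", "1 / 0"]:
        req.send_json({"msg_type": "execute_request", "code": code})
        replies.append(req.recv_json())
    req.send_json({"msg_type": "shutdown_request"})
    req.recv_json()
    thread.join()
    req.close(linger=0)
    ctx.term()
    return replies


if __name__ == "__main__":
    for reply in run_demo():
        print(reply)
```

The frontend never touches `user_ns` directly. Everything it learns about the kernel's state arrives in a reply message, which is the change from attribute access on a single global object that the port would require.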

Once this port is complete, the resulting tools will be the foundation that allows the distributed computing parts of IPython to use the same code as the command-line client, and the whole system to be ported to Python 3. I do not expect to undertake either of those tasks as part of this proposal. Removing Twisted and unifying the local and distributed parts of IPython are out of scope here, but my proposal is a necessary step before they become possible.
