IMP Reference Guide
develop.d97d4ead1f,2024/11/21
The Integrative Modeling Platform
Distribute IMP tasks to multiple processors or machines.
This module employs a manager-worker model; the main (manager) IMP process sends the tasks out to one or more workers. Tasks cannot communicate with each other, but return results to the manager. The manager can then start new tasks, possibly using results returned from completed tasks. The system is fault tolerant; if a worker fails, any tasks running on that worker are automatically moved to another worker.
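The failover behavior described above can be illustrated with a toy, single-process simulation. This is only a sketch of the idea, not IMP's implementation: `run_with_failover`, `WorkerFailed`, `NoWorkersLeft`, `healthy` and `broken` are all hypothetical names, and the "workers" here are plain Python callables rather than separate processes.

```python
from collections import deque


class WorkerFailed(Exception):
    """Raised by a simulated worker that dies while running a task."""


class NoWorkersLeft(Exception):
    """Raised when every simulated worker has failed."""


def run_with_failover(tasks, workers):
    """Run zero-argument tasks on simulated workers, one task at a time.

    A worker that raises WorkerFailed is dropped, and the task it was
    running is handed to the next available worker.
    """
    pending = deque(tasks)
    alive = deque(workers)
    results = []
    while pending:
        if not alive:
            raise NoWorkersLeft("all workers failed")
        task = pending.popleft()
        worker = alive.popleft()
        try:
            results.append(worker(task))
        except WorkerFailed:
            # Drop the failed worker and put its task back at the front.
            pending.appendleft(task)
        else:
            # The worker is free again and can take another task.
            alive.append(worker)
    return results


def healthy(task):
    """Simulated worker that runs the task and returns its result."""
    return task()


def broken(task):
    """Simulated worker that always fails."""
    raise WorkerFailed("simulated crash")
```

Calling `run_with_failover([lambda: 1, lambda: 2], [broken, healthy])` first tries the broken worker, then reruns the same task on the healthy one, so both results are still returned.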
To use the module, first create a Manager object. Add one or more workers to the Manager using its add_worker() method. Example workers are LocalWorker, which simply starts another IMP process on the same machine as the manager, and SGEQsubWorkerArray, which starts an array of multiple workers on a Sun Grid Engine cluster. Next, call the get_context() method, which creates and returns a new Context object. Add tasks to the Context with the Context.add_task() method; each task is simply a Python function or other callable object. Finally, call Context.get_results_unordered() to send the tasks out to the workers. A worker runs only a single task at a time; if there are more tasks than workers, later tasks are queued until a worker is done with an earlier task. Context.get_results_unordered() returns the results from each task as it completes.
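The steps above can be sketched as follows. This is a minimal sketch, assuming IMP is installed; `Add` and `run_on_local_workers` are hypothetical names introduced for illustration, and only the Manager, LocalWorker, add_worker(), get_context(), add_task() and get_results_unordered() names come from the description above.

```python
class Add:
    """Hypothetical task: a callable object that adds two numbers."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self):
        return self.a + self.b


def run_on_local_workers(tasks, num_workers=2):
    """Run the given tasks on local workers and collect the results."""
    # Imported here so that the task class can be used without IMP.
    import IMP.parallel

    manager = IMP.parallel.Manager()
    for _ in range(num_workers):
        # Each LocalWorker starts another IMP process on this machine.
        manager.add_worker(IMP.parallel.LocalWorker())

    context = manager.get_context()
    for task in tasks:
        context.add_task(task)

    # Results arrive as tasks complete, so their order may differ
    # from the order in which the tasks were added.
    return list(context.get_results_unordered())
```

A call such as `run_on_local_workers([Add(1, 2), Add(5, 6)])` would be expected to return 3 and 11 in some order. Because tasks are sent to other processes, it is assumed here that the task class needs to live in a module that the workers can import, rather than in the manager script itself.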
Setup in IMP is often expensive, and thus the Manager.get_context() method allows you to specify a Python function or other callable object to do any setup for the tasks. This function will be run on the worker before any tasks from that context are started (the return values from this function are passed to the task functions). If multiple tasks from the same context are run on the same worker, the setup function is only called once.
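A setup function might be used as in the sketch below. The names `setup`, `LookupSquare` and `run_with_setup` are hypothetical, and the dictionary stands in for an expensive IMP model. The sketch assumes that the values returned by the setup function are handed to each task as positional arguments; check the Context documentation before relying on that detail.

```python
def setup():
    """Hypothetical setup step, standing in for expensive model building."""
    squares = {n: n * n for n in range(100)}
    # Returned as a tuple; assumed to be unpacked into each task call.
    return (squares,)


class LookupSquare:
    """Hypothetical task that uses the object built by setup()."""

    def __init__(self, n):
        self.n = n

    def __call__(self, squares):
        return self.n, squares[self.n]


def run_with_setup(numbers):
    """Look up the square of each number on a single local worker."""
    # Imported here so that setup() and the task can be used without IMP.
    import IMP.parallel

    manager = IMP.parallel.Manager()
    manager.add_worker(IMP.parallel.LocalWorker())

    # The setup callable is given when the context is created; it runs
    # on the worker before the first task from this context starts.
    context = manager.get_context(setup)
    for n in numbers:
        context.add_task(LookupSquare(n))

    # Each result is an (n, n squared) pair, in completion order.
    return dict(context.get_results_unordered())
```

With a single worker, `setup()` would run once even though several tasks are added, which is the saving the paragraph above describes.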
Troubleshooting
Several common problems with this module are described below, together with solutions.
Problem: /bin/sh: qsub: command not found is reported, but qsub works fine from a terminal.
Solution: SGEQsubWorkerArray uses the qsub command to submit the SGE job that starts the workers. Thus, qsub must be in your system PATH. This may not be the case if you are using a shell script such as imppy.sh to start IMP. To fix this, modify the shell script to add the directory containing qsub to the PATH, or remove the setting of PATH entirely.
Problem: ImportError: No module named IMP.parallel.worker_handler.
Solution: If you start IMP using a shell script such as imppy.sh, you need to tell the workers to do that too. Specify the full command line needed to start a suitable Python interpreter as the 'python' argument when you create the Manager object.
Problem: socket.error: (110, 'Connection timed out').
Problem: socket.error: (111, 'Connection refused').
Author(s): Ben Webb
Maintainer: benmwebb
License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Namespaces
manager_communicator: Classes for communicating from the manager to workers.
subproc: Subprocess handling.
util: Utilities for the IMP.parallel module.
Classes
class Context: A collection of tasks that run in the same environment.
class Error: Base class for all errors specific to the parallel module.
class LocalWorker: A worker running on the same machine as the manager.
class Manager: Manages workers and contexts.
class NetworkError: Error raised if a problem occurs with the network.
class NoMoreWorkersError: Error raised if all workers failed, so tasks cannot be run.
class RemoteError: Error raised if a worker has an unhandled exception.
class SGEPEWorkerArray: An array of workers in a Sun Grid Engine system parallel environment.
class SGEQsubWorkerArray: An array of workers on a Sun Grid Engine system, started with 'qsub'.
class Worker: Representation of a single worker.
class WorkerArray: Representation of an array of workers.
Functions
def get_data_path: Return the full path to one of this module's data files.
def get_example_path: Return the full path to one of this module's example files.
def get_module_name: Return the fully-qualified name of this module.
def get_module_version: Return the version of this module, as a string.
def IMP.parallel.get_data_path(fname)
Return the full path to one of this module's data files.
Definition at line 592 of file parallel/__init__.py.
def IMP.parallel.get_example_path(fname)
Return the full path to one of this module's example files.
Definition at line 597 of file parallel/__init__.py.
def IMP.parallel.get_module_name()
Return the fully-qualified name of this module.
Definition at line 587 of file parallel/__init__.py.
def IMP.parallel.get_module_version()
Return the version of this module, as a string.
Definition at line 582 of file parallel/__init__.py.