Package mdp :: Package parallel :: Class ProcessScheduler
[hide private]
[frames] | no frames]

Class ProcessScheduler


Scheduler that distributes the task to multiple processes.

The subprocess module is used to start the requested number of processes.
The execution of each task is internally managed by dedicated thread.

This scheduler should work on all platforms (at least on Linux,
Windows XP and Vista). 

Instance Methods [hide private]
 
__init__(self, result_container=None, verbose=False, n_processes=1, source_paths=None, python_executable=None, cache_callable=True)
Initialize the scheduler and start the slave processes.
 
_process_task(self, data, task_callable, task_index)
Add a task, if possible without blocking.
 
_shutdown(self)
Shut down the slave processes.
 
_task_thread(self, process, data, task_callable, task_index)
Thread function which cares for a single task.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

    Inherited from Scheduler
 
_store_result(self, result, task_index)
Store a result in the internal result container.
 
add_task(self, data, task_callable=None)
Add a task to be executed.
 
get_results(self)
Get the accumulated results from the result container.
 
set_task_callable(self, task_callable)
Set the callable that will be used if no task_callable is given.
 
shutdown(self)
Controlled shutdown of the scheduler.
Properties [hide private]

Inherited from object: __class__

    Inherited from Scheduler
  n_open_tasks
This property counts of submitted but unfinished tasks.
  task_counter
This property counts the number of submitted tasks.
Method Details [hide private]

__init__(self, result_container=None, verbose=False, n_processes=1, source_paths=None, python_executable=None, cache_callable=True)
(Constructor)

 
Initialize the scheduler and start the slave processes.

result_container -- ResultContainer used to store the results.
verbose -- Set to True to get progress reports from the scheduler
    (default value is False).
n_processes -- Number of processes used in parallel. This should
    correspond to the number of processors / cores.
source_paths -- List of paths to the source code of the project using 
    the scheduler. These paths will be appended to sys.path in the
    processes to make the task unpickling work. 
    A single path instead of a list is also accepted.
    Set to None if no sources are needed for unpickling the task (this 
    is the default value).
python_executable -- Python executable that is used for the processes.
    The default value is None, in which case sys.executable will be
    used.
cache_callable -- Cache the task objects in the processes (default
    is True). Disabling caching can reduce the memory usage, but will
    generally be less efficient since the task_callable has to be
    pickled each time.

Overrides: object.__init__

_process_task(self, data, task_callable, task_index)

 
Add a task, if possible without blocking.

It blocks when the system is not able to start a new thread
or when the processes are all in use.

Overrides: Scheduler._process_task

_shutdown(self)

 
Shut down the slave processes.

If a process is still running a task then an exception is raised.

Overrides: Scheduler._shutdown

_task_thread(self, process, data, task_callable, task_index)

 
Thread function which cares for a single task.

The task is pushed to the process via stdin, then we wait for the
result on stdout, pass the result to the result container, free
the process and exit.