Quickstart

Installation

pytask-parallel is available on PyPI and Anaconda.org. Install it with

$ pip install pytask-parallel

# or

$ conda install -c conda-forge pytask-parallel

Usage

Installing the plugin alone changes nothing: when you execute pytask without further options, tasks still run sequentially.

For parallelization with the default backend, loky, you need to launch multiple workers or specify the parallel backend explicitly. Here is how you launch multiple workers, on the command line or in pyproject.toml.

Command line:

pytask -n 2
pytask --n-workers 2

# Starts os.cpu_count() - 1 workers.
pytask -n auto

pyproject.toml:

[tool.pytask.ini_options]
n_workers = 2

# Starts os.cpu_count() - 1 workers.
n_workers = "auto"
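Parallelization pays off when a project contains independent tasks. A minimal sketch of such a task module (file names are made up for the example; `produces` is pytask's conventional parameter name for marking a product):

```python
# task_example.py -- two independent tasks; with `pytask -n 2`,
# each task can run in its own worker because neither depends
# on the other's output.
from pathlib import Path


def task_write_first(produces: Path = Path("first.txt")) -> None:
    produces.write_text("first")


def task_write_second(produces: Path = Path("second.txt")) -> None:
    produces.write_text("second")
```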

To specify the parallel backend, pass the --parallel-backend option. The following command will execute the workflow with one worker and the loky backend.

Command line:

pytask --parallel-backend loky

pyproject.toml:

[tool.pytask.ini_options]
parallel_backend = "loky"

Backends

Important

It is not possible to combine parallelization with debugging. That is why --pdb or --trace deactivate parallelization.

If you parallelize the execution of your tasks, do not use breakpoint() or import pdb; pdb.set_trace() since both will cause exceptions.

loky

There are multiple backends available. The default is the backend provided by loky, which is a more robust implementation of multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor.

pytask --parallel-backend loky

A parallel backend with processes is especially suited for CPU-bound tasks as it spawns workers in new processes to run the tasks. (Here is an explanation of what CPU- or IO-bound means.)
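As a rough illustration of a CPU-bound workload, here is a sketch using the plain standard library (not the pytask API); the naive prime-counting function is made up for the example:

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Naive, CPU-bound count of primes below `limit`."""
    return sum(
        all(n % d for d in range(2, int(n**0.5) + 1))
        for n in range(2, limit)
    )


if __name__ == "__main__":
    # Each call runs in a separate process and can use its own CPU core,
    # which is exactly what a process-based backend does with your tasks.
    with ProcessPoolExecutor(max_workers=2) as pool:
        totals = list(pool.map(count_primes, [10_000, 20_000]))
```

Thread-based parallelism would not speed this up, because the computation holds the interpreter the whole time; only separate processes let both calls make progress simultaneously.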

coiled

pytask-parallel integrates with coiled, allowing you to run tasks in virtual machines on AWS, Azure, and GCP. You can decide whether to run only selected tasks or the whole project in the cloud.

Read more about coiled in this guide.

concurrent.futures

You can use the values threads and processes to use the ThreadPoolExecutor or the ProcessPoolExecutor, respectively.

The ThreadPoolExecutor might be an interesting option if you have many IO-bound tasks and want to avoid the overhead of creating processes.

Command line:

pytask --parallel-backend threads

pyproject.toml:

[tool.pytask.ini_options]
parallel_backend = "threads"

Command line:

pytask --parallel-backend processes

pyproject.toml:

[tool.pytask.ini_options]
parallel_backend = "processes"
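To see why threads help with IO-bound work, consider several tasks that mostly wait. This is a plain standard-library sketch (not the pytask API); the sleep stands in for a network or disk wait:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(delay: float) -> float:
    """Stand-in for an IO-bound task: it only waits."""
    time.sleep(delay)
    return delay


start = time.perf_counter()
# Four 0.2-second waits overlap across four threads, so the total
# wall time stays far below the sequential 0.8 seconds.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, [0.2] * 4))
elapsed = time.perf_counter() - start
```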

Important

Capturing warnings is not thread-safe. Therefore, warnings cannot be captured reliably when tasks are parallelized with --parallel-backend threads.

dask

dask allows you to run your workflows on many different kinds of clusters, such as cloud clusters and traditional HPC clusters.

Using the default mode, dask will spawn multiple local workers to process the tasks.

Command line:

pytask --parallel-backend dask

pyproject.toml:

[tool.pytask.ini_options]
parallel_backend = "dask"

Custom executors

You can also use any custom executor that implements the Executor interface. Read more about it in Custom Executors.
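The required surface is small: anything that satisfies the submit contract of concurrent.futures.Executor can serve as a backend. A minimal, hypothetical sketch (a real backend would dispatch work to remote workers, and it must also be registered with pytask-parallel as described in Custom Executors):

```python
from concurrent.futures import Executor, Future


class SerialExecutor(Executor):
    """A toy Executor that runs each submitted task immediately,
    in-process. It only illustrates the interface a custom backend
    must implement; it provides no actual parallelism."""

    def submit(self, fn, /, *args, **kwargs) -> Future:
        future: Future = Future()
        try:
            # Run the task right away and store its outcome.
            future.set_result(fn(*args, **kwargs))
        except BaseException as exc:
            future.set_exception(exc)
        return future
```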

Important

Please consider contributing your executor to pytask-parallel if you believe it could be helpful to other people. Start by creating an issue or a draft PR.