Quickstart

Usage

Installing the plugin alone does not change how pytask behaves: if you run pytask without further options, the tasks are not run in parallel.

For parallelization with the default backend loky, you need to launch multiple workers or specify the parallel backend explicitly. Here is how you launch multiple workers.

CLI

pytask -n 2
pytask --n-workers 2

# Starts os.cpu_count() - 1 workers.
pytask -n auto

Configuration

[tool.pytask.ini_options]
n_workers = 2

# Alternatively, start os.cpu_count() - 1 workers.
# n_workers = "auto"
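
To make the meaning of "auto" concrete, here is a minimal Python sketch of how such a setting could be resolved into a worker count. The helper resolve_n_workers is hypothetical and not part of pytask-parallel; the floor of one worker on single-CPU machines is an assumption added for the example.

```python
import os


def resolve_n_workers(value):
    """Turn an ``n_workers`` setting into a concrete worker count.

    Hypothetical helper for illustration only; not pytask-parallel's API.
    """
    if value == "auto":
        # os.cpu_count() can return None, so fall back to a single CPU.
        cpus = os.cpu_count() or 1
        # Leave one CPU free, but never drop below one worker (assumption).
        return max(cpus - 1, 1)
    return int(value)


print(resolve_n_workers(2))
print(resolve_n_workers("auto"))
```

The second value depends on the machine the snippet runs on.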

To specify the parallel backend, pass the --parallel-backend option. The following command will execute the workflow with one worker and the loky backend.

CLI

pytask --parallel-backend loky

Configuration

[tool.pytask.ini_options]
parallel_backend = "loky"

Backends

Important

It is not possible to combine parallelization with debugging. That is why --pdb or --trace deactivate parallelization.

If you parallelize the execution of your tasks, do not use breakpoint() or import pdb; pdb.set_trace() since both will cause exceptions.

loky

There are multiple backends available. The default is the backend provided by loky, which is a more robust implementation of Pool and ProcessPoolExecutor.

pytask --parallel-backend loky

A parallel backend with processes is especially suited for CPU-bound tasks as it spawns workers in new processes to run the tasks. (Here is an explanation of what CPU- or IO-bound means.)
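
The following sketch illustrates the idea with the standard library's ProcessPoolExecutor rather than with pytask itself. The function count_primes is a hypothetical stand-in for a CPU-bound task, and the numbers are arbitrary. Each call runs in a separate worker process, so the calls can use several CPU cores at once.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below ``limit`` with deliberately naive trial division."""
    count = 0
    for candidate in range(2, limit):
        divisors = range(2, int(candidate**0.5) + 1)
        if all(candidate % divisor for divisor in divisors):
            count += 1
    return count


def count_all(limits, n_workers):
    """Run one CPU-bound call per limit, spread over ``n_workers`` processes."""
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this part on import.
    print(count_all([20_000, 30_000, 40_000], n_workers=2))
```

The work function lives at module level because process-based executors need to pickle it to send it to the workers.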

coiled

pytask-parallel integrates with coiled, allowing you to run tasks on virtual machines of AWS, Azure and GCP. You can decide whether to run only some selected tasks or the whole project in the cloud.

concurrent.futures

Set the parallel backend to threads or processes to use the ThreadPoolExecutor or the ProcessPoolExecutor, respectively.

The ThreadPoolExecutor might be an interesting option for you if you have many IO-bound tasks and you do not need to create many expensive processes.

CLI

pytask --parallel-backend threads

Configuration

[tool.pytask.ini_options]
parallel_backend = "threads"

CLI

pytask --parallel-backend processes

Configuration

[tool.pytask.ini_options]
parallel_backend = "processes"
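
As a rough illustration of why threads suit IO-bound work, the sketch below uses the standard library's ThreadPoolExecutor directly. The function fake_download is hypothetical, and time.sleep stands in for waiting on a network or disk. While one thread waits, the others can proceed, so the four calls overlap instead of running back to back.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name):
    """Simulate an IO-bound task; the sleep stands in for network latency."""
    time.sleep(0.2)
    return f"{name}: done"


def run_in_threads(names, n_workers):
    """Run one simulated download per name in a pool of threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # map() returns results in the order of the inputs.
        return list(executor.map(fake_download, names))


start = time.perf_counter()
results = run_in_threads(["a", "b", "c", "d"], n_workers=4)
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

With four workers the elapsed time should be close to a single sleep rather than the sum of all four, although exact timings vary by machine.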

Important

Capturing warnings is not thread-safe. Therefore, warnings cannot be captured reliably when tasks are parallelized with --parallel-backend threads.
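
The underlying issue can be reproduced with the standard library alone. In this sketch, which assumes CPython's default configuration where warning filters are process-wide state, the main thread opens a capture block and ends up recording a warning that a different thread emitted. The function noisy_task is a hypothetical stand-in for a task.

```python
import threading
import warnings


def noisy_task():
    """Stand-in for a task that emits a warning from a worker thread."""
    warnings.warn("emitted by the worker thread", UserWarning)


# The capture is opened in the main thread only.
with warnings.catch_warnings(record=True) as captured:
    warnings.simplefilter("always")
    worker = threading.Thread(target=noisy_task)
    worker.start()
    worker.join()

# The worker's warning shows up in the main thread's capture list.
messages = [str(item.message) for item in captured]
print(messages)
```

Because the capture is not tied to the thread that opened it, warnings from tasks running at the same time in different threads cannot be attributed reliably to a single task.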

dask

dask allows you to run your workflows on many different kinds of clusters, such as cloud clusters and traditional HPC systems.

In its default mode, dask spawns multiple local workers to process the tasks.

CLI

pytask --parallel-backend dask

Configuration

[tool.pytask.ini_options]
parallel_backend = "dask"

Custom executors

You can also use any custom executor that implements the Executor interface. Read more about it in Custom Executors.
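
To show what implementing the Executor interface involves, here is a minimal sketch built only on concurrent.futures. The class SerialExecutor is hypothetical: it runs every call immediately in the calling thread, which is useless for parallelism but keeps the example short. How an executor is registered with pytask-parallel is not shown here; that is covered in Custom Executors.

```python
from concurrent.futures import Executor, Future


class SerialExecutor(Executor):
    """Hypothetical executor that runs each call right away in the caller."""

    def submit(self, fn, /, *args, **kwargs):
        # Wrap the outcome in a Future, as the Executor interface expects.
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


with SerialExecutor() as executor:
    future = executor.submit(pow, 2, 10)
    # map() is inherited from Executor and is built on submit().
    squares = list(executor.map(lambda x: x * x, [1, 2, 3]))

print(future.result())  # 1024
print(squares)  # [1, 4, 9]
```

Only submit needs to be overridden in this sketch, because the base class already provides map, shutdown and context-manager support.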

Important

Please consider contributing your executor to pytask-parallel if you believe it could be helpful to other people. Start by creating an issue or a draft PR.
