Erik Rigtorp

Parallelize scikit-optimize using Dask

In this note I show how you can parallelize scikit-optimize (skopt) using Dask.

First, initialize an Optimizer instance:

from skopt import Optimizer
from skopt.space import Real
from skopt.benchmarks import branin
from dask.distributed import Client, as_completed

optimizer = Optimizer(
    dimensions=[Real(-5.0, 10.0), Real(0.0, 15.0)],
    random_state=1,
    base_estimator='gp'
)

Next, connect to a Dask cluster and use the Dask futures API to launch multiple parallel evaluations of the objective function (the branin benchmark function in this case):

# start a local in-process Dask cluster; pass a scheduler address
# instead to connect to an existing cluster
client = Client(processes=False)

# keep `optimizer.n_initial_points_` jobs active during the optimization
futures = []
for x in optimizer.ask(optimizer.n_initial_points_):
    futures.append(client.submit(lambda x: (x, branin(x)), x))

Finally, keep optimizer.n_initial_points_ jobs active until the exit condition is satisfied, telling the optimizer each result as it completes:

seq = as_completed(futures) # iterate over futures in completion order
for future in seq:
    x, y = future.result()
    optimizer.tell(x, y)
    if len(optimizer.Xi) > 20: # exit condition: stop submitting new jobs
        continue               # remaining in-flight jobs still drain from `seq`
    next_x = optimizer.ask()
    seq.add(client.submit(lambda x: (x, branin(x)), next_x))

print(min(optimizer.yi)) # print the best objective found
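min(optimizer.yi) gives only the best objective value. The Optimizer keeps evaluated points in optimizer.Xi and their objective values in optimizer.yi as parallel lists, so the best point can be recovered by pairing them. A sketch, with made-up values standing in for a real run:

```python
# Xi holds evaluated points, yi their objective values (parallel lists);
# these example values are stand-ins for the state after a real run.
Xi = [[3.14, 2.27], [-3.0, 12.0]]
yi = [0.40, 5.2]

# pair each value with its point and take the minimum by value
best_y, best_x = min(zip(yi, Xi))
print(best_y, best_x)
```

With a real run you would use min(zip(optimizer.yi, optimizer.Xi)) in the same way.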

You might want to tune the above code to launch one job for each core in your cluster.
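One way to size the job pool is from the cluster's thread counts: Client.nthreads() returns a mapping of worker address to thread count. A sketch, using an example mapping in place of a live client:

```python
# client.nthreads() maps each worker address to its thread count;
# this dict is a stand-in for the result from a live cluster.
nthreads = {"tcp://127.0.0.1:33241": 4, "tcp://127.0.0.1:45123": 4}

# launch one job per core across the cluster
n_jobs = sum(nthreads.values())
print(n_jobs)
```

With a live client you would replace the dict with client.nthreads() and use n_jobs in place of optimizer.n_initial_points_ when seeding the initial batch with optimizer.ask(n_jobs).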

Download full sample code skopt-dask.py