Parallelize scikit-optimize using Dask
In this note I show how you can parallelize scikit-optimize (skopt) using Dask.
First initialize an Optimizer instance:
from skopt import Optimizer
from skopt.space import Real
from skopt.benchmarks import branin
from dask.distributed import Client, as_completed

optimizer = Optimizer(
    dimensions=[Real(-5.0, 10.0), Real(0.0, 15.0)],
    random_state=1,
    base_estimator='gp'
)
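The Optimizer exposes an ask-and-tell interface: ask proposes points to evaluate and tell reports results back, which is what lets an external scheduler such as Dask drive the optimization. As a quick sanity check, here is a minimal sequential round trip (my own illustration, not part of the original code):

# one sequential ask/tell round trip, no Dask involved yet
x = optimizer.ask()       # propose a single point in the search space
y = branin(x)             # evaluate the objective at that point
optimizer.tell(x, y)      # report the observation back to the model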
Next, connect to a Dask cluster and use the Dask futures API to launch multiple
parallel evaluations of the objective function (the branin test function in
this case):
# connect to a Dask cluster (here a local in-process cluster)
client = Client(processes=False)

# launch `optimizer.n_initial_points_` jobs to start; the loop below
# keeps that many active throughout the optimization
futures = []
for x in optimizer.ask(optimizer.n_initial_points_):
    futures.append(client.submit(lambda x: (x, branin(x)), x))
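Submitting a fresh lambda for each task works, but a named module-level function serializes more cleanly and gives the tasks readable names in the Dask dashboard. A sketch of that variant (the evaluate name is my own, not from skopt or Dask):

def evaluate(x):
    # return the point together with its objective value so the pair
    # can be passed straight to optimizer.tell
    return x, branin(x)

futures = [client.submit(evaluate, x)
           for x in optimizer.ask(optimizer.n_initial_points_)]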
Finally, keep that many jobs in flight, telling the optimizer each result as
it completes, until the exit condition is satisfied:
seq = as_completed(futures)  # iterate over futures in completion order
for future in seq:
    x, y = future.result()
    optimizer.tell(x, y)
    if len(optimizer.Xi) > 20:
        # exit condition: stop submitting new jobs and let the
        # remaining futures drain, which ends the loop
        continue
    next_x = optimizer.ask()
    seq.add(client.submit(lambda x: (x, branin(x)), next_x))

print(min(optimizer.yi))  # print the best objective value found
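If you also want the location of the best point, recent skopt releases can package the run as a SciPy-style result object; assuming your version provides Optimizer.get_result, something like this should work:

result = optimizer.get_result()  # OptimizeResult-style summary of the run
print(result.x)                  # best point found
print(result.fun)                # objective value at that point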
You might want to tune the above code so that it launches one job for each core in your cluster, as in the sketch below.
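With the distributed client you can query the number of worker threads and use that as the batch size (a sketch; Client.nthreads is the method in recent versions of dask.distributed):

# size the initial batch to one job per worker thread (illustrative)
n_cores = sum(client.nthreads().values())
futures = [client.submit(lambda x: (x, branin(x)), x)
           for x in optimizer.ask(n_cores)]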