# Example for full simulation loop using a table-based lookup mechanism with incomplete data

This example shows a simulation for a direct arylation where not all combinations were
measured.
The lookup mechanism gives us access to previously conducted experiments stored in
.xlsx files.

This example assumes some basic familiarity with using BayBE and the lookup mechanism.
We refer to [`campaign`](./../Basics/campaign.md) for a more basic example and
to [`full_lookup`](./full_lookup.md) for details on the lookup mechanism.

## Necessary imports for this example


```python
import os
```


```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
```


```python
from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter, SubstanceParameter
from baybe.recommenders import RandomRecommender, TwoPhaseMetaRecommender
from baybe.searchspace import SearchSpace
from baybe.simulation import simulate_scenarios
from baybe.targets import NumericalTarget
```

## Parameters for a full simulation loop

For the full simulation, we need to define some additional parameters.
These are the number of Monte Carlo runs, the number of DOE iterations per run,
and the number of experiments recommended per iteration (the batch size).


```python
SMOKE_TEST = "SMOKE_TEST" in os.environ
```


```python
N_MC_ITERATIONS = 2 if SMOKE_TEST else 5
N_DOE_ITERATIONS = 2 if SMOKE_TEST else 5
BATCH_SIZE = 1 if SMOKE_TEST else 3
```
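
To get a feel for the cost of a simulation, the three settings can be multiplied out.
The following is a minimal sketch in plain Python with the non-smoke-test values hard-coded; the helper `total_experiments` is hypothetical and not part of BayBE.

```python
def total_experiments(n_mc_iterations: int, n_doe_iterations: int, batch_size: int) -> int:
    """Return the number of simulated experiments for a single scenario."""
    return n_mc_iterations * n_doe_iterations * batch_size


# Values used above when SMOKE_TEST is not set.
experiments_per_run = 5 * 3  # N_DOE_ITERATIONS * BATCH_SIZE
experiments_per_scenario = total_experiments(5, 5, 3)
print(experiments_per_run, experiments_per_scenario)  # 15 75
```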

## Lookup functionality and data creation

See [`full_lookup`](./full_lookup.md) for details.


```python
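# Try the working directory first, then fall back to the examples folder path.
try:
    lookup = pd.read_excel("./lookup_withmissing.xlsx")
except FileNotFoundError:
    try:
        lookup = pd.read_excel("examples/Backtesting/lookup_withmissing.xlsx")
    except FileNotFoundError as e:
        print(e)
        # Without the lookup table, the simulation below cannot run.
        raise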
```

As usual, we set up an experiment.
Note that the names used here must match the names in the provided .xlsx file!


```python
dict_solvent = {
    "DMAc": r"CC(N(C)C)=O",
    "Butyornitrile": r"CCCC#N",
    "Butyl Ester": r"CCCCOC(C)=O",
    "p-Xylene": r"CC1=CC=C(C)C=C1",
}
dict_base = {
    "Potassium acetate": r"O=C([O-])C.[K+]",
    "Potassium pivalate": r"O=C([O-])C(C)(C)C.[K+]",
    "Cesium acetate": r"O=C([O-])C.[Cs+]",
    "Cesium pivalate": r"O=C([O-])C(C)(C)C.[Cs+]",
}
dict_ligand = {
    "BrettPhos": r"CC(C)C1=CC(C(C)C)=C(C(C(C)C)=C1)C2=C(P(C3CCCCC3)C4CCCCC4)C(OC)="
    "CC=C2OC",
    "Di-tert-butylphenylphosphine": r"CC(C)(C)P(C1=CC=CC=C1)C(C)(C)C",
    "(t-Bu)PhCPhos": r"CN(C)C1=CC=CC(N(C)C)=C1C2=CC=CC=C2P(C(C)(C)C)C3=CC=CC=C3",
    "Tricyclohexylphosphine": r"P(C1CCCCC1)(C2CCCCC2)C3CCCCC3",
    "PPh3": r"P(C1=CC=CC=C1)(C2=CC=CC=C2)C3=CC=CC=C3",
    "XPhos": r"CC(C1=C(C2=CC=CC=C2P(C3CCCCC3)C4CCCCC4)C(C(C)C)=CC(C(C)C)=C1)C",
    "P(2-furyl)3": r"P(C1=CC=CO1)(C2=CC=CO2)C3=CC=CO3",
    "Methyldiphenylphosphine": r"CP(C1=CC=CC=C1)C2=CC=CC=C2",
    "1268824-69-6": r"CC(OC1=C(P(C2CCCCC2)C3CCCCC3)C(OC(C)C)=CC=C1)C",
    "JackiePhos": r"FC(F)(F)C1=CC(P(C2=C(C3=C(C(C)C)C=C(C(C)C)C=C3C(C)C)C(OC)=CC=C2OC)"
    r"C4=CC(C(F)(F)F)=CC(C(F)(F)F)=C4)=CC(C(F)(F)F)=C1",
    "SCHEMBL15068049": r"C[C@]1(O2)O[C@](C[C@]2(C)P3C4=CC=CC=C4)(C)O[C@]3(C)C1",
    "Me2PPh": r"CP(C)C1=CC=CC=C1",
}
```
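
Since the labels must match the lookup table exactly, a quick consistency check can catch mismatches early.
The sketch below uses plain Python and a made-up excerpt of a `Solvent` column; the helper `unknown_labels` is hypothetical and not part of BayBE.

```python
def unknown_labels(lookup_labels, parameter_data):
    """Return labels that occur in the lookup column but not in the parameter dict."""
    return sorted(set(lookup_labels) - set(parameter_data))


# Hypothetical excerpt of a "Solvent" column; real values come from the .xlsx file.
solvent_column = ["DMAc", "p-Xylene", "Butyl Ester", "DMAC"]
solvent_data = {
    "DMAc": "CC(N(C)C)=O",
    "Butyl Ester": "CCCCOC(C)=O",
    "p-Xylene": "CC1=CC=C(C)C=C1",
}
mismatches = unknown_labels(solvent_column, solvent_data)
print(mismatches)  # ['DMAC']
```

With the table loaded above, the same helper could be called as `unknown_labels(lookup["Solvent"], dict_solvent)`, assuming the column is named like the parameter.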

## Creating the searchspace and the objective

Here, we create the parameter objects, the searchspace and the objective.


```python
solvent = SubstanceParameter(name="Solvent", data=dict_solvent, encoding="MORDRED")
base = SubstanceParameter(name="Base", data=dict_base, encoding="MORDRED")
ligand = SubstanceParameter(name="Ligand", data=dict_ligand, encoding="MORDRED")
temperature = NumericalDiscreteParameter(
    name="Temp_C", values=[90, 105, 120], tolerance=2
)
concentration = NumericalDiscreteParameter(
    name="Concentration", values=[0.057, 0.1, 0.153], tolerance=0.005
)
```


```python
parameters = [solvent, base, ligand, temperature, concentration]
```


```python
searchspace = SearchSpace.from_product(parameters=parameters)
objective = SingleTargetObjective(target=NumericalTarget(name="yield", mode="MAX"))
```
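
Because the search space is built as a product of all parameter values, its size is the product of the value counts.
The counts in this small sketch are taken from the definitions in this example; nothing here calls BayBE.

```python
from math import prod

# Number of values per parameter, counted from the definitions in this example.
n_values = {"Solvent": 4, "Base": 4, "Ligand": 12, "Temp_C": 3, "Concentration": 3}
n_candidates = prod(n_values.values())
print(n_candidates)  # 1728
```

The lookup table covers only part of these candidate configurations, which is why missing entries need special treatment in the simulation.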

## Constructing campaigns for the simulation loop

In this example, we create two campaigns.
One uses the default recommender and the other one makes random recommendations.


```python
campaign = Campaign(searchspace=searchspace, objective=objective)
campaign_rand = Campaign(
    searchspace=searchspace,
    recommender=TwoPhaseMetaRecommender(recommender=RandomRecommender()),
    objective=objective,
)
```

We can now use the `simulate_scenarios` function to simulate a full experiment.
Note that this function can run multiple scenarios in a single call.
For this, it is necessary to define a dictionary mapping scenario names to campaigns.


```python
scenarios = {"Test_Scenario": campaign, "Random": campaign_rand}
```

The lookup table does not contain data for all possible combinations of parameters.
Consequently, we need to inform the function how to deal with missing entries.
This is done via the `impute_mode` keyword.
The following options are available:
  * `"error"`: an error will be thrown
  * `"worst"`: imputation using the worst available value for each target
  * `"best"`: imputation using the best available value for each target
  * `"mean"`: imputation using the mean value for each target
  * `"random"`: a random row will be used as lookup
  * `"ignore"`: the search space is stripped before recommendations are made
      so that unmeasured experiments will not be recommended
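
To illustrate what the value-based modes mean for a single target, here is a minimal sketch in plain Python.
It is not BayBE's implementation: the function `impute_value` and the yields are made up, and only `"best"`, `"worst"` and `"mean"` are covered.

```python
from statistics import mean


def impute_value(measured, mode, target_mode="MAX"):
    """Pick a replacement target value for a configuration without a measurement."""
    if mode == "mean":
        return mean(measured)
    if mode not in ("best", "worst"):
        raise ValueError(f"Mode not covered by this sketch: {mode}")
    # For a maximization target the best value is the largest one,
    # for a minimization target it is the smallest one.
    pick_max = (mode == "best") == (target_mode == "MAX")
    return max(measured) if pick_max else min(measured)


measured_yields = [10.0, 50.0, 90.0]  # made-up numbers
print(impute_value(measured_yields, "best"))   # 90.0
print(impute_value(measured_yields, "worst"))  # 10.0
print(impute_value(measured_yields, "mean"))   # 50.0
```

Note that `"best"` is an optimistic choice: every unmeasured configuration looks as good as the best measured one.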


```python
results = simulate_scenarios(
    scenarios,
    lookup,
    batch_size=BATCH_SIZE,
    n_doe_iterations=N_DOE_ITERATIONS,
    n_mc_iterations=N_MC_ITERATIONS,
    impute_mode="best",
)
```

    
The following lines plot the results and save the plot to `run_impute_mode.png`.


```python
max_yield = lookup["yield"].max()
sns.lineplot(
    data=results, x="Num_Experiments", y="yield_CumBest", hue="Scenario", marker="x"
)
# Dashed reference line at the best yield contained in the lookup table
plt.plot([BATCH_SIZE, BATCH_SIZE * N_DOE_ITERATIONS], [max_yield, max_yield], "--r")
plt.legend(loc="lower right")
plt.gcf().set_size_inches(20, 8)
plt.savefig("./run_impute_mode.png")
```
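
The plotted `yield_CumBest` column can be read as the best yield found so far.
The sketch below assumes it is the running maximum of the per-iteration best values for a maximization target and uses made-up numbers.

```python
from itertools import accumulate

# Made-up best yields per iteration of a single run.
iteration_best = [31.0, 55.2, 47.9, 55.2, 80.1]

# Running maximum: each entry is the best value observed up to that iteration.
cumulative_best = list(accumulate(iteration_best, max))
print(cumulative_best)  # [31.0, 55.2, 55.2, 55.2, 80.1]
```

Such a curve never decreases, so the gap to the dashed line shows how far a scenario still is from the best value in the lookup table.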