# Example for full simulation loop using a table-based lookup mechanism with initial data

This example shows a simulation for a direct arylation where all combinations have been measured.
It also demonstrates how to use initial data together with a lookup mechanism.
This allows us to access information about previously conducted experiments stored in `.xlsx` files.

This example assumes some basic familiarity with using BayBE and the lookup mechanism.
We thus refer to [`campaign`](./../Basics/campaign.md) for a basic example.
We refer to [`full_lookup`](./full_lookup.md) for details on the lookup mechanism.

## Necessary imports for this example


```python
import os
```


```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
```


```python
from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter, SubstanceParameter
from baybe.recommenders import RandomRecommender, TwoPhaseMetaRecommender
from baybe.searchspace import SearchSpace
from baybe.simulation import simulate_scenarios
from baybe.targets import NumericalTarget
```

## Parameters for a full simulation loop

For the full simulation, we need to define some additional parameters.
Since this example uses initial data, the number of runs is not set explicitly: it is determined by the number of initial data sets provided.
We therefore only need to define the number of iterations per run and the batch size.


```python
SMOKE_TEST = "SMOKE_TEST" in os.environ
```


```python
N_DOE_ITERATIONS = 2 if SMOKE_TEST else 5
BATCH_SIZE = 1 if SMOKE_TEST else 3
```

## Lookup functionality and data creation

See [`full_lookup`](./full_lookup.md) for details.


```python
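# Look for the lookup table next to this example, then relative to the repository root
try:
    lookup = pd.read_excel("./lookup.xlsx")
except FileNotFoundError:
    try:
        lookup = pd.read_excel("examples/Backtesting/lookup.xlsx")
    except FileNotFoundError as e:
        # The simulation cannot run without the lookup table, so stop here
        print(e)
        raise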
```
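To make the idea of a table-based lookup concrete, here is a minimal sketch of what "looking up" a recommendation means: each recommended parameter combination is matched against the rows of the table to retrieve its measured yield. The tiny table and the recommendations below are hypothetical, and the sketch uses an exact pandas `merge`; it illustrates the concept and is not BayBE's internal implementation.

```python
import pandas as pd

# Tiny hypothetical stand-in for the real lookup table (illustrative values only)
lookup_toy = pd.DataFrame(
    {
        "Solvent": ["DMAc", "DMAc", "p-Xylene"],
        "Temp_C": [90, 120, 90],
        "yield": [10.0, 55.0, 20.0],
    }
)

# Hypothetical recommendations whose target values are still unknown
recommendations = pd.DataFrame({"Solvent": ["DMAc", "p-Xylene"], "Temp_C": [120, 90]})

# Retrieve the "measured" yield by matching on all parameter columns
measured = recommendations.merge(lookup_toy, on=["Solvent", "Temp_C"], how="left")
print(measured)
```

Because every combination in this example has been measured, each recommendation is expected to find a matching row; with a `how="left"` merge, a missing combination would show up as `NaN` instead of silently disappearing.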

## Inclusion of initial data

To include initial data, we sample some rows from the lookup table.
Note that `initial_data` needs to be a list of `pd.DataFrame` objects.
One simulation run will be performed per provided initial data set.


```python
initial_data = [lookup.sample(n=5), lookup.sample(n=5), lookup.sample(n=5)]
```
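The sampling above draws different rows every time the example is executed. If reproducible initial data is desired, one option is to pass a fixed `random_state` to `DataFrame.sample`. The sketch below assumes a small hypothetical stand-in table; `N_INITIAL_SETS` and `N_POINTS` are illustrative names, not part of BayBE.

```python
import pandas as pd

# Hypothetical stand-in for the lookup table: 20 rows with a single yield column
lookup_toy = pd.DataFrame({"yield": [float(i) for i in range(20)]})

N_INITIAL_SETS = 3  # one simulation run per initial data set
N_POINTS = 5  # rows per initial data set

# A distinct but fixed random_state per draw makes each sampled set reproducible
initial_data_toy = [
    lookup_toy.sample(n=N_POINTS, random_state=seed) for seed in range(N_INITIAL_SETS)
]
```

Since `sample` draws without replacement by default, rows within one set are distinct, while different sets may still share rows.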

As usual, we set up the experiment.
Note that we now need to ensure that the parameter names and labels match the column names and entries in the provided `.xlsx` file!


```python
dict_solvent = {
    "DMAc": r"CC(N(C)C)=O",
    "Butyornitrile": r"CCCC#N",
    "Butyl Ester": r"CCCCOC(C)=O",
    "p-Xylene": r"CC1=CC=C(C)C=C1",
}
dict_base = {
    "Potassium acetate": r"O=C([O-])C.[K+]",
    "Potassium pivalate": r"O=C([O-])C(C)(C)C.[K+]",
    "Cesium acetate": r"O=C([O-])C.[Cs+]",
    "Cesium pivalate": r"O=C([O-])C(C)(C)C.[Cs+]",
}
dict_ligand = {
    "BrettPhos": r"CC(C)C1=CC(C(C)C)=C(C(C(C)C)=C1)C2=C(P(C3CCCCC3)C4CCCCC4)C(OC)="
    "CC=C2OC",
    "Di-tert-butylphenylphosphine": r"CC(C)(C)P(C1=CC=CC=C1)C(C)(C)C",
    "(t-Bu)PhCPhos": r"CN(C)C1=CC=CC(N(C)C)=C1C2=CC=CC=C2P(C(C)(C)C)C3=CC=CC=C3",
    "Tricyclohexylphosphine": r"P(C1CCCCC1)(C2CCCCC2)C3CCCCC3",
    "PPh3": r"P(C1=CC=CC=C1)(C2=CC=CC=C2)C3=CC=CC=C3",
    "XPhos": r"CC(C1=C(C2=CC=CC=C2P(C3CCCCC3)C4CCCCC4)C(C(C)C)=CC(C(C)C)=C1)C",
    "P(2-furyl)3": r"P(C1=CC=CO1)(C2=CC=CO2)C3=CC=CO3",
    "Methyldiphenylphosphine": r"CP(C1=CC=CC=C1)C2=CC=CC=C2",
    "1268824-69-6": r"CC(OC1=C(P(C2CCCCC2)C3CCCCC3)C(OC(C)C)=CC=C1)C",
    "JackiePhos": r"FC(F)(F)C1=CC(P(C2=C(C3=C(C(C)C)C=C(C(C)C)C=C3C(C)C)C(OC)=CC=C2OC)"
    r"C4=CC(C(F)(F)F)=CC(C(F)(F)F)=C4)=CC(C(F)(F)F)=C1",
    "SCHEMBL15068049": r"C[C@]1(O2)O[C@](C[C@]2(C)P3C4=CC=CC=C4)(C)O[C@]3(C)C1",
    "Me2PPh": r"CP(C)C1=CC=CC=C1",
}
```
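Because the labels above must match the entries of the lookup table exactly, a quick consistency check can catch spelling mismatches before the simulation starts. The helper `find_missing_labels` below is hypothetical and only a sketch; it reports every entry of a table column that has no key in the corresponding dictionary.

```python
import pandas as pd


def find_missing_labels(lookup: pd.DataFrame, column: str, labels: dict) -> set:
    """Return the entries of `column` in `lookup` that have no key in `labels`."""
    return set(lookup[column].unique()) - set(labels)


# Hypothetical lookup excerpt with one deliberately misspelled solvent
lookup_toy = pd.DataFrame({"Solvent": ["DMAc", "p-Xylene", "p-xylene"]})
solvents = {"DMAc": "CC(N(C)C)=O", "p-Xylene": "CC1=CC=C(C)C=C1"}

print(find_missing_labels(lookup_toy, "Solvent", solvents))  # {'p-xylene'}
```

Assuming the columns of the real table are named like the parameters, the same check could be applied as `find_missing_labels(lookup, "Solvent", dict_solvent)`, and likewise for `"Base"` and `"Ligand"`; an empty set means every label is covered.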

## Creating the searchspace and the objective

Here, we create the parameter objects, the searchspace and the objective.


```python
base = SubstanceParameter(name="Base", data=dict_base, encoding="MORDRED")
solvent = SubstanceParameter(name="Solvent", data=dict_solvent, encoding="MORDRED")
ligand = SubstanceParameter(name="Ligand", data=dict_ligand, encoding="MORDRED")
temperature = NumericalDiscreteParameter(
    name="Temp_C", values=[90, 105, 120], tolerance=2
)
concentration = NumericalDiscreteParameter(
    name="Concentration", values=[0.057, 0.1, 0.153], tolerance=0.005
)
```


```python
parameters = [solvent, base, ligand, temperature, concentration]
```


```python
searchspace = SearchSpace.from_product(parameters=parameters)
objective = SingleTargetObjective(target=NumericalTarget(name="yield", mode="MAX"))
```
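`SearchSpace.from_product` builds the Cartesian product of all parameter values, so the number of candidate experiments is the product of the number of values per parameter. The counts below are read off the definitions in this example (4 solvents, 4 bases, 12 ligands, 3 temperatures, 3 concentrations); the snippet is a back-of-the-envelope check, not a BayBE call.

```python
import math

# Number of values per parameter, counted from the definitions above
n_values = {
    "Solvent": 4,
    "Base": 4,
    "Ligand": 12,
    "Temp_C": 3,
    "Concentration": 3,
}

# A full Cartesian product contains one candidate per combination of values
n_candidates = math.prod(n_values.values())
print(n_candidates)  # 1728
```

If we read the BayBE API correctly, this number should match the number of rows of `searchspace.discrete.exp_rep`.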

## Constructing campaigns for the simulation loop

In this example, we create two campaigns.
One uses the default recommender and the other one makes random recommendations.


```python
campaign = Campaign(searchspace=searchspace, objective=objective)
campaign_rand = Campaign(
    searchspace=searchspace,
    recommender=TwoPhaseMetaRecommender(recommender=RandomRecommender()),
    objective=objective,
)
```
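`TwoPhaseMetaRecommender` uses one recommender while too little data is available and switches to the one passed as `recommender` afterwards; as far as we understand, its initial recommender is random by default. The plain-Python toy below illustrates why `campaign_rand` recommends randomly in both phases. The function `select_recommender` and its `switch_after` argument are hypothetical illustrations, not the BayBE API.

```python
def select_recommender(
    n_measurements: int, initial: str, main: str, switch_after: int = 1
) -> str:
    """Hypothetical two-phase rule: use `initial` until enough data is available."""
    return initial if n_measurements < switch_after else main


# Random in both phases, as for campaign_rand
print(select_recommender(0, initial="random", main="random"))  # random
print(select_recommender(5, initial="random", main="random"))  # random

# A model-based main phase only kicks in once measurements exist
print(select_recommender(0, initial="random", main="bayesian"))  # random
print(select_recommender(5, initial="random", main="bayesian"))  # bayesian
```

Since each run in this example starts with initial data, the default campaign presumably skips the purely random phase and uses its model-based recommender from the first iteration on.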

## Performing the simulation loop

We can now use the `simulate_scenarios` function to simulate a full experiment.
This function is where we provide the `initial_data` list.
Note that this function allows running multiple scenarios with a single function call.
For this, it is necessary to define a dictionary mapping scenario names to campaigns.


```python
scenarios = {"Test_Scenario": campaign, "Random": campaign_rand}
```


```python
results = simulate_scenarios(
    scenarios,
    lookup,
    batch_size=BATCH_SIZE,
    n_doe_iterations=N_DOE_ITERATIONS,
    initial_data=initial_data,
)
```
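To get a feel for the size of the call above, the arithmetic below counts runs and recommended experiments for the non-smoke-test settings. It assumes that every scenario is simulated once per initial data set and that each run recommends `BATCH_SIZE` experiments in each of the `N_DOE_ITERATIONS` iterations, on top of its initial data.

```python
# Values taken from this example (non-smoke-test settings)
n_scenarios = 2  # "Test_Scenario" and "Random"
n_initial_sets = 3  # len(initial_data)
N_DOE_ITERATIONS = 5
BATCH_SIZE = 3

# Assumption: every scenario is simulated once per initial data set
n_runs = n_scenarios * n_initial_sets
# Each run recommends BATCH_SIZE experiments in each of N_DOE_ITERATIONS iterations
experiments_per_run = N_DOE_ITERATIONS * BATCH_SIZE

print(n_runs, experiments_per_run)  # 6 15
```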

The following lines plot the results and save the plot as `run_full_initial_data.png`.


```python
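# All combinations were measured, so the best yield in the table is the reachable optimum
max_yield = lookup["yield"].max()
sns.lineplot(
    data=results, x="Num_Experiments", y="yield_CumBest", hue="Scenario", marker="x"
)
# Dashed red line marking the optimum over the range of recommended experiments
plt.plot([BATCH_SIZE, BATCH_SIZE * N_DOE_ITERATIONS], [max_yield, max_yield], "--r")
plt.legend(loc="lower right")
plt.gcf().set_size_inches(20, 8)
plt.savefig("./run_full_initial_data.png")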
```
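The plotted column `yield_CumBest` tracks, as we understand it, the best yield observed so far in a run. The sketch below shows how such a running maximum can be computed with pandas `groupby(...).cummax()`; the numbers are made up purely to exercise the code and are not simulation results, and this is not necessarily how BayBE computes the column internally.

```python
import pandas as pd

# Made-up yields per iteration for two scenarios (illustrative numbers, not results)
runs = pd.DataFrame(
    {
        "Scenario": ["A", "A", "A", "B", "B", "B"],
        "Num_Experiments": [3, 6, 9, 3, 6, 9],
        "yield": [40.0, 35.0, 62.0, 20.0, 48.0, 45.0],
    }
)

# Cumulative best: running maximum of the yield within each scenario
runs["yield_CumBest"] = runs.groupby("Scenario")["yield"].cummax()
print(runs)
```

With several runs per scenario, one would group by every column that identifies a single run, so that the running maximum never crosses from one run into the next.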