# Example for surrogate model with a custom architecture using `sklearn`

This example shows how to define a `sklearn` model architecture and use it as a surrogate.
Please note that the model is not designed to be useful but to demonstrate the workflow.

This example assumes some basic familiarity with using BayBE.
We thus refer to [`campaign`](./../Basics/campaign.md) for a basic example.

## Necessary imports

```python
import numpy as np
import torch
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression, Ridge
from torch import Tensor
```

```python
from baybe.campaign import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import (
    CategoricalParameter,
    NumericalDiscreteParameter,
    SubstanceParameter,
)
from baybe.recommenders import (
    BotorchRecommender,
    FPSRecommender,
    TwoPhaseMetaRecommender,
)
from baybe.searchspace import SearchSpace
from baybe.surrogates import register_custom_architecture
from baybe.targets import NumericalTarget
from baybe.utils.dataframe import add_fake_results
```

## Surrogate Definition with BayBE Registration

The final estimator class must follow the sklearn estimator interface.
More details [here](https://scikit-learn.org/stable/developers/develop.html).

The choice of using tensors in fit/predict is purely for BayBE, not a requirement.

Final estimator

```python
class MeanVarEstimator(BaseEstimator, RegressorMixin):
    """Stack final estimator for mean and variance."""

    def fit(self, data: Tensor, targets: Tensor) -> None:
        """No fit needed."""
        return

    def predict(self, data: Tensor) -> tuple[Tensor, Tensor]:
        """Predict based on ensemble unweighted mean and variance."""
        mean = torch.tensor(data.mean(axis=1))
        var = torch.tensor(data.var(axis=1))
        return mean, var
```

Registration

The class must include `_fit` and `_posterior` functions with the correct signatures.
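To make the two-method shape concrete before the full example that follows, here is a minimal sketch of a class exposing `_fit` and `_posterior` with the same argument lists. It is not registered with BayBE and uses NumPy arrays in place of tensors so that it runs on its own; the class name `ConstantSurrogateSketch` and its constant-prediction behavior are hypothetical and serve only as illustration.

```python
import numpy as np


class ConstantSurrogateSketch:
    """Hypothetical toy class showing only the two-method shape (not registered)."""

    def __init__(self):
        self.mean = None
        self.var = None

    def _fit(self, searchspace, train_x, train_y) -> None:
        """Store the mean and variance of the training targets."""
        # `searchspace` and `train_x` are accepted to match the signature but unused.
        self.mean = float(np.mean(train_y))
        self.var = float(np.var(train_y))

    def _posterior(self, candidates):
        """Return one (mean, variance) pair per candidate row."""
        n = len(candidates)
        return np.full(n, self.mean), np.full(n, self.var)


# Three training points with two features each, four candidates to score.
sketch = ConstantSurrogateSketch()
sketch._fit(None, np.zeros((3, 2)), np.array([[1.0], [2.0], [3.0]]))
mean, var = sketch._posterior(np.zeros((4, 2)))
print(mean.shape, var.shape)
```

The point is only the contract: `_fit` receives the search space plus training inputs and targets and returns nothing, while `_posterior` maps candidates to a tuple of means and variances with one entry per candidate.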
```python
@register_custom_architecture(
    joint_posterior_attr=False, constant_target_catching=False, batchify_posterior=True
)
class StackingRegressorSurrogate:
    """Surrogate that extracts posterior from a stack of different regressors."""

    def __init__(self):
        self.model: StackingRegressor | None = None

    def _posterior(self, candidates: Tensor) -> tuple[Tensor, Tensor]:
        """See :class:`baybe.surrogates.Surrogate`."""
        return self.model.predict(candidates)

    def _fit(self, searchspace: SearchSpace, train_x: Tensor, train_y: Tensor) -> None:
        """See :class:`baybe.surrogates.Surrogate`."""
        estimators = [
            ("rf", RandomForestRegressor()),
            ("gb", GradientBoostingRegressor()),
            ("lr", LinearRegression()),
            ("rr", Ridge()),
        ]

        self.model = StackingRegressor(
            estimators=estimators,
            final_estimator=MeanVarEstimator(),
            cv=2,
        )

        self.model.fit(train_x, train_y.ravel())
```

## Experiment Setup

```python
parameters = [
    CategoricalParameter(
        name="Granularity",
        values=["coarse", "medium", "fine"],
        encoding="OHE",
    ),
    NumericalDiscreteParameter(
        name="Pressure[bar]",
        values=[1, 5, 10],
        tolerance=0.2,
    ),
    NumericalDiscreteParameter(
        name="Temperature[degree_C]",
        values=np.linspace(100, 200, 10),
    ),
    SubstanceParameter(
        name="Solvent",
        data={
            "Solvent A": "COC",
            "Solvent B": "CCC",
            "Solvent C": "O",
            "Solvent D": "CS(=O)C",
        },
        encoding="MORDRED",
    ),
]
```

## Run DOE iterations with custom surrogate

Create campaign

```python
campaign = Campaign(
    searchspace=SearchSpace.from_product(parameters=parameters, constraints=None),
    objective=SingleTargetObjective(target=NumericalTarget(name="Yield", mode="MAX")),
    recommender=TwoPhaseMetaRecommender(
        recommender=BotorchRecommender(surrogate_model=StackingRegressorSurrogate()),
        initial_recommender=FPSRecommender(),
    ),
)
```

```python
# Let's do a first round of recommendation
recommendation = campaign.recommend(batch_size=2)
```

```python
print("Recommendation from campaign:")
print(recommendation)
```

    Recommendation from campaign:
        Granularity  Pressure[bar]  Temperature[degree_C]    Solvent
    2        coarse            1.0                  100.0  Solvent C
    239      medium           10.0                  200.0  Solvent D

Add some fake results

```python
add_fake_results(recommendation, campaign.targets)
campaign.add_measurements(recommendation)
```

```python
# Do another round of recommendations
recommendation = campaign.recommend(batch_size=2)
```

Print second round of recommendations

```python
print("Recommendation from campaign:")
print(recommendation)
```

    Recommendation from campaign:
          Granularity  Pressure[bar]  Temperature[degree_C]    Solvent
    index
    38         coarse            1.0                  200.0  Solvent C
    78         coarse            5.0                  200.0  Solvent C

```python
print()
```

## Serialization

Serialization of custom models is not supported

```python
try:
    campaign.to_json()
except RuntimeError as e:
    print(f"Serialization Error Message: {e}")
```

    Serialization Error Message: Serializing objects of type 'CustomArchitectureSurrogate' is not supported.