Introduction

The major objective of de novo drug design is to find new active molecules that simultaneously satisfy a number of crucial optimization goals, including activity, selectivity, and physico-chemical and ADMET properties. The three basic components of most de novo drug design tools are the search space (SS), the search algorithm, and the search objective. In this setting, the generative model can be considered the search space. We see distribution learning and goal-directed generation as the two primary trends among generative models used for de novo design.

The main goal of distribution learning is to produce ideas that mimic a given collection of molecules. Goal-directed generation techniques, in contrast, typically employ search algorithms rather than sampling the whole search space in order to suggest molecules that meet the specified objective (or objectives). In both cases, the final results are filtered by a user-defined scoring function (the search objective): during the generation process in the goal-directed case, or after sampling in the distribution-learning scenario.
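To make the scoring-function filter concrete, here is a minimal sketch of the distribution-learning case: molecules sampled from some generative model are scored after the fact and only the high scorers are kept. RDKit's QED is used as a stand-in search objective, and the example SMILES and threshold are assumptions for illustration only.

```python
# Minimal sketch: "sample everything, then filter by a scoring function".
# QED stands in for a project-specific, possibly multi-component objective.
from typing import Iterable, List, Tuple

from rdkit import Chem
from rdkit.Chem import QED


def score(smiles: str) -> float:
    """User-defined search objective in [0, 1]; invalid SMILES score 0."""
    mol = Chem.MolFromSmiles(smiles)
    return QED.qed(mol) if mol is not None else 0.0


def filter_samples(samples: Iterable[str], threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Keep only sampled molecules whose score clears the user-defined threshold."""
    return [(smi, s) for smi in samples if (s := score(smi)) >= threshold]


# Hypothetical sample; in practice `samples` would come from the generative model.
hits = filter_samples(["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"])
```

In goal-directed generation the same scoring function is applied during generation instead, so far fewer molecules need to be evaluated.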

General Use Cases

Depending on the requirements of the particular project, we can divide idea-generation behavior into two broad categories: exploration and exploitation. In exploration mode, we are looking for a broad range of diverse ideas; in exploitation mode, we are searching for solutions that are very close to one another.

Distribution Learning

Workflow "B" in Table 1 illustrates distribution learning by introducing a small dataset to the model before sampling a large dataset from it. This necessitates screening through enormous volumes of data, which might include hundreds of thousands or even millions of molecules. Although the workflow of performing transfer learning followed by some sort of scoring is entirely valid and appears to be generally accepted, we discover that it is less effective and significantly more computationally costly than the goal-directed generation.

Goal-Directed Learning

Workflows "C," "D," "E," and "F" from Table 1 all use Reinforcement Learning (RL) to carry out the goal-directed generation. The model is guided by RL in a series of steps toward a region of the chemical space that provides adequate reward. At each step of the procedure, compounds are produced and assessed.

With Diversity Filters (DF), users can specify a preferred score threshold. All generated compounds that score higher than the threshold are stored in the DF memory. If a stored compound is generated more than once, its score is penalized, which discourages the generative model from repeating itself and encourages it to produce novel compounds.
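A toy diversity filter capturing this mechanism might look as follows. The class name, the `minscore` and `bucket_size` parameters, and the use of Murcko scaffolds as the memory key are assumptions chosen to mirror the description above, not REINVENT's implementation.

```python
# Toy diversity filter: remember good compounds per scaffold, penalize full buckets.
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


class ScaffoldDiversityFilter:
    def __init__(self, minscore: float = 0.4, bucket_size: int = 25):
        self.minscore = minscore
        self.bucket_size = bucket_size      # larger buckets favor exploitation
        self.memory = defaultdict(list)     # scaffold SMILES -> stored compounds

    def update(self, smiles: str, score: float) -> float:
        """Store compounds that score above the threshold and return the (possibly penalized) score."""
        if score < self.minscore:
            return score
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return 0.0
        scaffold = Chem.MolToSmiles(MurckoScaffold.GetScaffoldForMol(mol))
        self.memory[scaffold].append(smiles)
        # Penalize compounds whose scaffold has already been sampled often enough.
        return score if len(self.memory[scaffold]) <= self.bucket_size else 0.0
```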

It is crucial to emphasize that compound production occurs during the RL process, not after it: data is gathered throughout reinforcement learning rather than at its end. This is a clear distinction between Transfer Learning (TL) and RL, because in RL we do not rely on the model's final state.

Table 1: General use cases supported by REINVENT

Each entry below gives the workflow label, the workflow, the use case (exploration and/or exploitation), a description, and the corresponding Superbio app links.
A. Sampling followed by scoring (Exploration)
It has been demonstrated that large numbers of unique compounds can be produced by directly sampling from such generative models. Evaluating these compounds with a scoring function is then similar to a scenario in which a static database of compounds is screened. This strategy does not appear to be particularly effective, because it may involve sampling millions of molecules without producing any useful results. It loses further appeal if the scoring function includes very expensive components such as docking, pharmacophore similarity, or slow predictive models with high descriptor-computation costs.
Superbio app links: Reinvent: Sampling Module; Reinvent: Scoring Module
B. Transfer learning with a small set of compounds, followed by sampling and then scoring (Exploration)
The generative model undergoes transfer learning on a reduced collection of molecules relevant to the project of interest. As a result, the model is biased toward producing project-specific molecules considerably more frequently than random compounds, so a much smaller sampled dataset can be sufficient when employing the scoring function to locate good hits. However, there is no assurance that the scores will be good, which may call for further sampling rounds and perhaps additional transfer learning procedures.
Superbio app links: Reinvent: Create Initial Prior/Agent Generative Model; Reinvent: Data Preparation Module; Reinvent: Train Initial Generative Model for Prior/Agent; Reinvent: Sampling Module; Reinvent: Scoring Module
C. Reinforcement learning with Diversity Filters set to "NoFilter", or other filters with an extended bucket size (Exploitation)
Goal-directed generation can be accomplished with this technique: new ideas are created throughout the entire run, and the final model is typically discarded afterwards. Because scaffold dissimilarity is not enforced, "NoFilter" mode allows the collection of solutions that may be highly similar. Extending the bucket size while employing other DF types further encourages exploitation, because it allows a greater number of comparable compounds to be sampled before penalties are applied.
Superbio app links: Reinvent: Create Initial Prior/Agent Generative Model; Reinvent: Data Preparation Module; Reinvent: Train Initial Generative Model for Prior/Agent; Reinvent: Train QSAR Models for Target Protein; Reinvent: Reinforcement Learning Module
D. Reinforcement learning with Diversity Filters set to a type other than "NoFilter" (Exploration)
Goal-directed generation is possible in this mode as well. To force the creation of molecules with diverse scaffolds, a diversity filter with some form of scaffold filtering must be used. The diversity filter memorizes the compounds that have already been discovered and penalizes similar or identical ideas while the scoring function specification stays the same. To adapt, the agent creates new scaffolds while still attempting to maximize the scoring function. Diversity filters can therefore also be thought of as an adaptive penalty that modifies the score. Since it keeps the evaluation of random compounds to a minimum, this mode of exploration is by far the most effective.
Superbio app links: Reinvent: Create Initial Prior/Agent Generative Model; Reinvent: Data Preparation Module; Reinvent: Train Initial Generative Model for Prior/Agent; Reinvent: Train QSAR Models for Target Protein; Reinvent: Reinforcement Learning Module
E. Transfer learning with subsequent reinforcement learning (currently not applicable) (Exploitation and exploration)
This approach can be used to accelerate the learning process by making the generative model productive sooner. To do this, transfer learning is performed on a set of compounds that have the required properties. For instance, if the goal were to maximize a predictive-model component relative to the other scoring components, we would use every compound that this model deems active. If we target a particular subseries of molecules, we would use for transfer learning only those compounds that share the relevant properties. After transfer learning has been completed, the final agent is "focused" on that particular set. To determine which agent is sufficiently focused, users can run TL for a number of epochs and examine the statistics from each epoch. The TensorBoard log offers a variety of information, including the most frequently sampled compounds, so the user can visually check whether the required structural patterns are present. The trained agent's state is saved after each epoch, and the user can select which agent to employ for the upcoming RL. Keep in mind that all of this is done to prepare the agent for RL so that it has an advantage during the exploitation search. By selecting suitable diversity filter types, the user can still use this as a starting point for exploration, even though it is more likely to be used for exploitation; in that case the generative model should remain "unfocused".
F. Reinforcement learning with inception (Exploitation and exploration)
This is comparable to case "E", except that instead of "focusing" the agent, users can place a list of compounds with desirable properties into the inception memory (a toy sketch of such a memory follows this table). During RL, the agent is directly exposed to those examples, and the scoring mechanism rewards them, encouraging the agent to produce more comparable molecules. Users should provide compounds that receive a sufficiently high score from the scoring function while falling below the "minscore" threshold of the selected DF.
Superbio app links: Reinvent: Create Initial Prior/Agent Generative Model; Reinvent: Data Preparation Module; Reinvent: Train Initial Generative Model for Prior/Agent; Reinvent: Train QSAR Models for Target Protein; Reinvent: Reinforcement Learning Module
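As referenced in row "F", the sketch below illustrates a toy inception memory: it is seeded with user-supplied compounds that already score well, keeps only the best entries, and replays them to the agent with an additional likelihood update during RL. The class, its methods, and the exact update rule are assumptions for illustration only, not REINVENT's code.

```python
# Toy inception memory: keep the best-scoring seed/generated compounds and
# occasionally replay them to the agent during RL.
import heapq
import itertools

import torch


class InceptionMemory:
    def __init__(self, seed_smiles, scoring_function, size: int = 100):
        self.size = size
        self._counter = itertools.count()   # tie-breaker for equal scores
        self._heap = []                      # min-heap of (score, id, smiles)
        for smi in seed_smiles:
            self.add(smi, scoring_function(smi))

    def add(self, smiles: str, score: float) -> None:
        heapq.heappush(self._heap, (score, next(self._counter), smiles))
        if len(self._heap) > self.size:
            heapq.heappop(self._heap)        # drop the worst-scoring entry

    def replay(self, agent, optimizer, sigma: float = 120.0) -> None:
        """Extra likelihood update on remembered high-scoring compounds (assumed rule)."""
        if not self._heap:
            return
        scores = torch.tensor([s for s, _, _ in self._heap])
        smiles = [smi for _, _, smi in self._heap]
        loss = -(sigma * scores * agent.log_likelihood(smiles)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```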