From the Python API

Download examples

From command line: git clone https://github.com/NostrumBioDiscovery/analogs_finder.git

Load your query molecule and your database

from rdkit import Chem

database = "analogs_finder/examples/database.sdf"
qmolecule = "analogs_finder/examples/substructre_1.sdf"

molecules_db = Chem.SDMolSupplier(database)
molecule_query = next(Chem.SDMolSupplier(qmolecule))

Analyze your dataset

The commands below output the Tanimoto similarity distribution across the whole dataset for every fingerprint type. They also plot the first two components of a dimensionality reduction of the fingerprint space (UMAP or PCA, selected with dim_type), coloured by similarity to your query molecule. Hovering over a point in the plot shows the structure of the corresponding molecule.

from analogs_finder.analysis import analysis_dataset as an


# Use UMAP (Uniform Manifold Approximation and Projection) to plot the chemical space
an.main(molecule_query, molecules_db, dim_type="umap")

# Use PCA to plot the chemical space
an.main(molecule_query, molecules_db, dim_type="pca")

This produces the file similarity_hist_DL.png:

../../_images/fp_dist.png

A Firefox window also opens with an interactive plot:

../../_images/chemical_space.png
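
As a reminder of what the similarity values in these plots measure, the Tanimoto coefficient between two fingerprints is the number of shared on-bits divided by the number of bits set in either fingerprint. The block below is a minimal pure-Python sketch of that formula, not the code analogs_finder runs internally. The fingerprints are hypothetical sets of bit indices invented for illustration, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Toy fingerprints: hypothetical bit indices, not computed from real molecules
query_fp = {1, 4, 7, 9}
database_fps = {
    "mol_a": {1, 4, 7, 9},  # identical to the query
    "mol_b": {1, 4, 8},     # shares two bits with the query
    "mol_c": {2, 3},        # shares no bits with the query
}

# Similarity of every database entry to the query
similarities = {name: tanimoto(query_fp, fp) for name, fp in database_fps.items()}
print(similarities)
```

A histogram of such values over the whole database is what the similarity distribution above summarises.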

n Most Similar Molecules

The search_most_similars method returns the n molecules in your database that are most similar to your query molecule.

from analogs_finder.search_methods import methods as mt
from analogs_finder.helpers import helpers as hp

output = "most_similars.sdf"
n_structs = 50

similars  = mt.search_most_similars(molecule_query, molecules_db, n_structs)
similars_no_duplicates = hp.remove_duplicates(similars)

# Write the de-duplicated hits to an SDF file
w = Chem.SDWriter(output)
for m in similars_no_duplicates:
    w.write(m.molecule)
w.close()  # flush and close the output file
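
Conceptually, a "most similar n" search scores every database entry against the query and keeps the n best. The sketch below shows that idea in plain Python under stated assumptions: fingerprints are hypothetical sets of bit indices, and the helper names tanimoto and most_similar are made up for this example. It is not the implementation behind search_most_similars.

```python
import heapq


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def most_similar(query_fp, database_fps, n):
    """Return the n (name, similarity) pairs with the highest similarity to the query."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in database_fps.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Toy data: hypothetical bit indices, not real molecules
query_fp = {1, 4, 7, 9}
database_fps = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
    "mol_d": {1, 4, 7},
}

top_two = most_similar(query_fp, database_fps, 2)
print(top_two)
```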

Use different fingerprints

molecule_query = next(Chem.SDMolSupplier("examples/query_molecule.sdf"))
substructure_file = "examples/substructure.sdf"

similars_daylight = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="DL")
similars_circular = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="circular")
similars_torsions = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="torsions")
similars_MACCS = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="MACCS")
similars_pharm = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="pharm")

Use all four fingerprints to query one database with different thresholds

tresholds = [0.7, 0.4, 0.4, 0.6]
fp_types = ["DL", "circular", "torsions", "MACCS"]
similarts = mt.search_similarity_tresh_several_fp(molecule_query, molecules_db, tresholds=tresholds, fp_types=fp_types)
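
The text above does not say how the per-fingerprint results are combined. As an illustration only, the sketch below assumes a molecule is kept when it reaches the threshold for every fingerprint type; the library may combine them differently. The similarity values are made up, and passes_all is a hypothetical helper.

```python
tresholds = [0.7, 0.4, 0.4, 0.6]
fp_types = ["DL", "circular", "torsions", "MACCS"]

# Hypothetical, made-up similarities of three molecules to the query, per fingerprint type
similarity_table = {
    "mol_a": {"DL": 0.82, "circular": 0.55, "torsions": 0.47, "MACCS": 0.71},
    "mol_b": {"DL": 0.75, "circular": 0.35, "torsions": 0.50, "MACCS": 0.80},
    "mol_c": {"DL": 0.60, "circular": 0.60, "torsions": 0.60, "MACCS": 0.60},
}


def passes_all(similarities, fp_types, thresholds):
    """True if the molecule reaches the threshold for every fingerprint type.

    ASSUMPTION: 'every threshold must pass' is a choice made for this sketch.
    """
    return all(similarities[fp] >= t for fp, t in zip(fp_types, thresholds))


selected = [
    name
    for name, sims in similarity_table.items()
    if passes_all(sims, fp_types, tresholds)
]
print(selected)
```

Pairing each fingerprint with its own threshold matters because the score ranges differ between fingerprint types, so one cutoff rarely suits all of them.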

Turbo search method:

Instead of just querying the reference molecule and setting a Tanimoto threshold, we first look for the N most similar neighbours. We then run a similarity search with the reference molecule and each of these neighbours, and finally combine the results by data fusion.

For more details: https://onlinelibrary.wiley.com/doi/abs/10.1002/sam.10037

import analogs_finder.search_methods.fusion as fs
turbo_similars = fs.turbo_similarity(molecule_query, molecules_db, neighbours=5, treshold=0.4, fp_type="circular")
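
The three steps described above (find neighbours, search with each of them, fuse the scores) can be sketched in plain Python. This is a hedged illustration, not the code behind fs.turbo_similarity: fingerprints are hypothetical sets of bit indices, the helper names are made up, and the fusion rule is assumed to be "take the maximum score over all queries".

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def turbo_search(query_fp, database_fps, neighbours, threshold):
    """Sketch of a turbo similarity search using max fusion (an assumption)."""
    # Step 1: rank the database against the reference molecule
    ranked = sorted(
        database_fps,
        key=lambda name: tanimoto(query_fp, database_fps[name]),
        reverse=True,
    )
    # Step 2: the reference plus its nearest neighbours become the queries
    queries = [query_fp] + [database_fps[name] for name in ranked[:neighbours]]
    # Step 3: fuse by keeping, for each molecule, its best score over all queries
    # (a neighbour scores 1.0 against itself, so neighbours are always retained)
    fused = {
        name: max(tanimoto(q, fp) for q in queries)
        for name, fp in database_fps.items()
    }
    return {name: score for name, score in fused.items() if score >= threshold}


# Toy data: hypothetical bit indices, not real molecules
query_fp = {1, 2, 3, 4}
database_fps = {
    "mol_a": {1, 2, 3, 5},
    "mol_b": {3, 5, 6, 7},
    "mol_c": {8, 9},
}

plain_hits = {n for n, fp in database_fps.items() if tanimoto(query_fp, fp) >= 0.3}
turbo_hits = turbo_search(query_fp, database_fps, neighbours=1, threshold=0.3)
print(plain_hits, turbo_hits)
```

In this toy example mol_b falls below the threshold against the reference alone, but is recovered because it resembles the nearest neighbour mol_a.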

PostFilter by:

To postfilter a previous analog search, provide the SDF file of that search as the database, followed by --only_postfilter. Here, we remove duplicates from the previous analog search, result_search.sdf.

Position of growing

To keep only the molecules that have a radical growing from a specific initial atom (atom 2, for example), use the command below. To find the atom indexes, open the molecule in Maestro or PyMOL and label the atoms by index.

python -m analogs_finder.main result_search.sdf substructre_1.sdf --only_postfilter --atom_to_grow 2

Several atoms can also be given at once. For instance, here we keep analogs grown from atom index 2 or 4:

python -m analogs_finder.main result_search.sdf substructre_1.sdf --only_postfilter --atom_to_grow 2 4

To keep analogs that have no radical on a specific atom, use --atom_to_avoid. For example, the command below keeps all molecules that have no radical on either atom index 2 or atom index 4:

python -m analogs_finder.main result_search.sdf substructre_1.sdf --only_postfilter --atom_to_avoid 2 4
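
The selection logic of the two options, as described above, can be written out in a few lines of Python. This is only a sketch of the filtering rule, not the tool's code. It assumes each analog has already been reduced to the set of query-atom indexes where it carries a radical; keep_analog and the analog names are hypothetical.

```python
def keep_analog(grown_atoms, atom_to_grow=(), atom_to_avoid=()):
    """Decide whether an analog passes the growth-position filters.

    grown_atoms: set of query-atom indexes where the analog carries a radical
    (ASSUMPTION: this set is computed elsewhere, e.g. from a substructure match).
    """
    # atom_to_grow: the analog must grow from at least one of the listed atoms
    if atom_to_grow and not grown_atoms & set(atom_to_grow):
        return False
    # atom_to_avoid: the analog must not grow from any of the listed atoms
    if grown_atoms & set(atom_to_avoid):
        return False
    return True


# Hypothetical analogs and the atoms they grow from
analogs = {
    "analog_1": {2},
    "analog_2": {4, 6},
    "analog_3": {5},
}

grown_at_2_or_4 = [n for n, a in analogs.items() if keep_analog(a, atom_to_grow=[2, 4])]
avoiding_2_and_4 = [n for n, a in analogs.items() if keep_analog(a, atom_to_avoid=[2, 4])]
print(grown_at_2_or_4, avoiding_2_and_4)
```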