From Python API
=================

Download examples
-------------------

From the command line::

    git clone https://github.com/NostrumBioDiscovery/analogs_finder.git

Load your query molecule and your database
--------------------------------------------

::

    from rdkit import Chem

    database = "analogs_finder/examples/database.sdf"
    qmolecule = "analogs_finder/examples/substructre_1.sdf"
    molecules_db = Chem.SDMolSupplier(database)
    molecule_query = next(Chem.SDMolSupplier(qmolecule))

Analyze your dataset
-----------------------

The commands below output the Tanimoto similarity distribution over the whole
dataset for every fingerprint type. They also show a plot of the first two
components of the chosen projection (PCA or UMAP) over the fingerprint space,
coloured by similarity to your query molecule. Hovering over a point in the
plot shows the structure of the corresponding molecule.

::

    from analogs_finder.analysis import analysis_dataset as an

    # Use UMAP (uniform manifold approximation and projection) to plot the chemical space
    an.main(molecule_query, molecules_db, dim_type="umap")

    # Use PCA to plot the chemical space
    an.main(molecule_query, molecules_db, dim_type="pca")

The run produces the file similarity_hist_DL.png:

.. figure:: ../../images/fp_dist.png
   :scale: 80%
   :align: center

A Firefox window also opens with an interactive plot:

.. figure:: ../../images/chemical_space.png
   :scale: 80%
   :align: center

Most Similar n Molecules
--------------------------------------

The search_most_similars method returns the n molecules in your database that
are most similar to your query molecule.

::

    from analogs_finder.search_methods import methods as mt
    from analogs_finder.helpers import helpers as hp

    output = "most_similars.sdf"
    n_structs = 50
    similars = mt.search_most_similars(molecule_query, molecules_db, n_structs)
    similars_no_duplicates = hp.remove_duplicates(similars)
    w = Chem.SDWriter(output)
    for m in similars_no_duplicates:
        w.write(m.molecule)
    w.close()  # flush the results to disk

Tanimoto Similarity Search
------------------------------

The search_similarity_tresh method returns all molecules whose Tanimoto
similarity to the query is higher than a chosen threshold.

::

    treshold = 0.6
    similars = mt.search_similarity_tresh(molecule_query, molecules_db, treshold)
    similars_no_duplicates = hp.remove_duplicates(similars)
    w = Chem.SDWriter(output)
    for m in similars_no_duplicates:
        w.write(m.molecule)
    w.close()  # flush the results to disk

Substructure Search
-----------------------

The search_substructure method returns the molecules that contain at least one
of the substructures in your query SDF file.

::

    substructures = "analogs_finder/examples/substructre_2.sdf"
    molecule_query = Chem.SDMolSupplier(substructures)
    similars = mt.search_substructure(molecule_query, molecules_db)
    similars_no_duplicates = hp.remove_duplicates(similars)
    w = Chem.SDWriter(output)
    for m in similars_no_duplicates:
        w.write(m.molecule)
    w.close()  # flush the results to disk

Combinatorial Substructure Search
---------------------------------------

The combi_substructure_search method returns all molecules that contain at
least one substructure from each of the input substructure SDF files.

For example, suppose I want structures with a 5- or 6-membered ring: I pass
these two substructures in one SDF file, so at least one of them has to be
present in the output molecules. At the same time I also want an amide, so I
pass a second SDF file containing the amide substructure.
Finally, I obtain structures with an amide and either a 5- or 6-membered ring.

::

    import glob

    substructures_sdf = glob.glob("analogs_finder/examples/subs*.sdf")
    similars = mt.combi_substructure_search(substructures_sdf, molecules_db)
    similars_no_duplicates = hp.remove_duplicates(similars)
    w = Chem.SDWriter(output)
    for m in similars_no_duplicates:
        w.write(m.molecule)
    w.close()  # flush the results to disk

Similarity and Substructure hybrid search
------------------------------------------

The most_similar_with_substructure method returns the molecules whose Tanimoto
similarity coefficient is higher than a given threshold and that also contain
a given substructure.

::

    substructure_file = "analogs_finder/examples/substructre_3.sdf"
    similars = mt.most_similar_with_substructure(molecule_query, molecules_db, substructure_file, treshold=0.1)
    # Recompute the duplicate-free list for this search before writing it
    similars_no_duplicates = hp.remove_duplicates(similars)
    w = Chem.SDWriter(output)
    for m in similars_no_duplicates:
        w.write(m.molecule)
    w.close()  # flush the results to disk

Use different fingerprints
------------------------------

::

    molecule_query = next(Chem.SDMolSupplier("examples/query_molecule.sdf"))
    substructure_file = "examples/substructure.sdf"
    similars_daylight = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="DL")
    similars_circular = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="circular")
    similars_torsions = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="torsions")
    similars_MACCS = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="MACCS")
    similars_pharm = mt.search_most_similars(molecule_query, molecules_db, 2, fp_type="pharm")

Use four fingerprints to query one database with different thresholds
-------------------------------------------------------------------------------

::

    tresholds = [0.7, 0.4, 0.4, 0.6]
    fp_types = ["DL", "circular", "torsions", "MACCS"]
    similars = mt.search_similarity_tresh_several_fp(molecule_query, molecules_db, tresholds=tresholds, fp_types=fp_types)

Turbo search method
----------------------

Instead of querying only the reference molecule against a Tanimoto threshold,
we first look for the N most similar neighbours, then run a similarity search
with the reference molecule and each of these neighbours, and finally combine
the results by data fusion. For more details see
https://onlinelibrary.wiley.com/doi/abs/10.1002/sam.10037

::

    import analogs_finder.search_methods.fusion as fs

    turbo_similars = fs.turbo_similarity(molecule_query, molecules_db, neighbours=5, treshold=0.4, fp_type="circular")

Postfilter by
----------------

To postfilter a previous analog search, pass the SDF file of that search as
the database, followed by --only_postfilter. Here, we remove duplicates from
the previous analog search, result_search.sdf.

Position of growing
+++++++++++++++++++++++

To keep only the molecules that have a radical growing from a specific atom of
the initial molecule (atom 2, for example), use the command below. To find the
atom indexes, open the molecule in Maestro or PyMOL and label the atoms by
index.

::

    python -m analogs_finder.main result_search.sdf substructre_1.sdf --only_postfilter --atom_to_grow 2

Several atom indexes can be given at once. For instance, here we keep analogs
grown from atom index 2 **or** 4.

::

    python -m analogs_finder.main result_search.sdf substructre_1.sdf --only_postfilter --atom_to_grow 2 4

To keep analogs that have no radical on a specific atom, use --atom_to_avoid.
For example, here we keep all molecules that have no radical on atom index 2
**and** no radical on atom index 4.

::

    python -m analogs_finder.main result_search.sdf substructre_1.sdf --only_postfilter --atom_to_avoid 2 4
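
The **or** and **and** behaviour of the two flags above can be illustrated
with a small, self-contained Python sketch. It is not the analogs_finder
implementation: the function names keep_grown_at and keep_avoiding, the
example data, and the idea of describing each analog by the set of query atom
indexes that carry a radical are all assumptions made only for illustration.

```python
# Hypothetical sketch: NOT the analogs_finder implementation.
# Each analog is described by the set of query-atom indexes that carry a
# radical (this representation is an assumption made for illustration).


def keep_grown_at(analogs, atoms_to_grow):
    """Keep analogs grown from at least one of the listed atoms (OR)."""
    wanted = set(atoms_to_grow)
    return {name: grown for name, grown in analogs.items() if grown & wanted}


def keep_avoiding(analogs, atoms_to_avoid):
    """Keep analogs with no radical on any of the listed atoms (AND)."""
    banned = set(atoms_to_avoid)
    return {name: grown for name, grown in analogs.items() if not (grown & banned)}


# Made-up example data: analog name -> atom indexes that carry a radical.
analogs = {
    "analog_a": {2},
    "analog_b": {4, 7},
    "analog_c": {7},
    "analog_d": {2, 4},
}

grown = keep_grown_at(analogs, [2, 4])    # mirrors --atom_to_grow 2 4
avoided = keep_avoiding(analogs, [2, 4])  # mirrors --atom_to_avoid 2 4

print(sorted(grown))    # ['analog_a', 'analog_b', 'analog_d']
print(sorted(avoided))  # ['analog_c']
```

In this toy data, growing at 2 **or** 4 keeps every analog that touches either
atom, while avoiding 2 **and** 4 keeps only the analog that touches neither.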