Structure Filtering Tutorial
=============================

This tutorial shows how to filter a database of molecules so that only those containing a given core substructure are kept. It also shows how to specify the core atoms to which R-groups must be bound.

Flags and Input Files
---------------------

- ``-i TEMPLATE_LIGAND``, ``--template_ligand TEMPLATE_LIGAND``: path to the PDB template ligand (the core substructure).
- ``-l LIGANDS``, ``--ligands LIGANDS``: path to an SDF file with the database ligands, or to a folder containing SDF files.
- ``-o OUTFILE``, ``--outfile OUTFILE``: output file name.
- ``-a ATOM_LINKER``, ``--atom_linker ATOM_LINKER``: PDB atom name of the core atom that is bound to the R-group.

Requirements of the input files
-------------------------------

1. The template file, which contains the substructure that every molecule must have, must be in PDB format and have **unique PDB atom names**.
2. The database files must be in SDF format.

Example - Running in a Jupyter Notebook
---------------------------------------

.. code:: ipython3

    import database_filtering
    from database_filtering.utils.utils import filter_mols

    import rdkit
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import Draw
    from rdkit.Chem.Draw import rdMolDraw2D
    from rdkit.Chem.Draw import IPythonConsole

    # Size of the molecule depictions rendered inline in the notebook
    IPythonConsole.molSize = (300, 300)

In the cells below, the template and the ligands are built from SMILES so they can be displayed. ``filter_mols`` itself reads them from the PDB and SDF file paths defined further down.

.. code:: ipython3

    # Core substructure (benzoic acid) that every ligand must contain
    template_ligand = Chem.MolFromSmiles("O=C(O)c1ccccc1")
    template_ligand

.. figure:: ../../_images/DF_tutorial-input_ligand.png
   :scale: 40%
   :align: center

.. code:: ipython3

    # Example database ligands
    ligands = [Chem.MolFromSmiles("O=C(O)c1ccccc1CO"),
               Chem.MolFromSmiles("Nc1ccccc1C(=O)O"),
               Chem.MolFromSmiles("Nc1ccccc1C(=O)O"),
               Chem.MolFromSmiles("O=C(O)c1ccccc1I"),
               Chem.MolFromSmiles("O=C(O)c1ccccc1Br"),
               Chem.MolFromSmiles("NC(=O)c1ccccc1C(=O)O"),
               Chem.MolFromSmiles("O=[IH2]c1ccccc1C(=O)O"),
               Chem.MolFromSmiles("Nc1ncccc1C(=O)O"),
               Chem.MolFromSmiles("Nc1ccc(C(=O)O)c(O)c1")]
    Draw.MolsToGridImage(ligands, molsPerRow=4, subImgSize=(200, 200))

.. figure:: ../../_images/DF_tutorial-ligands.png
   :scale: 40%
   :align: center

.. code:: ipython3

    # PDB atom name(s) of the core atom(s) that must be bound to an R-group
    linker = ["C7"]
    # Highlight atom index 8 (PDB atom name C7) in the depiction
    template_ligand.__sssAtoms = [8]
    template_ligand

.. figure:: ../../_images/DF_tutorial-linker_atom.png
   :scale: 40%
   :align: center

.. note::

   If your linker is atom C7, only molecules that have an R-group bound to that atom are kept. You can also select more than one linker atom:

   - In a notebook: ``linker = ["C7", "C4"]``
   - On the cluster: ``-a C7 C4``

.. code:: ipython3

    # Input files read by the filtering step
    template_ligand_path = "./template_ligand.pdb"
    ligands_path = "./ligands.sdf"

.. code:: ipython3

    # Results are stored in the file test.sdf
    filter_mols(template_ligand_path, ligands_path, "test", linker)

.. parsed-literal::

    Filtering passed for molecule
    Filtering passed for molecule
    Filtering passed for molecule
    Filtering passed for molecule
    Filtering passed for molecule
    Filtering passed for molecule
    Filtering passed for molecule
    No substructure match for ligand , skipping
    Molecule did not meet the R-groups requirements.

Filtering Results
~~~~~~~~~~~~~~~~~

.. code:: ipython3

    # Load the molecules that passed the filter from the output file
    mols = [mol for mol in Chem.SDMolSupplier("test.sdf") if mol is not None]
    Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(200, 200))

.. figure:: ../../_images/DF_tutorial-results.png
   :scale: 40%
   :align: center

Running on the Cluster
----------------------

Submit the following SLURM script with ``sbatch``:

.. code:: sh

    #!/bin/bash
    #SBATCH -J filter
    #SBATCH --output=filter.out
    #SBATCH --error=filter.err
    #SBATCH --ntasks=3
    #SBATCH --mem-per-cpu=10000

    # Activate the conda environment (adapt these paths to your installation)
    source /shared/home/hmartin/miniconda3/etc/profile.d/conda.sh
    conda activate /shared/home/hmartin/miniconda3/envs/r_groups_env

    python -m database_filtering.run_filtering -i template_ligand.pdb -l ligands.sdf -a C7 -o test
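
Both the notebook and the cluster run require a template with unique PDB atom names, and the linker must be one of those names. Before you submit a job, you can check both points with the minimal standard-library sketch below. It assumes a fixed-column PDB file in which the atom name occupies columns 13-16 of ``ATOM``/``HETATM`` records. ``pdb_atom_names`` is a hypothetical helper and is not part of ``database_filtering``.

.. code:: python

    from collections import Counter


    def pdb_atom_names(pdb_path):
        """Return the atom names of a PDB file in file order.

        Hypothetical helper (not part of database_filtering). Raises
        ValueError if any atom name occurs more than once, since the
        template must have unique PDB atom names.
        """
        names = []
        with open(pdb_path) as handle:
            for line in handle:
                # Only coordinate records carry atom names.
                if line.startswith(("ATOM", "HETATM")):
                    # Atom name: columns 13-16 (1-based) of the fixed-width record.
                    names.append(line[12:16].strip())
        duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
        if duplicates:
            raise ValueError(f"Duplicate PDB atom names: {', '.join(duplicates)}")
        return names

For example, ``print(pdb_atom_names("template_ligand.pdb"))`` lists the available names. You can then confirm that the name you pass as ``linker`` or with ``-a`` is present in the template.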