Fragment libraries evaluation with fragPELE

Introduction

FragPELE is able to grow fragments from custom-made libraries. All you have to do is provide it with a directory containing your fragments (in PDB or SDF format) and a docked core to let the software rank your fragments and return the best binding modes for each.

Inputs

  • protein-scaffold PDB file

  • directory with fragment files (SDF or PDB)

  • input.yaml configuration file

Default parameters

  • frag_eq_steps: 20

  • gr_steps: 6

  • frag_steps: 3

Recommendations

  1. Expect between 30 min and 3 h of computational time per fragment, depending on its number of rotatable bonds.

1. Complex Preparation

Protein-scaffold PDB file

The PDB file should be preprocessed with Maestro Protein Preparation Wizard. Protonating the protein is obligatory. We also recommend deleting water molecules more than 5 Å away from ligands and ions, and filling in missing loops and side chains.

The file must contain the protein in complex with a fragment (scaffold), e.g. from an X-ray structure or a docked pose. Additionally, ensure the ligand has:

  • unique chain ID

  • no atom names with spaces or single letters (Maestro occasionally adds hydrogens with names such as "H1 2", which need to be corrected)

  • any unique residue name, except for UNK

  • well-defined aromatic bonds.
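Some of the ligand requirements above can be checked automatically before launching a simulation. The following is a minimal sketch, not part of the PELE platform: the function name check_ligand is hypothetical, and it assumes the standard fixed-column PDB layout (atom name in columns 13-16, residue name in 18-20, chain ID in column 22). It does not verify aromatic bonds, which still need to be inspected in Maestro.

```python
def check_ligand(pdb_lines, resname):
    """Return a list of problems found for the ligand with residue name `resname`.

    Hypothetical helper: assumes standard fixed-column PDB records.
    """
    problems = []
    if resname == "UNK":
        problems.append("residue name UNK is not allowed")

    ligand_chains, other_chains = set(), set()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 22:
            continue
        atom_name = line[12:16].strip()
        chain = line[21]
        if line[17:20].strip() != resname:
            other_chains.add(chain)
            continue
        ligand_chains.add(chain)
        if " " in atom_name:
            problems.append(f"atom name {atom_name!r} contains a space")
        elif len(atom_name) == 1:
            problems.append(f"atom name {atom_name!r} is a single letter")

    if not ligand_chains:
        problems.append(f"no atoms found for residue {resname}")
    elif len(ligand_chains) > 1:
        problems.append(f"ligand spans several chains: {sorted(ligand_chains)}")
    if ligand_chains & other_chains:
        problems.append("ligand chain ID is shared with other residues")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_ligand.py complex.pdb LIG
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as handle:
            for problem in check_ligand(handle, sys.argv[2]):
                print(problem)
```

An empty list means none of the checked problems were detected.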

Directory with fragments

The fragment library should be preprocessed using Schrodinger LigPrep (default settings should be sufficient) and have unique molecule names.
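To catch duplicated molecule names before a run, you could scan the SDF files of your library. The sketch below uses hypothetical helper names and assumes that the molecule name is the title line (first line) of each SDF record and that records are terminated by a line containing "$$$$", as in the standard SDF layout. Whether fragPELE reads the name from exactly this line is an assumption.

```python
from collections import Counter


def sdf_molecule_names(sdf_text):
    """Return the title line of every record in an SDF file's text."""
    names, record = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            if record:
                names.append(record[0].strip())
            record = []
        else:
            record.append(line)
    # A final record without a closing "$$$$" still counts if it has content.
    if any(entry.strip() for entry in record):
        names.append(record[0].strip())
    return names


def duplicated_names(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

Collect the names from every file in the library folder, then pass the combined list to duplicated_names; an empty result means all names are unique.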

2. Input Preparation

Prepare input.yaml file:

frag_core: "complex.pdb"  # Protein-scaffold complex
frag_library: "/home/username/aromatic_library"  # Folder containing fragment files
resname: "LIG"  # Ligand scaffold residue name
chain_core: "L"  # Ligand scaffold chain ID
frag_core_atom: "C6-H6"  # Connection point from which you want to grow the fragments
cpus: 48

Optional Parameters

  • frag_restart_libraries: To resume a fragment libraries simulation.

  • fragment_atom: Hydrogen of the fragment to attach to the frag_core_atom. If not set, a simulation is performed from every hydrogen of the fragment.

Defining frag_core_atom

Not sure how to define frag_core_atom? Open your PDB file in Schrodinger Maestro and hover over the atoms to see their PDB names in the panel below. The starting point for fragment growing should consist of a heavy atom and the hydrogen bound to it, joined by a hyphen, e.g. “C6-H6” in the picture below.

[Figure: libraries_2.png, atom names displayed in Maestro]
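If you prefer to double-check the value outside Maestro, the sketch below splits a frag_core_atom string into its heavy atom and hydrogen names and confirms that both exist in the ligand. The function names are hypothetical and not part of the PELE platform; the code assumes the standard fixed-column PDB layout and does not check that the two atoms are actually bonded.

```python
def parse_core_atom(frag_core_atom):
    """Split a value such as "C6-H6" into (heavy_atom, hydrogen)."""
    heavy, separator, hydrogen = frag_core_atom.partition("-")
    if not separator or not heavy.strip() or not hydrogen.strip():
        raise ValueError(
            f"expected '<heavy atom>-<hydrogen>', got {frag_core_atom!r}"
        )
    return heavy.strip(), hydrogen.strip()


def core_atoms_present(pdb_lines, resname, chain, frag_core_atom):
    """Check that both atom names exist in the given ligand residue and chain."""
    heavy, hydrogen = parse_core_atom(frag_core_atom)
    names = {
        line[12:16].strip()
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM"))
        and len(line) >= 22
        and line[17:20].strip() == resname
        and line[21] == chain
    }
    return heavy in names and hydrogen in names
```

For the example configuration, core_atoms_present(lines, "LIG", "L", "C6-H6") should be true when lines holds the contents of complex.pdb.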

3. Run simulation

Launch the simulation with the following command:

python -m pele_platform.main input.yaml

4. Output

The simulation will create a TSV file with scored fragments as well as a number of fragment folders, the names of which will consist of the scaffold and molecule names.

Scored fragments

The list of all grown fragments together with their associated scores (average binding energy of the top 25% of all poses) can be found in:

simulation_score.tsv
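A short script can sort the scored fragments from best to worst. This sketch makes assumptions that are not confirmed by the text above: that the score is the last tab-separated column of each row, and that lines whose score column is not numeric (such as a header) can be skipped. Adjust score_column to match your file. Lower (more negative) binding energies are ranked first.

```python
import csv
import io


def rank_fragments(tsv_text, score_column=-1):
    """Return the rows of a TSV file sorted by ascending score.

    Assumption: the score sits in `score_column` (last column by default).
    """
    scored = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        try:
            score = float(row[score_column])
        except (ValueError, IndexError):
            continue  # header or malformed line
        scored.append((score, row))
    scored.sort(key=lambda item: item[0])
    return [row for _, row in scored]


if __name__ == "__main__":
    import os

    if os.path.exists("simulation_score.tsv"):
        with open("simulation_score.tsv") as handle:
            for row in rank_fragments(handle.read()):
                print("\t".join(row))
```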

Top poses

Each fragment folder contains a top_results folder with PDB files corresponding to the best poses for that fragment:

fragment_folder/top_results/
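To gather the best poses of all fragments in one go, you could walk the fragment folders as sketched below. The function name is hypothetical, and the code assumes that the fragment folders sit directly inside the directory you pass in and that the poses carry a .pdb extension.

```python
from pathlib import Path


def collect_top_poses(working_dir):
    """Map each fragment folder name to the sorted PDB file names in its top_results."""
    poses = {}
    for top_results in sorted(Path(working_dir).glob("*/top_results")):
        if top_results.is_dir():
            poses[top_results.parent.name] = sorted(
                pdb.name for pdb in top_results.glob("*.pdb")
            )
    return poses
```

Folders without a top_results subfolder are ignored, so the function can be pointed at a directory that also contains other files.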

5. Analysis

Once your simulation is finished, you can run an automated analysis to retrieve the most promising fragments based on their proximity to a certain point, for example a side chain atom you would like your fragment to interact with.

All you have to do is adjust your input.yaml and launch it as before. Ensure you:

  • provide PELE with the path to your working folder, i.e. where you launched your simulation in the first place

  • set only_analysis: true

  • set your desired coordinates in analysis_to_point.

Example:

frag_core: "complex.pdb"  # Protein-scaffold complex
frag_library: "/home/username/aromatic_library"  # Folder containing fragment files
resname: "LIG"  # Ligand scaffold residue name
chain_core: "L"  # Ligand scaffold chain ID
frag_core_atom: "C6-H6"  # Connection point from which you want to grow the fragments
only_analysis: true
analysis_to_point: [24.835,  73.161,  17.026]
working_folder: "/home/username/LIG_Pele"
cpus: 48
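If the point of interest is a protein atom, its coordinates for analysis_to_point can be read straight from the PDB file. The helper below is a hypothetical sketch, not part of the PELE platform, and assumes the standard fixed-column PDB layout (x, y and z in columns 31-38, 39-46 and 47-54).

```python
def atom_coordinates(pdb_lines, chain, resnum, atom_name):
    """Return [x, y, z] of the first atom matching chain, residue number and name."""
    for line in pdb_lines:
        if (
            line.startswith(("ATOM", "HETATM"))
            and len(line) >= 54
            and line[21] == chain
            and line[22:26].strip() == str(resnum)
            and line[12:16].strip() == atom_name
        ):
            return [float(line[30:38]), float(line[38:46]), float(line[46:54])]
    raise LookupError(f"atom {atom_name} of residue {chain}:{resnum} not found")
```

The returned list can be pasted into the analysis_to_point field of input.yaml.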

The analysis produces a point_analysis.csv file with the following columns:

  • File: Working folder

  • BestFileDistance: File with the best distance

    • Distance: Best distance to the point

  • BestFileBE: File with the best energy of the simulation

    • BE: Best energy

  • BestFileNormalization: File with the best distance and energy of the simulation (recommended)

    • BestNormalization: Best sqrt(distance^2 + energy^2)

    • DistanceBestNormalization: Best distance of the file above

    • BEBestNormalization: Best energy of the file above
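The resulting table can be filtered with a few lines of Python. The sketch below lists fragments by their best distance to the chosen point and optionally drops those beyond a cutoff. It assumes a comma-separated file with a header row that uses the column names listed above; the function name and the max_distance parameter are hypothetical.

```python
import csv
import io


def closest_fragments(csv_text, max_distance=None):
    """Return (File, Distance, BE) tuples sorted by ascending Distance."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        distance = float(row["Distance"])
        if max_distance is not None and distance > max_distance:
            continue
        results.append((row["File"], distance, float(row["BE"])))
    results.sort(key=lambda item: item[1])
    return results


if __name__ == "__main__":
    import os

    if os.path.exists("point_analysis.csv"):
        with open("point_analysis.csv") as handle:
            for name, distance, energy in closest_fragments(handle.read()):
                print(f"{name}\t{distance:.2f}\t{energy:.2f}")
```

Sorting by Distance alone ignores the binding energy, so compare the shortlisted fragments against the BestFileNormalization columns before choosing candidates.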