API

Analysis

We included an option to run the analysis through an API for users who are familiar with Python. All you have to do is initialize the Analysis class with the three mandatory parameters (resname, chain and simulation_output) and any optional parameters you want to include.

Documentation

class pele_platform.analysis.Analysis(simulation_output, resname=None, chain=None, be_column=4, limit_column=None, traj='trajectory.pdb', report=None, skip_initial_structures=True, kde=False, kde_structs=1000, topology=None, cpus=1, water_ids_to_track=[], plot_filtering_threshold=0.02, clustering_filtering_threshold=0.25, random_seed=None, clustering_coverage=0.75)

General class to manage all analysis operations.

classmethod from_parameters(parameters)

It initializes an Analysis object from a Parameters object.

Parameters

parameters (a Parameters object) – The Parameters object containing the parameters that belong to the simulation

Returns

analysis – The Analysis object obtained from the parameters that were supplied

Return type

an Analysis object

generate(path, clustering_type='meanshift', bandwidth='auto', analysis_nclust=10, max_top_clusters=8, top_clusters_criterion='interaction_25_percentile', min_population=0.01, max_top_poses=100, representatives_criterion='interaction_min')

It runs the full analysis workflow (plots, top poses and clusters) and saves the results in the supplied path.

Parameters
  • path (str) – The path where the analysis results will be saved

  • clustering_type (str) – The clustering method that will be used to generate the clusters. One of [‘gaussianmixture’, ‘meanshift’, ‘hdbscan’]. Default is ‘meanshift’

  • bandwidth (Union[float, str]) – Bandwidth for the mean shift and HDBSCAN clustering. Default is ‘auto’

  • analysis_nclust (int) – Number of clusters to create when using the Gaussian mixture model. Default is 10

  • max_top_clusters (int) – Maximum number of clusters to return. Default is 8

  • min_population (float) – The minimum fraction of all structures that a cluster must contain, a value between 0 and 1. Default is 0.01 (i.e. 1%)

  • max_top_poses (int) – Number of top poses to retrieve. Default is 100

  • top_clusters_criterion (str) – Criterion to select top clusters. Default is “interaction_25_percentile”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”, “population”]

  • representatives_criterion (str) – Criterion to select cluster representative structures. Default is “interaction_min”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”]
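The interplay of min_population, top_clusters_criterion and max_top_clusters can be illustrated with a plain-Python sketch. It assumes that each cluster is a list of (total_energy, interaction_energy) pairs, that lower energies are better, and that a criterion such as “interaction_25_percentile” means the 25th percentile of the interaction energy within the cluster. The functions select_top_clusters, cluster_score and percentile are hypothetical and not part of pele_platform; the library's actual implementation may differ.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: each member is (total_energy, interaction_energy).
Cluster = List[Tuple[float, float]]


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between ranks (q from 0 to 100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def cluster_score(members: Cluster, criterion: str) -> float:
    """Score a cluster so that lower scores rank first."""
    if criterion == "population":
        return -len(members)  # larger clusters rank first
    metric, statistic = criterion.split("_", 1)  # e.g. "interaction", "25_percentile"
    column = 0 if metric == "total" else 1
    values = [member[column] for member in members]
    if statistic == "min":
        return min(values)
    if statistic == "mean":
        return sum(values) / len(values)
    return percentile(values, float(statistic.split("_")[0]))


def select_top_clusters(
    clusters: Dict[int, Cluster],
    max_top_clusters: int = 8,
    top_clusters_criterion: str = "interaction_25_percentile",
    min_population: float = 0.01,
) -> List[int]:
    """Return the IDs of the best clusters, best first."""
    total = sum(len(members) for members in clusters.values())
    # Discard clusters holding less than min_population of all structures.
    kept = {
        cid: members
        for cid, members in clusters.items()
        if len(members) / total >= min_population
    }
    ranked = sorted(kept, key=lambda cid: cluster_score(kept[cid], top_clusters_criterion))
    return ranked[:max_top_clusters]
```

With the default min_population of 0.01, for example, a cluster holding 1 of 200 structures (0.5%) is discarded before any ranking takes place.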

generate_clusters(path, clustering_type, bandwidth='auto', analysis_nclust=10, max_top_clusters=8, top_clusters_criterion='interaction_25_percentile', min_population=0.01, representatives_criterion='interaction_min')

It generates the structural clustering of ligand poses.

Parameters
  • path (str) – The path where the clusters will be saved

  • clustering_type (str) – The clustering method that will be used to generate the clusters. One of [‘gaussianmixture’, ‘meanshift’, ‘hdbscan’]

  • bandwidth (Union[float, str]) – Bandwidth for the mean shift and HDBSCAN clustering. Default is ‘auto’

  • analysis_nclust (int) – Number of clusters to create when using the Gaussian mixture model. Default is 10

  • max_top_clusters (int) – Maximum number of clusters to return. Default is 8

  • top_clusters_criterion (str) – Criterion to select top clusters. Default is “interaction_25_percentile”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”, “population”]

  • min_population (float) – The minimum fraction of all structures that a cluster must contain, a value between 0 and 1. Default is 0.01 (i.e. 1%)

  • representatives_criterion (str) – Criterion to select cluster representative structures. Default is “interaction_min”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”]
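Once the top clusters are chosen, representatives_criterion determines which structure stands for each cluster. One possible reading, assumed here, is that the representative is the pose whose energy lies closest to the statistic named by the criterion; for “interaction_min” that is simply the pose with the lowest interaction energy. The pose layout and the helper pick_representative are hypothetical, not pele_platform code.

```python
import math
from typing import List, Tuple

# Hypothetical pose record: (pose_id, total_energy, interaction_energy).
Pose = Tuple[str, float, float]


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between ranks (q from 0 to 100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def pick_representative(poses: List[Pose], criterion: str = "interaction_min") -> Pose:
    """Return the pose whose energy is closest to the statistic named by criterion."""
    metric, statistic = criterion.split("_", 1)
    column = 1 if metric == "total" else 2
    values = [pose[column] for pose in poses]
    if statistic == "min":
        target = min(values)
    elif statistic == "mean":
        target = sum(values) / len(values)
    else:  # e.g. "5_percentile"
        target = percentile(values, float(statistic.split("_")[0]))
    # The representative is the pose with the smallest distance to the target.
    return min(poses, key=lambda pose: abs(pose[column] - target))
```

Choosing the closest real pose, rather than returning the statistic itself, guarantees that the representative is an actual structure that can be written to disk.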

generate_plots(path)

It generates the plots.

Parameters

path (str) – The path where the plots will be saved

generate_report(plots_path, poses_path, clusters_path, best_metrics, filename)

It generates the final simulation report as a PDF file.

Parameters
  • plots_path (str) – The path where the plots are saved

  • poses_path (str) – The path where the top poses are saved

  • clusters_path (str) – The path where the clusters are saved

  • best_metrics (list[float]) – The list that contains the metrics belonging to the extracted best poses

  • filename (str) – The filename for the simulation report

generate_top_poses(path, n_poses)

It selects and saves the top poses.

Parameters
  • path (str) – The path where the top poses will be saved

  • n_poses (int) – The number of top poses to retrieve

Returns

best_metrics – The list that contains the metrics belonging to the extracted best poses

Return type

list[float]
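Conceptually, selecting the top poses amounts to ranking all structures by binding energy (presumably the report column given by be_column in the constructor) and keeping the n_poses lowest values, whose metrics are then returned as best_metrics. The sketch below assumes that lower energy is better and uses a hypothetical list of (pose_id, energy) tuples; it does not reproduce how pele_platform reads reports or writes PDB files.

```python
import heapq
from typing import List, Tuple


def top_poses(poses: List[Tuple[str, float]], n_poses: int) -> Tuple[List[str], List[float]]:
    """Return the IDs and energies of the n_poses lowest-energy poses."""
    # heapq.nsmallest returns the selected items in ascending order of the key.
    best = heapq.nsmallest(n_poses, poses, key=lambda pose: pose[1])
    ids = [pose_id for pose_id, _ in best]
    best_metrics = [energy for _, energy in best]
    return ids, best_metrics
```

heapq.nsmallest avoids sorting the whole list when n_poses is much smaller than the number of structures, and it simply returns everything, sorted, when n_poses exceeds the number of poses.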

get_dataframe(threshold=0.02)
It returns the information from the PELE reports as a dataframe.

Parameters

threshold (float) – The fraction of high-energy entries that will be filtered out. Default is 0.02

Returns

dataframe – The dataframe containing the information from PELE reports

Return type

a pandas.DataFrame object
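The threshold argument can be read as the fraction of the highest-energy entries to drop before the data are returned. The sketch below applies that idea to a plain list of energies; filter_high_energy is a hypothetical helper, and both the rounding rule and the column pele_platform filters on are assumptions.

```python
from typing import List


def filter_high_energy(energies: List[float], threshold: float = 0.02) -> List[float]:
    """Drop the highest-energy fraction `threshold` of entries, keeping input order."""
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be between 0 and 1")
    n_drop = round(len(energies) * threshold)  # number of entries to remove
    # Indices sorted from highest to lowest energy; the first n_drop are removed.
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [e for i, e in enumerate(energies) if i not in dropped]
```

Dropping by index rather than by an energy cut-off ensures that exactly n_drop entries are removed even when several entries share the same energy.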

property parameters

It returns the attributes of this Analysis object as a dictionary.

Returns

params – A dictionary of parameters

Return type

dict

save_params_to_file(path, generate_params)

It saves all parameters used to generate the analysis to a TXT file.

Parameters
  • path (str) – Path to directory where the file should be saved.

  • generate_params (dict) – The symbol table from the generate() method.
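The parameters property and save_params_to_file suggest a simple pattern: merge the object's attributes with the arguments passed to generate() and write them out as text. In the sketch below, DemoAnalysis, save_params, the file name analysis_params.txt and the "key: value" line format are all hypothetical; the file pele_platform writes may look different. It also shows how calling locals() at the top of a method, presumably what the "symbol table" above refers to, captures exactly the arguments that method received.

```python
import os
from typing import Any, Dict


def save_params(path: str, params: Dict[str, Any], generate_params: Dict[str, Any]) -> str:
    """Write object and generate() parameters to a TXT file and return its path."""
    os.makedirs(path, exist_ok=True)
    file_path = os.path.join(path, "analysis_params.txt")  # hypothetical file name
    # Drop 'self', which locals() includes when called inside a method.
    merged = {**params, **{k: v for k, v in generate_params.items() if k != "self"}}
    with open(file_path, "w") as handle:
        for key, value in sorted(merged.items()):
            handle.write(f"{key}: {value}\n")
    return file_path


class DemoAnalysis:
    """Hypothetical stand-in for illustration, not pele_platform's Analysis."""

    def __init__(self, resname: str, chain: str):
        self.resname = resname
        self.chain = chain

    @property
    def parameters(self) -> Dict[str, Any]:
        # The object's attributes as a dictionary.
        return dict(vars(self))

    def generate(self, path: str, clustering_type: str = "meanshift", analysis_nclust: int = 10) -> str:
        # At this point locals() holds only self and the arguments of generate().
        generate_params = dict(locals())
        return save_params(path, self.parameters, generate_params)
```

Copying locals() on the first line of the method matters: any variable created later in the method body would otherwise end up in the saved file as well.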

Example

To start off, you need to initialize Analysis, providing at least the following three parameters:

  • residue name

  • chain ID

  • path to the output folder

>>> from pele_platform.analysis import Analysis
>>> analysis = Analysis(resname="LIG", chain="Z", simulation_output="LIG_Pele/output")

Once it has been initialized, you can call any of the available methods to get only the parts of the analysis that interest you, or simply run the whole workflow, which includes extraction of top poses, plots, clustering and a PDF report:

>>> analysis.generate(path="my_folder", clustering_type="gaussianmixture", analysis_nclust=3)