API

Analysis

We include an option to run the analysis through a Python API for users who are familiar with Python. All you have to do is initialize the Analysis class with the three mandatory parameters (resname, chain and simulation_output) and any optional ones you might want to include.

Documentation

class pele_platform.analysis.Analysis(simulation_output, resname=None, chain=None, be_column=4, limit_column=None, traj='trajectory.pdb', report=None, skip_initial_structures=True, kde=False, kde_structs=1000, topology=None, cpus=1, water_ids_to_track=[], plot_filtering_threshold=0.02, clustering_filtering_threshold=0.25, random_seed=None, clustering_coverage=0.75)

General class to manage all analysis operations.

classmethod from_parameters(parameters)

It initializes an Analysis object from a Parameters object.

Parameters: parameters (a Parameters object) – The Parameters object containing the parameters that belong to the simulation
Returns: analysis – The Analysis object obtained from the parameters that were supplied
Return type: an Analysis object

generate(path, clustering_type='meanshift', bandwidth='auto', analysis_nclust=10, max_top_clusters=8, top_clusters_criterion='interaction_25_percentile', min_population=0.01, max_top_poses=100, representatives_criterion='interaction_min')

It runs the full analysis workflow (plots, top poses and clusters) and saves the results in the supplied path.

Parameters

path (str) – The path where the analysis results will be saved
clustering_type (str) – The clustering method that will be used to generate the clusters. One of [‘gaussianmixture’, ‘meanshift’, ‘hdbscan’]. Default is ‘meanshift’
bandwidth (Union[float, str]) – Bandwidth for the mean shift and HDBSCAN clustering. Default is ‘auto’
analysis_nclust (int) – Number of clusters to create when using the Gaussian mixture model. Default is 10
max_top_clusters (int) – Maximum number of clusters to return. Default is 8
min_population (float) – The minimum fraction of all structures that a cluster must contain, expressed as a value between 0 and 1. Default is 0.01 (i.e. 1%)
max_top_poses (int) – Number of top poses to retrieve. Default is 100
top_clusters_criterion (str) – Criterion to select top clusters. Default is “interaction_25_percentile”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”, “population”]
representatives_criterion (str) – Criterion to select cluster representative structures. Default is “interaction_min”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”]
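Because several of these parameters only accept a fixed set of strings, it can help to check them before starting a long analysis. The following is a minimal sketch: check_generate_options is a hypothetical helper, not part of pele_platform, and the allowed values and defaults are simply copied from the signature and parameter list above.

```python
# Hypothetical helper, not part of pele_platform. The allowed values are
# copied from the parameter descriptions of generate() above.
CLUSTERING_TYPES = ("gaussianmixture", "meanshift", "hdbscan")
REPRESENTATIVES_CRITERIA = (
    "total_25_percentile", "total_5_percentile", "total_mean", "total_min",
    "interaction_25_percentile", "interaction_5_percentile",
    "interaction_mean", "interaction_min",
)
# Top clusters can additionally be selected by population.
TOP_CLUSTERS_CRITERIA = REPRESENTATIVES_CRITERIA + ("population",)


def check_generate_options(clustering_type="meanshift",
                           top_clusters_criterion="interaction_25_percentile",
                           representatives_criterion="interaction_min",
                           min_population=0.01):
    """Return the options as a dict, or raise ValueError on a bad value."""
    if clustering_type not in CLUSTERING_TYPES:
        raise ValueError(f"Unknown clustering_type: {clustering_type!r}")
    if top_clusters_criterion not in TOP_CLUSTERS_CRITERIA:
        raise ValueError(
            f"Unknown top_clusters_criterion: {top_clusters_criterion!r}")
    if representatives_criterion not in REPRESENTATIVES_CRITERIA:
        raise ValueError(
            f"Unknown representatives_criterion: {representatives_criterion!r}")
    if not 0 <= min_population <= 1:
        raise ValueError("min_population must be between 0 and 1")
    return {
        "clustering_type": clustering_type,
        "top_clusters_criterion": top_clusters_criterion,
        "representatives_criterion": representatives_criterion,
        "min_population": min_population,
    }
```

The returned dictionary uses the keyword names of generate(), so it could be unpacked into the call, for example analysis.generate(path="my_folder", **check_generate_options(clustering_type="hdbscan")).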

generate_clusters(path, clustering_type, bandwidth='auto', analysis_nclust=10, max_top_clusters=8, top_clusters_criterion='interaction_25_percentile', min_population=0.01, representatives_criterion='interaction_min')

It generates the structural clustering of ligand poses.

Parameters

path (str) – The path where the clusters will be saved
clustering_type (str) – The clustering method that will be used to generate the clusters
bandwidth (Union[float, str]) – Bandwidth for the mean shift and HDBSCAN clustering. Default is ‘auto’
analysis_nclust (int) – Number of clusters to create when using the Gaussian mixture model. Default is 10
max_top_clusters (int) – Maximum number of clusters to return. Default is 8
top_clusters_criterion (str) – Criterion to select top clusters. Default is “interaction_25_percentile”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”, “population”]
min_population (float) – The minimum fraction of all structures that a cluster must contain, expressed as a value between 0 and 1. Default is 0.01 (i.e. 1%)
representatives_criterion (str) – Criterion to select cluster representative structures. Default is “interaction_min”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”]
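To give an intuition for how top_clusters_criterion, min_population and max_top_clusters could interact, here is a small sketch in plain Python. It is not the pele_platform implementation: the function names, the input layout (a dict mapping a cluster label to the interaction energies of its poses), the quantile method and the assumption that lower energy ranks first are all choices made for illustration.

```python
from statistics import mean, quantiles


def score(energies, criterion):
    """Score the interaction energies of one cluster (lower is better)."""
    if criterion == "interaction_min":
        return min(energies)
    if criterion == "interaction_mean":
        return mean(energies)
    if criterion == "interaction_25_percentile":
        # quantiles() needs at least two values on older Python versions.
        if len(energies) < 2:
            return energies[0]
        return quantiles(energies, n=4)[0]
    raise ValueError(f"Criterion not covered by this sketch: {criterion!r}")


def select_top_clusters(clusters, criterion="interaction_25_percentile",
                        min_population=0.01, max_top_clusters=8):
    """Return the labels of the best clusters, best first."""
    total = sum(len(energies) for energies in clusters.values())
    # Discard clusters holding less than min_population of all poses.
    populated = {label: energies for label, energies in clusters.items()
                 if len(energies) / total >= min_population}
    if criterion == "population":
        # Largest cluster first.
        ranked = sorted(populated, key=lambda label: -len(populated[label]))
    else:
        ranked = sorted(populated,
                        key=lambda label: score(populated[label], criterion))
    return ranked[:max_top_clusters]


# Made-up energies for three clusters.
example = {
    "A": [-50.0, -48.0, -45.0, -40.0],
    "B": [-60.0, -10.0, -5.0, -2.0],
    "C": [-70.0],
}
top = select_top_clusters(example, "interaction_min", min_population=0.2)
```

In this toy data the single-pose cluster C is dropped by min_population=0.2, and the ranking of A and B depends on the criterion: B has the single best pose, while A is better at the 25th percentile.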

generate_plots(path)

It generates the plots.

Parameters: path (str) – The path where the plots will be saved

generate_report(plots_path, poses_path, clusters_path, best_metrics, filename)

It generates the final simulation report as a PDF file.

Parameters

plots_path (str) – The path where the plots are saved
poses_path (str) – The path where the top poses are saved
clusters_path (str) – The path where the clusters are saved
best_metrics (list[float]) – The list that contains the metrics belonging to the extracted best poses
filename (str) – The filename for the simulation report

generate_top_poses(path, n_poses)

It selects and saves the top poses.

Parameters

path (str) – The path where the top poses will be saved
n_poses (int) – The number of top poses to retrieve

Returns

best_metrics – The list that contains the metrics belonging to the extracted best poses

Return type

list[float]

get_dataframe(threshold=0.02)

Parameters: threshold (float) – The ratio of high-energy entries that will be filtered out. Default is 0.02
Returns: dataframe – The dataframe containing the information from PELE reports
Return type: a pandas.DataFrame object
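As a rough illustration of what filtering out a ratio of high-energy entries could look like, the sketch below drops the given fraction of rows with the highest energy from a list of dictionaries. It does not reproduce how get_dataframe() works internally: the function name, the "energy" key and the rounding rule are assumptions made for this example.

```python
def drop_high_energy(rows, threshold=0.02, key="energy"):
    """Return rows without the `threshold` fraction of highest-energy entries.

    The original row order is preserved.
    """
    n_drop = int(len(rows) * threshold)  # rounded down
    if n_drop == 0:
        return list(rows)
    # Indices sorted from lowest to highest energy.
    order = sorted(range(len(rows)), key=lambda i: rows[i][key])
    keep = set(order[:len(rows) - n_drop])
    return [row for i, row in enumerate(rows) if i in keep]


# Made-up data: 100 entries with energies 0.0, 1.0, ..., 99.0.
rows = [{"step": i, "energy": float(i)} for i in range(100)]
filtered = drop_high_energy(rows, threshold=0.02)
```

With 100 entries and the default threshold of 0.02, the two entries with the highest energy are removed.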

property parameters

It returns the attributes of this Analysis object as a dictionary.

Returns: params – A dictionary of parameters
Return type: dict

save_params_to_file(path, generate_params)

It saves all the parameters used to generate the analysis to a TXT file.

Parameters

path (str) – Path to directory where the file should be saved.
generate_params (dict) – Symbol table from generate() method.
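In Python, "symbol table" usually refers to the dictionary returned by the built-in locals(). Assuming that is what is meant here, the sketch below shows how a function can capture its own arguments in such a dictionary. toy_generate is a hypothetical stand-in with a shortened argument list, not the real generate() method.

```python
def toy_generate(path, clustering_type="meanshift", analysis_nclust=10):
    """Hypothetical stand-in for generate(); it only captures its arguments."""
    # locals() returns the current local symbol table. Copying it right away
    # records the argument names and values before other locals are created.
    generate_params = dict(locals())
    return generate_params


params = toy_generate("my_folder", clustering_type="gaussianmixture",
                      analysis_nclust=3)
```

A dictionary of this shape is what you would then pass as generate_params, next to the directory given as path.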

Example

To start off, you need to initialize Analysis, providing at least the following three parameters:

residue name

chain ID

path to the output folder.

>>> from pele_platform.analysis import Analysis
>>> analysis = Analysis(resname="LIG", chain="Z", simulation_output="LIG_Pele/output")

Once the object has been initialized, you can call any of the available methods to get the pieces of analysis that interest you, or simply run the whole workflow, which includes extraction of top poses, plots, clustering and a PDF report:

>>> analysis.generate(path="my_folder", clustering_type="gaussianmixture", analysis_nclust=3)
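If you only need some of the steps, the individual methods documented above can be called one by one. The following sketch chains them in the order plots, top poses, clusters and report. run_stepwise_analysis is a hypothetical helper, and the sub-folder names and the report filename are arbitrary choices. It does not create the folders, since the reference above does not state whether the methods do so themselves.

```python
import os


def run_stepwise_analysis(analysis, out_dir, n_poses=100,
                          clustering_type="meanshift"):
    """Call the documented Analysis methods one by one.

    `analysis` is expected to be an initialized Analysis object. The
    sub-folder names and the report filename are arbitrary choices made
    for this sketch.
    """
    plots_path = os.path.join(out_dir, "plots")
    poses_path = os.path.join(out_dir, "top_poses")
    clusters_path = os.path.join(out_dir, "clusters")
    report_file = os.path.join(out_dir, "summary.pdf")

    analysis.generate_plots(plots_path)
    # generate_top_poses() returns the metrics of the extracted poses,
    # which generate_report() takes as its best_metrics argument.
    best_metrics = analysis.generate_top_poses(poses_path, n_poses)
    analysis.generate_clusters(clusters_path, clustering_type)
    analysis.generate_report(plots_path, poses_path, clusters_path,
                             best_metrics, report_file)
    return best_metrics
```

With the analysis object from the example above, this could be called as run_stepwise_analysis(analysis, "my_folder", n_poses=20).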