API
Analysis
We provide an option to run the analysis through an API for users who are familiar with Python. All you have to do is initialize the Analysis class with its three mandatory parameters (resname, chain and simulation_output) and any optional ones you might want to include.
Documentation
- class pele_platform.analysis.Analysis(simulation_output, resname=None, chain=None, be_column=4, limit_column=None, traj='trajectory.pdb', report=None, skip_initial_structures=True, kde=False, kde_structs=1000, topology=None, cpus=1, water_ids_to_track=[], plot_filtering_threshold=0.02, clustering_filtering_threshold=0.25, random_seed=None, clustering_coverage=0.75)
General class to manage all analysis operations.
- classmethod from_parameters(parameters)
It initializes an Analysis object from a Parameters object.
- Parameters
parameters (a Parameters object) – The Parameters object containing the parameters that belong to the simulation
- Returns
analysis – The Analysis object obtained from the parameters that were supplied
- Return type
an Analysis object
- generate(path, clustering_type='meanshift', bandwidth='auto', analysis_nclust=10, max_top_clusters=8, top_clusters_criterion='interaction_25_percentile', min_population=0.01, max_top_poses=100, representatives_criterion='interaction_min')
It runs the full analysis workflow (plots, top poses and clusters) and saves the results in the supplied path.
- Parameters
path (str) – The path where the analysis results will be saved
clustering_type (str) – The clustering method that will be used to generate the clusters. One of [‘gaussianmixture’, ‘meanshift’, ‘hdbscan’]. Default is ‘meanshift’
bandwidth (Union[float, str]) – Bandwidth for the mean shift and HDBSCAN clustering. Default is ‘auto’
analysis_nclust (int) – Number of clusters to create when using the Gaussian mixture model. Default is 10
max_top_clusters (int) – Maximum number of clusters to return. Default is 8
min_population (float) – The minimum fraction of all structures that a cluster must contain, given as a value between 0 and 1. Default is 0.01 (i.e. 1%)
max_top_poses (int) – Number of top poses to retrieve. Default is 100
top_clusters_criterion (str) – Criterion to select top clusters. Default is “interaction_25_percentile”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”, “population”]
representatives_criterion (str) – Criterion to select cluster representative structures. Default is “interaction_min”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”]
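Since several arguments of generate() only accept a fixed set of strings, it can be convenient to check them before launching a long analysis. The following is a minimal sketch: check_generate_options is a hypothetical helper that is not part of pele_platform, and its allowed values are simply copied from the parameter list above.

```python
# Hypothetical helper, NOT part of pele_platform. The option lists below
# only mirror the values documented for Analysis.generate().
CLUSTERING_TYPES = ("gaussianmixture", "meanshift", "hdbscan")
REPRESENTATIVES_CRITERIA = tuple(
    f"{energy}_{metric}"
    for energy in ("total", "interaction")
    for metric in ("25_percentile", "5_percentile", "mean", "min")
)
TOP_CLUSTERS_CRITERIA = REPRESENTATIVES_CRITERIA + ("population",)


def check_generate_options(clustering_type="meanshift",
                           top_clusters_criterion="interaction_25_percentile",
                           representatives_criterion="interaction_min",
                           min_population=0.01):
    """Return the options as a dict, or raise ValueError if one is not documented."""
    if clustering_type not in CLUSTERING_TYPES:
        raise ValueError(f"Unknown clustering_type: {clustering_type!r}")
    if top_clusters_criterion not in TOP_CLUSTERS_CRITERIA:
        raise ValueError(f"Unknown top_clusters_criterion: {top_clusters_criterion!r}")
    if representatives_criterion not in REPRESENTATIVES_CRITERIA:
        raise ValueError(f"Unknown representatives_criterion: {representatives_criterion!r}")
    if not 0 <= min_population <= 1:
        raise ValueError("min_population must be between 0 and 1")
    return {
        "clustering_type": clustering_type,
        "top_clusters_criterion": top_clusters_criterion,
        "representatives_criterion": representatives_criterion,
        "min_population": min_population,
    }
```

The returned dictionary could then be unpacked into the real call, for instance analysis.generate(path="my_folder", **options).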
- generate_clusters(path, clustering_type, bandwidth='auto', analysis_nclust=10, max_top_clusters=8, top_clusters_criterion='interaction_25_percentile', min_population=0.01, representatives_criterion='interaction_min')
It generates the structural clustering of ligand poses.
- Parameters
path (str) – The path where the clusters will be saved
clustering_type (str) – The clustering method that will be used to generate the clusters
bandwidth (Union[float, str]) – Bandwidth for the mean shift and HDBSCAN clustering. Default is ‘auto’
analysis_nclust (int) – Number of clusters to create when using the Gaussian mixture model. Default is 10
max_top_clusters (int) – Maximum number of clusters to return. Default is 8
top_clusters_criterion (str) – Criterion to select top clusters. Default is “interaction_25_percentile”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “population”]
min_population (float) – The minimum fraction of all structures that a cluster must contain, given as a value between 0 and 1. Default is 0.01 (i.e. 1%)
representatives_criterion (str) – Criterion to select cluster representative structures. Default is “interaction_min”. One of [“total_25_percentile”, “total_5_percentile”, “total_mean”, “total_min”, “interaction_25_percentile”, “interaction_5_percentile”, “interaction_mean”, “interaction_min”]
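To make the interplay of min_population, top_clusters_criterion and max_top_clusters more concrete, here is a plain-Python sketch of one possible selection procedure. It is not the library's implementation: the function name, the input format (a dictionary mapping cluster labels to interaction energies) and the assumption that lower energies rank first are all illustrative.

```python
from statistics import quantiles


def select_top_clusters(energies_by_cluster, min_population=0.01, max_top_clusters=8):
    """Illustrative only: rank clusters by the 25th percentile of their energies.

    energies_by_cluster maps a cluster label to the list of interaction
    energies of its poses (hypothetical input format).
    """
    total = sum(len(values) for values in energies_by_cluster.values())
    scores = {}
    for label, values in energies_by_cluster.items():
        # Discard clusters that hold less than min_population of all poses
        if len(values) / total < min_population:
            continue
        if len(values) > 1:
            # First quartile cut point, i.e. the 25th percentile
            scores[label] = quantiles(values, n=4, method="inclusive")[0]
        else:
            # A single-pose cluster is scored by its only value
            scores[label] = values[0]
    # Assumption: lower (more negative) energies are better
    ranked = sorted(scores, key=scores.get)
    return ranked[:max_top_clusters]
```

Raising min_population removes sparsely populated clusters before ranking, so a single very favourable pose cannot dominate the selection.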
- generate_plots(path)
It generates the plots.
- Parameters
path (str) – The path where the plots will be saved
- generate_report(plots_path, poses_path, clusters_path, best_metrics, filename)
It generates the final simulation report as a PDF file.
- Parameters
plots_path (str) – The path where the plots are saved
poses_path (str) – The path where the top poses are saved
clusters_path (str) – The path where the clusters are saved
best_metrics (list[float]) – The list that contains the metrics belonging to the extracted best poses
filename (str) – The filename for the simulation report
- generate_top_poses(path, n_poses)
It selects and saves the top poses.
- Parameters
path (str) – The path where the top poses will be saved
n_poses (int) – The number of top poses to retrieve
- Returns
best_metrics – The list that contains the metrics belonging to the extracted best poses
- Return type
list[float]
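The individual methods can also be chained by hand instead of calling generate(). The sketch below assumes an already initialized Analysis object and uses only the signatures documented here. The helper name run_stepwise_analysis, the sub-folder names and the report filename are hypothetical, and whether filename expects a bare name or a full path is an assumption.

```python
import os


def run_stepwise_analysis(analysis, output_dir, n_poses=100,
                          clustering_type="meanshift"):
    """Call the documented Analysis methods one by one (illustrative sketch)."""
    # Hypothetical folder layout; adapt it to your own project
    plots_path = os.path.join(output_dir, "plots")
    poses_path = os.path.join(output_dir, "top_poses")
    clusters_path = os.path.join(output_dir, "clusters")

    analysis.generate_plots(plots_path)
    # generate_top_poses returns the metrics that the report needs later
    best_metrics = analysis.generate_top_poses(poses_path, n_poses)
    analysis.generate_clusters(clusters_path, clustering_type)
    # Assumption: a path inside output_dir is an acceptable filename
    report_file = os.path.join(output_dir, "summary.pdf")
    analysis.generate_report(plots_path, poses_path, clusters_path,
                             best_metrics, report_file)
    return best_metrics
```

Splitting the workflow this way makes it possible to skip a step, for example to regenerate only the clusters with a different clustering_type.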
- get_dataframe(threshold=0.02)
- Parameters
threshold (float) – The ratio of high-energy entries that will be filtered out. Default is 0.02
- Returns
dataframe – The dataframe containing the information from PELE reports
- Return type
a pandas.DataFrame object
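One way to read the threshold parameter is as the fraction of entries with the highest energies that gets discarded. The sketch below illustrates that reading on a plain list of numbers. It is an assumption about the intent, not the library's code, which works on a pandas.DataFrame; drop_high_energy is a hypothetical name.

```python
def drop_high_energy(energies, threshold=0.02):
    """Return the energies sorted in ascending order, without the highest
    `threshold` fraction of entries (illustrative sketch)."""
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in the range [0, 1)")
    # Number of entries to discard, rounded down
    n_drop = int(len(energies) * threshold)
    ordered = sorted(energies)
    return ordered[:len(ordered) - n_drop]
```

With the default of 0.02, a set of 1000 report entries would lose its 20 highest-energy rows under this interpretation.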
- property parameters
It returns the attributes of this Analysis object as a dictionary.
- Returns
params – A dictionary of parameters
- Return type
dict
- save_params_to_file(path, generate_params)
It saves all parameters used to generate the Analysis to a TXT file.
- Parameters
path (str) – Path to directory where the file should be saved.
generate_params (dict) – Symbol table from generate() method.
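The term "symbol table" refers to the dictionary that Python's built-in locals() returns inside a function. The sketch below shows the general idea with stand-in functions: both generate and save_params here are hypothetical illustrations, not the pele_platform methods, and the output filename and line format are assumptions.

```python
import os


def generate(path, clustering_type="meanshift", analysis_nclust=10):
    """Stand-in for a method that records the arguments it was called with."""
    # Called first, locals() contains only the arguments of this call
    call_params = dict(locals())
    return call_params


def save_params(directory, params, filename="params.txt"):
    """Write one 'name: value' line per parameter and return the file path."""
    file_path = os.path.join(directory, filename)
    with open(file_path, "w") as handle:
        for key, value in sorted(params.items()):
            handle.write(f"{key}: {value}\n")
    return file_path
```

Capturing the arguments this way keeps a record of the exact settings next to the results they produced.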
Example
To start off, you need to initialize Analysis, providing at least the following three parameters:
- residue name (resname)
- chain ID (chain)
- path to the simulation output folder (simulation_output)
>>> from pele_platform.analysis import Analysis
>>> analysis = Analysis(resname="LIG", chain="Z", simulation_output="LIG_Pele/output")
Once the Analysis object has been created, you can call any of the available methods to obtain only the pieces of analysis that interest you, or simply run the whole workflow, which includes plots, extraction of top poses, clustering and the PDF report:
>>> analysis.generate(path="my_folder", clustering_type="gaussianmixture", analysis_nclust=3)