Analysis parameters
-------------------

These are parameters to set up the analysis package of the Platform.

List of analysis parameters:

    1. `analyse <#analyse>`__
    2. `only_analysis <#only-analysis>`__
    3. `bandwidth <#bandwidth>`__
    4. `max_top_clusters <#max-top-clusters>`__
    5. `top_clusters_criterion <#top-clusters-criterion>`__
    6. `cluster_representatives_criterion <#cluster-representatives-criterion>`__
    7. `max_top_poses <#max-top-poses>`__
    8. `min_population <#min-population>`__
    9. `clustering_coverage <#clustering-coverage>`__

List of examples:

    - `Example 1 <#example-1>`__
    - `Example 2 <#example-2>`__
    - `Example 3 <#example-3>`__


analyse
++++++++

    - Description: Whether to run the analysis at the end of the
      simulation.
    - Type: ``Boolean``
    - Default: ``True``

    .. seealso::
      `only_analysis <#only-analysis>`__,
      `Example 1 <#example-1>`__


only_analysis
+++++++++++++

    - Description: Analyze an existing PELE simulation without running a
      new one from scratch.
    - Type: ``Boolean``
    - Default: ``False``

    .. note::
      We recommend setting the path to the simulation you want to analyse with the ``working_folder`` parameter.

    .. seealso::
      `analyse <#analyse>`__,
      `Example 2 <#example-2>`__
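
    A minimal sketch of the recommendation above, assuming the previous
    simulation was written to a folder named ``LIG_Pele`` (a placeholder
    borrowed from Example 3; replace it with your own path):

    ..  code-block:: yaml

        # Analysis parameters
        only_analysis: True
        # Path to the existing simulation (placeholder value)
        working_folder: "LIG_Pele"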


bandwidth
+++++++++

    - Description: Sets the cluster size for the clustering algorithm.
      If it is set to ``auto``, the software automatically chooses a
      suitable value.
    - Type: ``Float`` or ``auto``
    - Default: ``auto``

    .. note::
       The ``auto`` mode is only available when Mean Shift is used as the
       clustering method. Although it is the default method and the only
       recommended one, other clustering methods are implemented in the
       Platform (check `advanced parameters <../advanced.html>`__ for
       further information).

    .. note::
       When ``bandwidth`` is set to ``auto``, the Platform selects the
       bandwidth value that covers the percentage of all explored points
       set with the ``clustering_coverage`` parameter.


    .. seealso::
      `clustering_coverage <#clustering-coverage>`__,
      `Example 2 <#example-2>`__
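
    As an illustrative fragment, the two ways of setting ``bandwidth`` could
    look as follows. The values ``8`` and ``0.60`` are arbitrary examples,
    not recommendations:

    ..  code-block:: yaml

        # Option A: fixed bandwidth (clustering_coverage is then not used)
        bandwidth: 8

        # Option B: automatic bandwidth, chosen to reach the requested coverage
        # bandwidth: "auto"
        # clustering_coverage: 0.60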


max_top_clusters
++++++++++++++++

    - Description: Sets the maximum number of clusters to be selected
      as top.
    - Type: ``Integer``
    - Default: ``8``

    .. seealso::
      `Example 2 <#example-2>`__


top_clusters_criterion
++++++++++++++++++++++

    - Description: Sets the method for selecting top clusters. Choose one
      of:

        - ``total_25_percentile`` - total energy 25th percentile
        - ``total_5_percentile`` - total energy 5th percentile
        - ``total_mean`` - total energy mean
        - ``total_min`` - total energy min
        - ``interaction_25_percentile`` - interaction energy 25th percentile
        - ``interaction_5_percentile`` - interaction energy 5th percentile
        - ``interaction_mean`` - interaction energy mean
        - ``interaction_min`` - interaction energy min
        - ``population`` - cluster population

    - Type: ``String``
    - Default: ``interaction_25_percentile``

    .. seealso::
      `Example 2 <#example-2>`__
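
    For instance, a fragment that selects top clusters by the 5th percentile
    of the interaction energy and keeps at most five of them could look like
    this (both values are illustrative only):

    ..  code-block:: yaml

        # Select top clusters by interaction energy (5th percentile)
        top_clusters_criterion: "interaction_5_percentile"
        # Keep no more than five top clusters
        max_top_clusters: 5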


cluster_representatives_criterion
+++++++++++++++++++++++++++++++++

    - Description: Sets the method for selecting representative structures
      for each cluster. Choose one of:

        - ``total_25_percentile`` - total energy 25th percentile
        - ``total_5_percentile`` - total energy 5th percentile
        - ``total_mean`` - total energy mean
        - ``total_min`` - total energy min
        - ``interaction_25_percentile`` - interaction energy 25th percentile
        - ``interaction_5_percentile`` - interaction energy 5th percentile
        - ``interaction_mean`` - interaction energy mean
        - ``interaction_min`` - interaction energy min

    - Type: ``String``
    - Default: ``interaction_min``

    .. seealso::
      `Example 2 <#example-2>`__


max_top_poses
+++++++++++++

    - Description: Sets the maximum number of top poses to be retrieved.
    - Type: ``Integer``
    - Default: ``100``

    .. seealso::
      `Example 2 <#example-2>`__


min_population
++++++++++++++

    - Description: Sets the minimum population that selected clusters
      must reach. It takes a value between 0 and 1. The default value
      of 0.01 means that every selected cluster needs a population
      above 1% of the total number of sampled poses.
    - Type: ``Float``
    - Default: ``0.01``

    .. seealso::
      `Example 2 <#example-2>`__
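
    As a worked illustration, assuming a hypothetical simulation that sampled
    20,000 poses in total, a threshold of ``0.005`` means that a cluster needs
    more than 0.005 × 20,000 = 100 poses to be selected:

    ..  code-block:: yaml

        # Clusters must contain more than 0.5% of all sampled poses
        # (more than 100 out of a hypothetical total of 20,000)
        min_population: 0.005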


clustering_coverage
+++++++++++++++++++

    - Description: Sets the minimum percentage of points, expressed as a
      fraction (``0.75`` means 75%), that needs to be assigned to top
      clusters when running Mean Shift clustering with automated
      ``bandwidth``. The clustering bandwidth keeps increasing until the
      defined coverage is reached.
    - Type: ``Float``
    - Default: ``0.75``

    .. note::
       This parameter is only used when ``bandwidth`` is set to ``auto``.

    .. seealso::
      `bandwidth <#bandwidth>`__,
      `Example 3 <#example-3>`__


Example 1
+++++++++

In this example we set up an induced fit docking simulation with 30
computation cores. We also disable the analysis package, so the simulation
will run but will not be analyzed.

..  code-block:: yaml

    # Required parameters
    system: 'system.pdb'
    chain: 'L'
    resname: 'LIG'

    # General parameters
    cpus: 30
    seed: 2021

    # Package selection
    induced_fit_fast: True

    # Analysis parameters
    analyse: False


Example 2
+++++++++

In this example we set up an induced fit docking simulation with 30
computation cores. However, instead of running the whole simulation from
scratch, we ask the Platform to analyze an existing simulation with the
``only_analysis`` option. This is useful when we want to reanalyze a
previous simulation with different parameters, as shown below.

..  code-block:: yaml

    # Required parameters
    system: 'system.pdb'
    chain: 'L'
    resname: 'LIG'

    # General parameters
    cpus: 30
    seed: 2021

    # Package selection
    induced_fit_fast: True

    # Analysis parameters
    only_analysis: True
    bandwidth: 8
    max_top_clusters: 12
    top_clusters_criterion: "population"
    cluster_representatives_criterion: "interaction_mean"
    max_top_poses: 20
    min_population: 0.005


Example 3
+++++++++

In this example we set up an induced fit docking simulation with 30
computation cores. For the analysis, we rely on the default ``bandwidth``
value, which is ``auto``. This option finds a suitable clustering
``bandwidth`` for the Mean Shift algorithm according to
``clustering_coverage``: the ``bandwidth`` is chosen so that the top
clusters include at least the percentage of points supplied with the
``clustering_coverage`` parameter.

..  code-block:: yaml

    # Required parameters
    system: 'system.pdb'
    chain: 'L'
    resname: 'LIG'

    # General parameters
    cpus: 30
    seed: 2021

    # Package selection
    induced_fit_fast: True

    # Analysis parameters
    only_analysis: True
    working_folder: "LIG_Pele"
    clustering_coverage: 0.60
    bandwidth: "auto"  # default value, written out for clarity
    max_top_clusters: 12
    top_clusters_criterion: "population"
    cluster_representatives_criterion: "interaction_mean"
    max_top_poses: 20
    min_population: 0.005