Wiki source code of Distance-Fluctuations
Last modified by emacasali on 2022/09/15 13:34
| author | version | line-number | content |
|---|---|---|---|
| 1 | (% class="jumbotron" %) | ||
| 2 | ((( | ||
| 3 | (% class="container" %) | ||
| 4 | ((( | ||
| 5 | = Distance Fluctuation (DF) Analysis = | ||
| 6 | |||
| 7 | Giorgio Colombo Group (UNIPV) | ||
| 8 | ))) | ||
| 9 | ))) | ||
| 10 | |||
| 11 | (% class="row" %) | ||
| 12 | ((( | ||
| 13 | (% class="col-xs-12 col-sm-8" %) | ||
| 14 | ((( | ||
| 15 | = What are DF matrices? = | ||
| 16 | |||
| 17 | The results of an MD simulation can be analyzed using Distance Fluctuation (DF) matrices, based on the Coordination Propensity (CP) hypothesis: | ||
| 18 | |||
| 19 | (% style="text-align:center" %) | ||
| 20 | [[image:CodeCogsEqn-58.png||height="34" width="181"]] | ||
| 21 | |||
| 22 | Low CP values, corresponding to low pair-distance fluctuations, highlight groups of residues that move in a mechanically coordinated way. | ||
| 23 | |||
| 24 | |||
| 25 | = How to use the script = | ||
| 26 | |||
| 27 | • __Requirements__ | ||
| 28 | |||
| 29 | - Python 3.0 (or newer) | ||
| 30 | |||
| 31 | - Numpy | ||
| 32 | |||
| 33 | - Scipy | ||
| 34 | |||
| 35 | |||
| 36 | • __Usage__ | ||
| 37 | |||
| 38 | The script analyzes an MD trajectory and identifies the coordinated motions between residues. It can then filter the output matrix based on inter-residue distance to identify long-range coordinated motions. | ||
| 39 | |||
| 40 | The script can work either on C-alphas only (from a pdb or an xyz file) or on sidechains (from a pdb file). | ||
| 41 | |||
| 42 | For more information run: | ||
| 43 | |||
| 44 | {{{ python3 distance_fluctuation.py -h}}} | ||
| 45 | |||
| 46 | |||
| 47 | {{{optional arguments: | ||
| 48 | -h, --help show this help message and exit | ||
| 49 | |||
| 50 | Required arguments: | ||
| 51 | -ext {pdb,xyz}, --file_ext {pdb,xyz} | ||
| 52 | Input trajectory file iformat (options: .pdb or .xyz) | ||
| 53 | -n N_AA, --n_aa N_AA Number of aminoacids in each frame of the .pdb | ||
| 54 | trajectory | ||
| 55 | |||
| 56 | Optional arguments: | ||
| 57 | -i IN_NAME, --in_name IN_NAME | ||
| 58 | Name of the trajectory file (default: trj) | ||
| 59 | -s S_FRAME, --s_frame S_FRAME | ||
| 60 | Index of the initial frame (default = 0) | ||
| 61 | -e E_FRAME, --e_frame E_FRAME | ||
| 62 | Index of the last frame (default = end) | ||
| 63 | -c CUTOFF, --cutoff CUTOFF | ||
| 64 | Cutoff for the distance fluctuation analysis (default = 5) | ||
| 65 | -t TOLERANCE, --tolerance TOLERANCE | ||
| 66 | Tolerance for the distance fluctuation analysis (default = 0) | ||
| 67 | -p {all,c-a}, --pdb_type {all,c-a} | ||
| 68 | Type of pdb file submitted (only C-alpha, all protein atoms) | ||
| 69 | -l {s,seq,v,volume}, --local {s,seq,v,volume} | ||
| 70 | Type of local cutoff used, sequence or volume (default = seq) | ||
| 71 | -r RADIUS, --radius RADIUS | ||
| 72 | Radius of the area cutoff (default = 7A) | ||
| 73 | -rs RES_START, --res_start RES_START | ||
| 74 | Starting residue if the PDB does not start with res.num. 1 | ||
| 75 | -ro RMS_OUT, --rms_out RMS_OUT | ||
| 76 | RMS distance output filename (without extension) | ||
| 77 | -ao AVG_OUT, --avg_out AVG_OUT | ||
| 78 | Average distance output filename (without extension) | ||
| 79 | -so SEQ_OUT, --seq_out SEQ_OUT | ||
| 80 | Profile sequence output filename (without extension) | ||
| 81 | -da DIST_AN, --dist_an DIST_AN | ||
| 82 | Distance analysis output filename (without extension) | ||
| 83 | --blocks BLOCKS File with domains borders | ||
| 84 | |||
| 85 | -rb RMS_B_OUT, --rms_b_out RMS_B_OUT | ||
| 86 | RMS distance blocks output filename}}} | ||
| 87 | |||
| 88 | |||
| 89 | ))) | ||
| 90 | |||
| 91 | |||
| 92 | (% class="col-xs-12 col-sm-4" %) | ||
| 93 | ((( | ||
| 94 | {{box title="**Contents**"}} | ||
| 95 | {{toc/}} | ||
| 96 | |||
| 97 | |||
| 98 | {{/box}} | ||
| 99 | |||
| 100 | |||
| 101 | ))) | ||
| 102 | ))) | ||
| 103 | |||
| 104 | • __How to read the output__ | ||
| 105 | |||
| 106 | The script generates several output files. | ||
| 107 | |||
| 108 | |||
| 109 | A) Average Distance (avgdist) file: | ||
| 110 | |||
| 111 | The name of the file is avgdist_out_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
| 112 | |||
| 113 | The file contains a matrix using the residue indexes as axes and the average value of the distance between the residues as the data (r1 r2 avgdist). | ||
| 114 | |||
| 115 | The distance is calculated as the average, over the analyzed frames, of the Euclidean distance between the residues. | ||
| 116 | |||
| 117 | |||
| 118 | B) Distance Fluctuation (rmsdist) file: | ||
| 119 | |||
| 120 | The name of the file is rmsdist_out_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
| 121 | |||
| 122 | The file contains a matrix using the residue indexes as axes and the distance fluctuation between the residues as the data (r1 r2 rmsdist). | ||
| 123 | |||
| 124 | The distance fluctuation is calculated for residues that are at least 3 residues away from each other in sequence (that is, outside the range x-2 to x+2) as follows: | ||
| 125 | |||
| 126 | 1) Calculate the average Euclidean distance between the residues (either CA or center of mass) | ||
| 127 | |||
| 128 | 2) Calculate the average distance vector | ||
| 129 | |||
| 130 | 3) Subtract the distance fluctuation from the average distance | ||
| 131 | |||
| 132 | 4) Calculate the power of the difference between distance fluctuation and local fluctuation | ||
| 133 | |||
| 134 | 5) Filter out the values of the close residues (x-1, x, x+1) | ||
| 135 | |||
| 136 | 6) Divide the obtained value by the number of frames | ||
| 137 | |||
| 138 | |||
| 139 | C) Profile Sequence (profile_sequence) file: | ||
| 140 | |||
| 141 | The name of the file is profile_sequence_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
| 142 | |||
| 143 | The file is a vector containing the residue number and the local fluctuation value. | ||
| 144 | |||
| 145 | The local fluctuation is calculated as the average fluctuation of the residues close to each other (that is, the residues ranging from x-2 to x+2, with x = residue number). | ||
| 146 | |||
| 147 | Since each value averages the distance fluctuation over a range of residues, the output starts from residue 4 and ends at residue n-3. | ||
| 148 | |||
| 149 | |||
| 150 | D) Distance Analysis (distance_analysis_out) file: | ||
| 151 | |||
| 152 | The name of the file is distance_analysis_out_c_x_y_type.dat with c = cutoff value, x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
| 153 | |||
| 154 | This file contains the distance fluctuation filtered by the cutoff value (values are kept if the distance fluctuation is smaller than the cutoff value). | ||
| 155 | |||
| 156 | The cutoff value is calculated as the sum of the nearcutoff and the tolerance value specified by the user. | ||
| 157 | |||
| 158 | The nearcutoff can be calculated using either sequence proximity or 3D proximity, and the choice can be specified on the command line with the option -l or ~-~-local (see the command line help for further details). | ||
| 159 | |||
| 160 | In the case of sequence proximity, it is calculated as the average of the distance fluctuation values for residues in the range x-2 to x+2 (x = residue index). | ||
| 161 | |||
| 162 | In the case of 3D proximity, it is calculated as the average distance fluctuation of the residues within a certain radius from the current one (default value = 6.5 A). | ||
| 163 | |||
| 164 | |||
| 165 | E) Blocks Averaging (rmsdist_out_b) file: | ||
| 166 | |||
| 167 | It is possible to obtain a distance fluctuation matrix averaged over protein domains (or blocks). | ||
| 168 | |||
| 169 | The script requires an input file with the borders of the protein blocks defined by the user, in the form of two columns: the first with the lower limit and the second with the upper limit. | ||
| 170 | |||
| 171 | |||
| 172 | = References = | ||
| 173 | |||
| 174 | Morra, G.; Potestio, R.; Micheletti, C.; Colombo, G., Corresponding Functional Dynamics across the Hsp90 Chaperone Family: Insights from a Multiscale Analysis of MD Simulations, PLOS Computational Biology 8(3): e1002433. [[https:~~/~~/doi.org/10.1371/journal.pcbi.1002433>>url:https://doi.org/10.1371/journal.pcbi.1002433]] |
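
The quantities described under "How to read the output" can be illustrated with a small, dependency-free sketch. It is not the code of distance_fluctuation.py: the function names `distance_stats` and `filter_by_cutoff`, the in-memory frame layout, and the use of a mean squared deviation as the fluctuation are all assumptions made for illustration. It computes, for every residue pair, the average distance over the frames and the fluctuation of that distance around its average, then keeps the pairs at least 3 residues apart whose fluctuation is below a cutoff.

```python
import math


def distance_stats(frames):
    """Return (avg, df) matrices for a trajectory.

    frames: list of frames, each a list of (x, y, z) tuples, one per residue
    (hypothetical layout; e.g. C-alpha or sidechain center-of-mass positions).
    avg[i][j] is the mean distance between residues i and j over all frames.
    df[i][j] is the mean squared deviation of that distance from its mean
    (assumed here to be the distance fluctuation).
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    avg = [[0.0] * n_res for _ in range(n_res)]
    df = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(i + 1, n_res):
            # Euclidean distance between residues i and j in every frame
            d = [math.dist(frame[i], frame[j]) for frame in frames]
            mean = sum(d) / n_frames
            fluct = sum((x - mean) ** 2 for x in d) / n_frames
            avg[i][j] = avg[j][i] = mean
            df[i][j] = df[j][i] = fluct
    return avg, df


def filter_by_cutoff(df, cutoff, min_separation=3):
    """Keep pairs (i, j, value) with df below cutoff.

    Only pairs at least min_separation residues apart in sequence are
    considered (assumption based on the "at least 3 residues away" rule).
    Indices are 0-based.
    """
    kept = []
    n_res = len(df)
    for i in range(n_res):
        for j in range(i + min_separation, n_res):
            if df[i][j] < cutoff:
                kept.append((i, j, df[i][j]))
    return kept


if __name__ == "__main__":
    # Toy trajectory: 2 frames, 5 residues on a line; only the last one moves.
    frames = [
        [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)],
        [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (6, 0, 0)],
    ]
    avg, df = distance_stats(frames)
    print(filter_by_cutoff(df, cutoff=0.5))
```

In the toy trajectory, residues 0 and 3 keep a fixed distance and therefore pass the filter, while every pair involving the moving residue is discarded. For real trajectories a vectorized NumPy version would be preferable; plain Python is used here only to keep the sketch self-contained.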