Wiki source code of Distance-Fluctuations
Last modified by emacasali on 2022/09/15 13:34
author | version | line-number | content |
---|---|---|---|
1 | (% class="jumbotron" %) | ||
2 | ((( | ||
3 | (% class="container" %) | ||
4 | ((( | ||
5 | = Distance Fluctuation (DF) Analysis = | ||
6 | |||
7 | Giorgio Colombo Group (UNIPV) | ||
8 | ))) | ||
9 | ))) | ||
10 | |||
11 | (% class="row" %) | ||
12 | ((( | ||
13 | (% class="col-xs-12 col-sm-8" %) | ||
14 | ((( | ||
15 | = What are DF matrices? = | ||
16 | |||
17 | The results of an MD simulation can be analyzed using Distance Fluctuation (DF) matrices, based on the Coordination Propensity (CP) hypothesis: | ||
18 | |||
19 | (% style="text-align:center" %) | ||
20 | [[image:CodeCogsEqn-58.png||height="34" width="181"]] | ||
21 | |||
22 | Low CP values, corresponding to low pair-distance fluctuations, highlight groups of residues that move in a mechanically coordinated way. | ||
23 | |||
24 | |||
25 | = How to use the script = | ||
26 | |||
27 | • __Requirements__ | ||
28 | |||
29 | - Python 3.0 (or newer) | ||
30 | |||
31 | - NumPy | ||
32 | |||
33 | - SciPy | ||
34 | |||
35 | |||
36 | • __Usage__ | ||
37 | |||
38 | The script analyzes an MD trajectory and identifies the coordinated motions between residues. It can then filter the output matrix by distance to identify long-range coordinated motions. | ||
39 | |||
40 | The script can work either on the C-alphas only (from a pdb or an xyz file) or on the sidechains (from a pdb file). | ||
41 | |||
42 | For more information run: | ||
43 | |||
44 | {{{ python3 distance_fluctuation.py -h}}} | ||
45 | |||
46 | |||
47 | {{{optional arguments: | ||
48 | -h, --help show this help message and exit | ||
49 | |||
50 | Required arguments: | ||
51 | -ext {pdb,xyz}, --file_ext {pdb,xyz} | ||
52 | Input trajectory file iformat (options: .pdb or .xyz) | ||
53 | -n N_AA, --n_aa N_AA Number of aminoacids in each frame of the .pdb | ||
54 | trajectory | ||
55 | |||
56 | Optional arguments: | ||
57 | -i IN_NAME, --in_name IN_NAME | ||
58 | Name of the trajectory file (default: trj) | ||
59 | -s S_FRAME, --s_frame S_FRAME | ||
60 | Index of the initial frame (default = 0) | ||
61 | -e E_FRAME, --e_frame E_FRAME | ||
62 | Index of the last frame (default = end) | ||
63 | -c CUTOFF, --cutoff CUTOFF | ||
64 | Cutoff for the distance fluctuation analysis (default = 5) | ||
65 | -t TOLERANCE, --tolerance TOLERANCE | ||
66 | Tolerance for the distance fluctuation analysis (default = 0) | ||
67 | -p {all,c-a}, --pdb_type {all,c-a} | ||
68 | Type of pdb file submitted (only C-alpha, all protein atoms) | ||
69 | -l {s,seq,v,volume}, --local {s,seq,v,volume} | ||
70 | Type of local cutoff used, sequence or volume (default = seq) | ||
71 | -r RADIUS, --radius RADIUS | ||
72 | Radius of the area cutoff (default = 7A) | ||
73 | -rs RES_START, --res_start RES_START | ||
74 | Starting residue if the PDB does not start with res.num. 1 | ||
75 | -ro RMS_OUT, --rms_out RMS_OUT | ||
76 | RMS distance output filename (without extension) | ||
77 | -ao AVG_OUT, --avg_out AVG_OUT | ||
78 | Average distance output filename (without extension) | ||
79 | -so SEQ_OUT, --seq_out SEQ_OUT | ||
80 | Profile sequence output filename (without extension) | ||
81 | -da DIST_AN, --dist_an DIST_AN | ||
82 | Distance analysis output filename (without extension) | ||
83 | --blocks BLOCKS File with domains borders | ||
84 | |||
85 | -rb RMS_B_OUT, --rms_b_out RMS_B_OUT | ||
86 | RMS distance blocks output filename}}} | ||
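As an illustration of how the options above combine, the following minimal sketch assembles one possible invocation and prints it. Only flags listed in the help text are used; the values (250 residues, a C-alpha pdb trajectory with the default name, sequence-based local cutoff) are hypothetical placeholders, not recommended settings.

```python
import shlex

# Hypothetical run: C-alpha pdb trajectory with 250 residues per frame.
# All flags come from the help text above; the values are placeholders.
command = [
    "python3", "distance_fluctuation.py",
    "-ext", "pdb",   # required: trajectory format
    "-n", "250",     # required: residues per frame (assumed value)
    "-i", "trj",     # trajectory name (the documented default)
    "-p", "c-a",     # pdb contains only C-alpha atoms
    "-l", "seq",     # sequence-based local cutoff (the documented default)
    "-t", "0",       # tolerance (the documented default)
]

command_line = shlex.join(command)
print(command_line)
```

The printed line can be pasted into a shell; building it as a list also makes it easy to pass to `subprocess.run` from a larger workflow.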
87 | |||
88 | |||
89 | ))) | ||
90 | |||
91 | |||
92 | (% class="col-xs-12 col-sm-4" %) | ||
93 | ((( | ||
94 | {{box title="**Contents**"}} | ||
95 | {{toc/}} | ||
96 | |||
97 | |||
98 | {{/box}} | ||
99 | |||
100 | |||
101 | ))) | ||
102 | ))) | ||
103 | |||
104 | • __How to read the output__ | ||
105 | |||
106 | The script generates several output files. | ||
107 | |||
108 | |||
109 | A) Average Distance (avgdist) file: | ||
110 | |||
111 | The name of the file is avgdist_out_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
112 | |||
113 | The file contains a matrix using the residue indexes as axes and the average value of the distance between the residues as the data (r1 r2 avgdist). | ||
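The matrix files can be loaded back into a square array for plotting or further analysis. The sketch below assumes the file holds three whitespace-separated columns (r1 r2 value) with 1-based residue indexes; the function name `triplets_to_matrix` and the in-memory example data are hypothetical, and the real file layout should be checked before relying on it.

```python
import io
import numpy as np

def triplets_to_matrix(source, res_start=1):
    """Convert rows of 'r1 r2 value' into a square matrix.

    Assumes three whitespace-separated columns and residue numbering
    starting at res_start; pairs missing from the file are left as NaN.
    """
    data = np.loadtxt(source, ndmin=2)
    r1 = data[:, 0].astype(int) - res_start
    r2 = data[:, 1].astype(int) - res_start
    n = int(max(r1.max(), r2.max())) + 1
    matrix = np.full((n, n), np.nan)
    matrix[r1, r2] = data[:, 2]
    return matrix

# Tiny in-memory stand-in for an avgdist file (made-up values).
example_file = io.StringIO("1 1 0.0\n1 2 3.8\n2 1 3.8\n2 2 0.0\n")
avg_matrix = triplets_to_matrix(example_file)
print(avg_matrix)
```

A file path can be passed in place of the `io.StringIO` object.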
114 | |||
115 | The distance is calculated as the average of the Euclidean distance between the residues over the analyzed frames. | ||
116 | |||
117 | |||
118 | B) Distance Fluctuation (rmsdist) file: | ||
119 | |||
120 | The name of the file is rmsdist_out_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
121 | |||
122 | The file contains a matrix using the residue indexes as axes and the distance fluctuation between the residues as the data (r1 r2 rmsdist). | ||
123 | |||
124 | The distance fluctuation is calculated for the residues that are at least 3 residues away from each other (outside the range x-2 to x+2) as follows: | ||
125 | |||
126 | 1) Calculate the average Euclidean distance between the residues (either CA or center of mass) | ||
127 | |||
128 | 2) Calculate the average distance vector | ||
129 | |||
130 | 3) Subtract the distance fluctuation from the average distance | ||
131 | |||
132 | 4) Calculate the power of the difference between distance fluctuation and local fluctuation | ||
133 | |||
134 | 5) Filter out the values of the close residues (x-1, x, x+1) | ||
135 | |||
136 | 6) Divide the obtained value by the number of frames | ||
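One plausible reading of this procedure, written as a minimal NumPy sketch, is the mean squared deviation of each pair distance from its time average. This is not the script's actual code: the function name `distance_fluctuation`, the `min_sep` parameter and the choice to set masked neighbour pairs to 0 are assumptions made for illustration.

```python
import numpy as np

def distance_fluctuation(coords, min_sep=3):
    """Return (average distance matrix, distance fluctuation matrix).

    coords: array of shape (n_frames, n_residues, 3), one point per
    residue (C-alpha or sidechain center of mass).
    Pairs closer than min_sep in sequence are set to 0 (assumption).
    """
    coords = np.asarray(coords, dtype=float)
    # Pair distances for every frame: shape (n_frames, n_res, n_res).
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    avg_dist = dist.mean(axis=0)
    # Squared deviation from the average, summed and divided by the
    # number of frames (i.e. the mean over frames).
    fluct = ((dist - avg_dist) ** 2).mean(axis=0)
    idx = np.arange(coords.shape[1])
    close = np.abs(idx[:, None] - idx[None, :]) < min_sep
    return avg_dist, np.where(close, 0.0, fluct)

# Toy trajectory: 5 residues on a line, the last one moves in frame 2.
frame1 = np.zeros((5, 3))
frame1[:, 0] = np.arange(5)
frame2 = frame1.copy()
frame2[4, 0] = 6.0
avg_dist, df = distance_fluctuation(np.stack([frame1, frame2]))
print(df)
```

The full per-frame distance array is kept in memory here for clarity; for long trajectories or large proteins a running sum over frames would be more economical.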
137 | |||
138 | |||
139 | C) Profile Sequence (profile_sequence) file: | ||
140 | |||
141 | The name of the file is profile_sequence_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
142 | |||
143 | The file is a vector containing the residue number and the local fluctuation value. | ||
144 | |||
145 | The local fluctuation is calculated as the average fluctuation of the residues close to each other (that is, the residues ranging from x-2 to x+2, with x = residue number). | ||
146 | |||
147 | Since each value is the average distance fluctuation over a range of residues, the output starts from residue 4 and ends at residue n-3. | ||
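The local fluctuation can be sketched as a windowed average over an unfiltered distance fluctuation matrix. The version below is an illustration under stated assumptions, not the script's implementation: the function name `local_fluctuation_profile` is hypothetical, the residue itself is excluded from its own window, and only residues with a complete window are reported, so the first and last residue numbers may differ from those written by the script.

```python
import numpy as np

def local_fluctuation_profile(df, half_window=2):
    """Average fluctuation between each residue and its sequence neighbours.

    df: square, unfiltered distance fluctuation matrix.
    Returns 1-based residue numbers and the corresponding local values
    for residues whose window x-half_window .. x+half_window fits.
    """
    df = np.asarray(df, dtype=float)
    n = df.shape[0]
    residues, values = [], []
    for x in range(half_window, n - half_window):
        neighbours = [j for j in range(x - half_window, x + half_window + 1)
                      if j != x]  # skip the residue itself (assumption)
        residues.append(x + 1)
        values.append(df[x, neighbours].mean())
    return np.array(residues), np.array(values)

# Toy matrix: every pair fluctuates by the same amount.
toy_df = np.ones((7, 7))
residues, profile = local_fluctuation_profile(toy_df)
print(residues, profile)
```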
148 | |||
149 | |||
150 | D) Distance Analysis (distance_analysis_out) file: | ||
151 | |||
152 | The name of the file is distance_analysis_out_c_x_y_type.dat with c = cutoff value, x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
153 | |||
154 | This file contains the distance fluctuation filtered by the cutoff value (values are kept if the distance fluctuation is smaller than the cutoff value). | ||
155 | |||
156 | The cutoff value is calculated as the sum of the nearcutoff and the tolerance value specified by the user. | ||
157 | |||
158 | The nearcutoff can be calculated using either sequence proximity or 3D proximity; the method is selected on the command line with the option -l or ~-~-local (see the command line help for further details). | ||
159 | |||
160 | In the case of sequence proximity, it is calculated as the average of the distance fluctuation values for residues in the range x-2 to x+2 (x = residue index). | ||
161 | |||
162 | In the case of 3D proximity, it is calculated as the average distance fluctuation of the residues within a certain radius from the current one (default value = 6.5 A). | ||
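The filtering step can be sketched as follows. The text does not state whether the nearcutoff is a single value for the whole protein or one value per residue; this sketch assumes a single global average, and the function names, the use of NaN for discarded pairs and the explicit `radius` argument are likewise illustrative choices rather than the script's behaviour.

```python
import numpy as np

def near_cutoff_seq(df, half_window=2):
    """Mean fluctuation over pairs within half_window in sequence."""
    df = np.asarray(df, dtype=float)
    idx = np.arange(df.shape[0])
    sep = np.abs(idx[:, None] - idx[None, :])
    near = (sep <= half_window) & (sep > 0)
    return df[near].mean()

def near_cutoff_volume(df, avg_dist, radius):
    """Mean fluctuation over pairs whose average distance is within radius."""
    df = np.asarray(df, dtype=float)
    avg_dist = np.asarray(avg_dist, dtype=float)
    off_diagonal = ~np.eye(df.shape[0], dtype=bool)
    near = (avg_dist <= radius) & off_diagonal
    return df[near].mean()

def filter_by_cutoff(df, cutoff):
    """Keep values smaller than cutoff; mark the rest as NaN."""
    df = np.asarray(df, dtype=float)
    return np.where(df < cutoff, df, np.nan)

# Toy data: fluctuation grows with sequence separation.
idx = np.arange(4)
toy_df = np.abs(idx[:, None] - idx[None, :]).astype(float)
toy_avg_dist = 3.8 * toy_df

tolerance = 0.0
seq_cut = near_cutoff_seq(toy_df) + tolerance
vol_cut = near_cutoff_volume(toy_df, toy_avg_dist, radius=4.0) + tolerance
kept = filter_by_cutoff(toy_df, seq_cut)
print(seq_cut, vol_cut)
print(kept)
```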
163 | |||
164 | |||
165 | E) Block Averaging (rmsdist_out_b): | ||
166 | |||
167 | It is possible to obtain a distance fluctuation matrix averaged over protein domains (or blocks). | ||
168 | |||
169 | The script requires an input file with the borders of the protein blocks defined by the user, in the form of two columns: the first with the lower limit and the second with the upper limit. | ||
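A minimal sketch of block averaging is shown below. It assumes the borders are 1-based residue numbers with inclusive upper limits and that each block-pair value is the plain mean of the corresponding sub-matrix; the function name `block_average` and the example borders are hypothetical.

```python
import io
import numpy as np

def block_average(df, borders, res_start=1):
    """Average a residue-level matrix over user-defined blocks.

    borders: rows of (lower, upper) residue numbers, upper inclusive
    (assumption). Returns an (n_blocks, n_blocks) matrix.
    """
    df = np.asarray(df, dtype=float)
    borders = np.atleast_2d(np.asarray(borders, dtype=int))
    slices = [slice(low - res_start, high - res_start + 1)
              for low, high in borders]
    averaged = np.empty((len(slices), len(slices)))
    for a, rows in enumerate(slices):
        for b, cols in enumerate(slices):
            averaged[a, b] = df[rows, cols].mean()
    return averaged

# Two-column borders file, here held in memory: residues 1-2 and 3-4.
borders = np.loadtxt(io.StringIO("1 2\n3 4\n"), dtype=int, ndmin=2)
toy_df = np.array([[0, 0, 2, 2],
                   [0, 0, 2, 2],
                   [2, 2, 1, 1],
                   [2, 2, 1, 1]], dtype=float)
block_df = block_average(toy_df, borders)
print(block_df)
```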
170 | |||
171 | |||
172 | = References = | ||
173 | |||
174 | Morra, G.; Potestio, R.; Micheletti, C.; Colombo, G., Corresponding Functional Dynamics across the Hsp90 Chaperone Family: Insights from a Multiscale Analysis of MD Simulations, PLOS Computational Biology 8(3): e1002433. [[https:~~/~~/doi.org/10.1371/journal.pcbi.1002433>>url:https://doi.org/10.1371/journal.pcbi.1002433]] |