Wiki source code of Distance-Fluctuations
author | version | line-number | content |
---|---|---|---|
1 | (% class="jumbotron" %) | ||
2 | ((( | ||
3 | (% class="container" %) | ||
4 | ((( | ||
5 | = Distance Fluctuation (DF) Analysis = | ||
6 | |||
7 | Giorgio Colombo Group (UNIPV) | ||
8 | ))) | ||
9 | ))) | ||
10 | |||
11 | (% class="row" %) | ||
12 | ((( | ||
13 | (% class="col-xs-12 col-sm-8" %) | ||
14 | ((( | ||
15 | = What are DF matrices? = | ||
16 | |||
17 | The results of an MD simulation can be analyzed using Distance Fluctuation (DF) matrices, which are based on the Coordination Propensity (CP) hypothesis: | ||
18 | |||
19 | (% style="text-align:center" %) | ||
20 | [[image:CodeCogsEqn-58.png||height="34" width="181"]] | ||
21 | |||
22 | Low CP values, corresponding to low pair-distance fluctuations, highlight groups of residues that move in a mechanically coordinated way. | ||
23 | |||
24 | |||
25 | = How to use the script = | ||
26 | |||
27 | • __Requirements__ | ||
28 | |||
29 | - Python 3.0 (or newer) | ||
30 | |||
31 | - Numpy | ||
32 | |||
33 | - Scipy | ||
34 | |||
35 | |||
36 | • __Usage__ | ||
37 | |||
38 | The script analyzes an MD trajectory and identifies the coordinated motions between residues. It can then filter the output matrix by distance to identify long-range coordinated motions. | ||
39 | |||
40 | The script can work either on the C-alpha atoms only (read from a pdb or an xyz file) or on the sidechains (read from a pdb file). | ||
41 | |||
42 | For more information run: | ||
43 | |||
44 | {{{ python3 distance_fluctuation.py -h}}} | ||
45 | |||
46 | |||
47 | ))) | ||
48 | |||
49 | |||
50 | (% class="col-xs-12 col-sm-4" %) | ||
51 | ((( | ||
52 | {{box title="**Contents**"}} | ||
53 | {{toc/}} | ||
54 | |||
55 | |||
56 | {{/box}} | ||
57 | |||
58 | |||
59 | ))) | ||
60 | ))) | ||
61 | |||
62 | • __How to read the output__ | ||
63 | |||
64 | The script generates several output files. | ||
65 | |||
66 | |||
67 | A) Average Distance (avgdist) file: | ||
68 | |||
69 | The name of the file is avgdist_out_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
70 | |||
71 | The file contains a matrix using the residue indexes as axes and the average value of the distance between the residues as the data (r1 r2 avgdist). | ||
72 | |||
73 | The distance is calculated as the average of the Euclidean distance between the residues. | ||
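The averaging described above can be illustrated with a minimal NumPy sketch. It is not taken from distance_fluctuation.py: the function name `average_distance_matrix` and the input layout (one array of shape frames x residues x 3, holding one point per residue such as the C-alpha position) are assumptions made for this example.

```python
import numpy as np

def average_distance_matrix(coords):
    """Average pairwise Euclidean distance over all frames.

    coords: array of shape (n_frames, n_residues, 3), one point per residue
    (hypothetical layout, chosen for this sketch).
    Returns an (n_residues, n_residues) matrix.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise displacement vectors for every frame: (F, N, N, 3)
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    # Per-frame distance matrices: (F, N, N)
    dist = np.linalg.norm(diff, axis=-1)
    # Average over the frame axis
    return dist.mean(axis=0)

# Toy trajectory: 2 frames, 3 residues
coords = np.array([
    [[0, 0, 0], [1, 0, 0], [0, 2, 0]],
    [[0, 0, 0], [3, 0, 0], [0, 2, 0]],
], dtype=float)
avg = average_distance_matrix(coords)
```

In this toy trajectory the distance between the first two residues is 1 in the first frame and 3 in the second, so the corresponding matrix entry is 2.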
74 | |||
75 | |||
76 | B) Distance Fluctuation (rmsdist) file: | ||
77 | |||
78 | The name of the file is rmsdist_out_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
79 | |||
80 | The file contains a matrix using the residue indexes as axes and the distance fluctuation between the residues as the data (r1 r2 rmsdist). | ||
81 | |||
82 | The distance fluctuation is calculated for the residues that are at least 3 residues away from each other (x-2 to x+2) as follows: | ||
83 | |||
84 | 1) Calculate the average Euclidean distance between the residues (either CA or center of mass) | ||
85 | |||
86 | 2) Calculate the average distance vector | ||
87 | |||
88 | 3) Subtract the distance fluctuation from the average distance | ||
89 | |||
90 | 4) Calculate the power of the difference between distance fluctuation and local fluctuation | ||
91 | |||
92 | 5) Filter the values of the close residues (x-1, x, x+1) | ||
93 | |||
94 | 6) Divide the obtained value by the number of frames | ||
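As an illustration of the quantity being computed, the sketch below evaluates the mean squared deviation of each pair distance from its trajectory average, which is the usual reading of a pair-distance fluctuation. This form is an assumption, since the defining equation is only available as an image on this page. The function name and the input layout are hypothetical, and the filtering of sequence neighbours is left out, so this is not the script's own implementation.

```python
import numpy as np

def distance_fluctuation_matrix(coords):
    """Mean squared deviation of each pair distance from its average.

    coords: array of shape (n_frames, n_residues, 3) (hypothetical layout).
    Assumed form: DF_ij = < (d_ij - <d_ij>)^2 >, averaged over frames.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (F, N, N) per-frame distances
    mean_dist = dist.mean(axis=0)             # (N, N) average distances
    # Squared deviation per frame, then divide by the number of frames
    return ((dist - mean_dist) ** 2).mean(axis=0)

# Toy trajectory: 2 frames, 3 residues
coords = np.array([
    [[0, 0, 0], [1, 0, 0], [0, 2, 0]],
    [[0, 0, 0], [3, 0, 0], [0, 2, 0]],
], dtype=float)
df = distance_fluctuation_matrix(coords)
```

Here the first and third residues never change their distance, so their entry is 0, while the first and second residues oscillate between distances 1 and 3 around an average of 2, giving an entry of 1.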
95 | |||
96 | |||
97 | C) Profile Sequence (profile_sequence) file: | ||
98 | |||
99 | The name of the file is profile_sequence_x_y_type.dat with x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
100 | |||
101 | The file is a vector containing the residue number and the local fluctuation value. | ||
102 | |||
103 | The local fluctuation is calculated as the average fluctuation of the residues close to each other (that is, the residues ranging from x-2 to x+2 with x = residue number). | ||
104 | |||
105 | Since the value contains the average distance fluctuation for a range of residues, the output starts from residue 4 and ends at residue n-3. | ||
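A sequence-window average of this kind could look like the sketch below. Several details are assumptions made for the example and may differ from the script: the function name, the 0-based indexing, the exclusion of the residue itself from its own window, and the choice to report only residues whose window fits entirely inside the sequence (which does not necessarily reproduce the exact first and last residues written by the script).

```python
import numpy as np

def local_fluctuation_profile(df, half_window=2):
    """Average DF between residue x and its sequence neighbours.

    df: square distance fluctuation matrix.
    half_window: neighbours taken on each side (x-2 .. x+2 by default).
    Returns a dict {0-based residue index: local fluctuation}.
    """
    df = np.asarray(df, dtype=float)
    n = df.shape[0]
    profile = {}
    # Only residues with a complete window are reported (assumption)
    for x in range(half_window, n - half_window):
        neighbours = [j for j in range(x - half_window, x + half_window + 1)
                      if j != x]
        profile[x] = df[x, neighbours].mean()
    return profile

# Toy matrix where DF grows with sequence separation: df[i, j] = |i - j|
idx = np.arange(6)
toy_df = np.abs(idx[:, None] - idx[None, :]).astype(float)
profile = local_fluctuation_profile(toy_df)
```

For the toy matrix, each reported residue has neighbours at separations 2, 1, 1 and 2, so its local fluctuation is 1.5.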
106 | |||
107 | |||
108 | D) Distance Analysis (distance_analysis_out) file: | ||
109 | |||
110 | The name of the file is distance_analysis_out_c_x_y_type.dat with c = cutoff value, x = start frame of the analysis, y = end frame of the analysis, type = type of analysis (CA or sidechains). | ||
111 | |||
112 | This file contains the distance fluctuation filtered by the cutoff value (values are kept if the distance fluctuation is smaller than the cutoff value). | ||
113 | |||
114 | The cutoff value is calculated as the sum of the nearcutoff and the tolerance value specified by the user. | ||
115 | |||
116 | The nearcutoff can be calculated using either sequence proximity or 3D proximity; the method is selected on the command line with the option -l or ~-~-local (see the command line help for further details). | ||
117 | |||
118 | In the case of sequence proximity, it is calculated as the average of the distance fluctuation value for residues in the range of x-2,x+2 (x = residue index). | ||
119 | |||
120 | In the case of 3D proximity, it is calculated as the average distance fluctuation of the residues within a certain radius from the current one (default value = 6.5 Å). | ||
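The cutoff logic can be sketched as follows. Both helper names are hypothetical, and two simplifications are assumed: 3D proximity is judged from the average distance matrix, and a single scalar nearcutoff is applied to the whole matrix. The script itself may handle these points differently.

```python
import numpy as np

def near_cutoff_3d(df, avgdist, x, radius=6.5):
    """Average DF between residue x and the residues within `radius` of it."""
    df = np.asarray(df, dtype=float)
    mask = np.asarray(avgdist, dtype=float)[x] <= radius
    mask[x] = False                      # do not count the residue itself
    if not mask.any():
        return float("nan")              # no spatial neighbours found
    return df[x, mask].mean()

def filter_by_cutoff(df, nearcutoff, tolerance):
    """Keep the pairs whose DF is smaller than nearcutoff + tolerance."""
    df = np.asarray(df, dtype=float)
    cutoff = nearcutoff + tolerance
    n = df.shape[0]
    return [(i, j, df[i, j])
            for i in range(n) for j in range(i + 1, n)
            if df[i, j] < cutoff]

# Toy 3-residue matrices
toy_df = np.array([[0, 1, 5], [1, 0, 2], [5, 2, 0]], dtype=float)
toy_avgdist = np.array([[0, 4, 10], [4, 0, 6], [10, 6, 0]], dtype=float)

near = near_cutoff_3d(toy_df, toy_avgdist, x=1)
kept = filter_by_cutoff(toy_df, nearcutoff=near, tolerance=0.6)
```

In this toy case the middle residue has both other residues within 6.5 Å, so its nearcutoff is the mean of 1 and 2. With a tolerance of 0.6 the cutoff is 2.1, and only the pair with a fluctuation of 5 is discarded.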
121 | |||
122 | |||
123 | E) Block Averaging (rmsdist_out_b): | ||
124 | |||
125 | It is possible to obtain a distance fluctuation matrix averaged over protein domains (or blocks). | ||
126 | |||
127 | The script requires an input file with the borders of the protein blocks defined by the user, in the form of two columns: the first with the lower limit and the second with the upper limit. |
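A minimal sketch of block averaging is given below. It assumes that the block limits are 1-based, inclusive residue numbers and that each block-block value is the plain mean of the corresponding sub-matrix; the function name and the example block definitions are hypothetical and are not taken from the script.

```python
import io
import numpy as np

def block_average(df, blocks):
    """Average a residue-level DF matrix over user-defined blocks.

    blocks: sequence of (lower, upper) residue limits, assumed to be
    1-based and inclusive.
    Returns an (n_blocks, n_blocks) matrix.
    """
    df = np.asarray(df, dtype=float)
    n_blocks = len(blocks)
    out = np.zeros((n_blocks, n_blocks))
    for a, (lo_a, hi_a) in enumerate(blocks):
        for b, (lo_b, hi_b) in enumerate(blocks):
            # Convert 1-based inclusive limits to 0-based slices
            out[a, b] = df[lo_a - 1:hi_a, lo_b - 1:hi_b].mean()
    return out

# Two-column block file: lower limit, upper limit (example content)
blocks_text = "1 2\n3 4\n"
blocks = np.loadtxt(io.StringIO(blocks_text), dtype=int, ndmin=2)

toy_df = np.array([[0, 1, 2, 3],
                   [1, 0, 1, 2],
                   [2, 1, 0, 1],
                   [3, 2, 1, 0]], dtype=float)
block_df = block_average(toy_df, blocks)
```

With the two blocks above, the 4x4 toy matrix is reduced to a 2x2 matrix whose off-diagonal entry is the mean of the values linking residues 1-2 to residues 3-4.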