Wiki source code of MEDIUM - Machine lEarning Drug dIscovery throUgh dynaMics
author | version | line-number | content |
---|---|---|---|
1 | (% class="jumbotron" %) | ||
2 | ((( | ||
3 | (% class="container" %) | ||
4 | ((( | ||
5 | = MEDIUM - Machine lEarning Drug dIscovery throUgh dynaMics = | ||
6 | |||
8 | |||
9 | Giorgio Colombo Group (UNIPV) | ||
10 | ))) | ||
11 | ))) | ||
12 | |||
13 | (% class="row" %) | ||
14 | ((( | ||
15 | (% class="col-xs-12 col-sm-8" %) | ||
16 | ((( | ||
17 | = Why is this tool useful? = | ||
18 | |||
19 | Predicting the best ligand for a specific protein can be a major challenge with classical approaches such as molecular docking and stabilisation energy calculations. | ||
20 | |||
21 | Here we report a fast and robust workflow that starts from our DF-matrix method to analyse how the protein behaves globally in the presence of a ligand. A Convolutional Neural Network (CNN) model is trained directly on the pixel images of the DF matrices: training is performed using a known ligand, and the behaviour of the protein is evaluated in the presence and in the absence of that ligand. | ||
22 | |||
23 | [[image:image-20230111110707-1.png]] | ||
24 | |||
25 | Once the model is trained, further predictions can be performed using different ligands. | ||
26 | |||
27 | = How to use the script = | ||
28 | |||
29 | • __Requisites__ | ||
30 | |||
31 | - Python 3.0 (or newer version) | ||
32 | |||
33 | - Numpy | ||
34 | |||
35 | - TensorFlow | ||
36 | |||
37 | - Pandas | ||
38 | |||
39 | - scikit-learn (sklearn) | ||
40 | |||
41 | - OpenCV (cv2) | ||
42 | |||
43 | - Matplotlib | ||
44 | |||
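Before running the scripts, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the Python standard library; the helper name `missing_modules` is hypothetical and is not part of the MEDIUM scripts.

```python
import importlib.util

# Import names of the packages listed under "Requisites".
REQUIRED = ["numpy", "tensorflow", "pandas", "sklearn", "cv2", "matplotlib"]


def missing_modules(names=REQUIRED):
    """Return the subset of `names` that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing has to be installed before the training or test script is started.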
45 | • __Usage__ | ||
46 | |||
47 | - CNN-training-script.py constitutes the main code of the tool: here different CNN models can be customised, including the activation function and the classification mode. In its final part it also runs a test on unseen data and saves the trained model as a .h5 file. | ||
48 | |||
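To illustrate what customising the activation function and the classification mode can look like, here is a minimal Keras sketch. It is not the architecture of CNN-training-script.py: the function name `build_cnn`, the layer sizes, the input shape and the two mode names are all assumptions chosen for illustration.

```python
import tensorflow as tf


def build_cnn(input_shape=(128, 128, 3), activation="relu",
              mode="binary", n_classes=2):
    """Build a small CNN; all layer sizes here are illustrative assumptions."""
    layers = tf.keras.layers
    if mode == "binary":
        # One sigmoid unit: probability of one of the two states.
        head = layers.Dense(1, activation="sigmoid")
        loss = "binary_crossentropy"
    elif mode == "categorical":
        # One softmax unit per class, with integer class labels.
        head = layers.Dense(n_classes, activation="softmax")
        loss = "sparse_categorical_crossentropy"
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation=activation),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation=activation),
        head,
    ])
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model
```

After fitting, a model built this way could be written to disk with `model.save("model.h5")`, where the file name is a placeholder; the `.h5` extension selects the HDF5 format mentioned above.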
49 | The first step required of the user is the initial preparation of the DF-images from the DF-matrices [see the following link for the DF preparation [[https:~~/~~/wiki.ebrains.eu/bin/view/Collabs/distance-fluctuations>>https://wiki.ebrains.eu/bin/view/Collabs/distance-fluctuations]]]. This can be done using the gnuplot.in file and the exectute-DF.sh file, which renames each .png according to the nanoseconds used to extract the image. | ||
50 | |||
51 | The images required for training the model have to be selected by the user and classified into the two states of interest; the random-selector files are then used to divide them into test, training and validation datasets. We usually performed a random split into test (20%), training (64%) and validation (16%) sets, using the last 200 ns of the equilibrated dynamics. | ||
52 | |||
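The 20% / 64% / 16% proportions correspond to holding out 20% of the images for testing and then taking 20% of the remaining 80% for validation (0.8 × 0.2 = 0.16), which leaves 0.8 × 0.8 = 0.64 for training. The sketch below reproduces that arithmetic with the standard library only. It is not the content of the random-selector files; the function name `split_frames` and the image file names are hypothetical.

```python
import random


def split_frames(items, test_frac=0.20, val_frac_of_rest=0.20, seed=0):
    """Shuffle `items` and split them into (train, val, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle

    # Hold out the test set first.
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]

    # Take the validation set from what remains; the rest is training data.
    n_val = round(len(rest) * val_frac_of_rest)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# Hypothetical image names, one per extracted frame.
frames = [f"frame_{i:04d}.png" for i in range(1000)]
train, val, test = split_frames(frames)
print(len(train), len(val), len(test))  # prints "640 160 200"
```

The split should be carried out separately for each of the two states if the classes are to keep the same proportions in every subset.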
53 | |||
54 | - CNN-external-data-test.py is a script that loads the trained model (.h5 file) and tests it on data from proteins different from those used to build the model. | ||
55 | ))) | ||
56 | |||
57 | |||
58 | (% class="col-xs-12 col-sm-4" %) | ||
59 | ((( | ||
60 | {{box title="**Contents**"}} | ||
61 | {{toc/}} | ||
62 | {{/box}} | ||
63 | |||
64 | |||
65 | ))) | ||
66 | ))) |