Wiki source code of User documentation

Version 2.1 by evareill on 2021/04/28 16:48

== Introduction ==

The HPC Job Proxy (formerly known as the supercomputing proxy or Unicore job proxy) provides a simplified means for EBRAINS service providers to launch jobs on Fenix supercomputers on behalf of EBRAINS end-users.

The proxy offers a wrapper over the Unicore service which adds:

* Logging of the jobs run on behalf of the end-user
* Access to the stdout, stderr, and return status of the job for the end-user
* Verification of the end-user’s quotas before submitting jobs
* Update of the end-user’s quotas

In order to use the proxy, an EBRAINS service provider needs to:

* Get a project and service account at one or more Fenix sites
* Get an EBRAINS service account and an EBRAINS IAM OIDC client ID
* Map that Fenix service account to that EBRAINS service account

Why does each service provider need a Fenix service account? The Principal Investigator who obtains the Fenix service account is legally responsible for the jobs run on the supercomputers and for ensuring that end-users cannot run unintended executables.

[[image:HPC Job Proxy diagram.jpg]]

== Use case ==

1. Bob wants to use an EBRAINS Application that requires supercomputing.
11. The Application authenticates Bob as an EBRAINS user.
11. Bob provides input which determines the supercomputing job to be run.
1. The Application authenticates itself with the HPC Job Proxy using its EBRAINS IAM service account, and sends it the following information:
11. a vanilla Unicore job definition,
11. the EBRAINS IAM username of the end-user on whose behalf the job is to be executed,
11. a maximum amount of resources (managed by the EBRAINS Quota Manager) to be consumed by the job,
11. an optional callback URL for notifications regarding the job.
1. The HPC Job Proxy queries the EBRAINS Quota Manager for Bob and the maximum resources indicated.
11. The EBRAINS Quota Manager queries the EBRAINS IAM to identify Bob’s quota for the requested resources.
11. The EBRAINS Quota Manager checks its database for resources already consumed by Bob and determines whether Bob has enough quota to run the job.
11. The EBRAINS Quota Manager returns a validation to the Application, possibly indicating Bob’s quota status for those resources.
1. If Bob has enough quota, the HPC Job Proxy accepts the job request.
11. It logs Bob’s request.
11. It sends the job to the Unicore API using the same EBRAINS IAM token the Application identified itself with. The job is therefore submitted to Unicore under the Application’s identity, which ensures that it is accounted to the Application’s own Fenix service account.
1. When the job finishes (completes or fails), Unicore sends a notification to the HPC Job Proxy’s callback endpoint.
1. The HPC Job Proxy logs the job results, and pushes the actual cost of the job run for Bob to the EBRAINS Quota Manager.
1. The HPC Job Proxy notifies the Application of the results of the job.
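
The quota logic in steps 3 to 6 can be sketched as follows. This is a minimal in-memory illustration, not the actual EBRAINS Quota Manager: the class `QuotaLedger`, its method names and the cost unit are all hypothetical, and the real service reads quotas from EBRAINS IAM and keeps consumption in its own database.

```python
class QuotaLedger:
    """Hypothetical in-memory stand-in for the EBRAINS Quota Manager."""

    def __init__(self, quotas):
        # quotas: mapping of username -> total quota (arbitrary cost units)
        self._quotas = dict(quotas)
        self._consumed = {user: 0.0 for user in self._quotas}

    def remaining(self, user):
        """Quota still available to the user."""
        return self._quotas[user] - self._consumed[user]

    def can_run(self, user, max_resources):
        """Step 3: accept only if the requested maximum fits in what is left."""
        return user in self._quotas and max_resources <= self.remaining(user)

    def record_cost(self, user, actual_cost):
        """Step 6: charge the actual cost once the job has finished."""
        self._consumed[user] += actual_cost


ledger = QuotaLedger({"bob": 100.0})
accepted = ledger.can_run("bob", max_resources=40.0)      # fits in the quota
ledger.record_cost("bob", actual_cost=35.0)               # job used less than its maximum
rejected = not ledger.can_run("bob", max_resources=70.0)  # only 65.0 units are left
```

Checking the requested maximum before submission and charging the actual cost afterwards means a job can never overdraw the quota, while unused resources are not billed.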

== Sample transaction diagram ==

[[image:ebrains-job-proxy-Job sequence.png]]

== API of the HPC Job Proxy ==

The API provides 3 endpoints described below. The Swagger documentation for the API is available here.

* Submit a job: **POST /api/jobs/**

To submit a job to the proxy, provide the following information in the JSON body of the POST request:

* **job_def** - JSON: The Unicore job definition. For more information, please visit the [[__Unicore documentation__>>url:https://sourceforge.net/p/unicore/wiki/Job_Description/]].
* **site** - string: The Fenix site on which to run the job.
* **user_info** - string: The Application’s access token user information.

Valid Fenix site values are:

|**Fenix partner**|**Valid site value**
|BSC|BSC-MareNostrum
|CEA|irene
|Cineca|CINECA-MARCONI
|Cineca|CINECA-GALILEO
|CSCS|DAINT-CSCS
|JSC|FZJ_JURECA
|JSC|JUWELS
|JSC|JURON
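
A submission request could be assembled as in the sketch below. It only builds the request and does not send it. The base URL `PROXY_URL` is a placeholder (the real address is not given here), the helper name `build_submit_request` is hypothetical, and the one-field job definition is purely illustrative; consult the Unicore documentation for the actual job description format.

```python
import json
import urllib.request

# Site values copied from the table above.
VALID_SITES = {
    "BSC-MareNostrum", "irene", "CINECA-MARCONI", "CINECA-GALILEO",
    "DAINT-CSCS", "FZJ_JURECA", "JUWELS", "JURON",
}

# Placeholder base URL (assumption); replace with the real address of the proxy.
PROXY_URL = "https://hpc-job-proxy.example.org"


def build_submit_request(job_def, site, access_token):
    """Build (but do not send) the POST /api/jobs/ request."""
    if site not in VALID_SITES:
        raise ValueError(f"unknown Fenix site: {site!r}")
    payload = {"job_def": job_def, "site": site, "user_info": access_token}
    return urllib.request.Request(
        url=f"{PROXY_URL}/api/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Illustrative job definition; "my-service-token" stands in for a real access token.
request = build_submit_request({"Executable": "/bin/date"}, "DAINT-CSCS", "my-service-token")
```

The request can then be sent with `urllib.request.urlopen(request)`. Validating the site name locally avoids a round trip for an obviously invalid value.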

* Fetch a job's details: **GET /api/jobs/<job_id>**

The proxy queries Unicore on the fly for the job’s latest details. Fenix sites retain information about past jobs for a set number of days (30 days at CSCS). If the request is made after that period, the proxy returns the information it has stored itself. The full information retrieved and stored is the following:

{{info}}
{{code language="json"}}
{
  "id": "string",
  "description": "string",
  "dataset_id": "string",
  "duration": 0,
  "definition": {},
  "error": "string",
  "status": "CREATED",
  "pre_command_status": "CREATED",
  "post_command_status": "CREATED",
  "runtime": "string",
  "created": "2021-04-28T09:56:51.731Z",
  "updated": "2021-04-28T09:56:51.731Z"
}
{{/code}}
{{/info}}
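
A client might parse this response as in the sketch below. The `JobDetails` class and the `past_retention` helper are hypothetical names, only a subset of the fields is kept, and the helper assumes that the retention period is counted from the job's creation time, which is not confirmed here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse timestamps such as 2021-04-28T09:56:51.731Z into aware datetimes."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class JobDetails:
    id: str
    status: str
    created: datetime
    updated: datetime
    error: str

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        return cls(
            id=raw["id"],
            status=raw["status"],
            created=parse_timestamp(raw["created"]),
            updated=parse_timestamp(raw["updated"]),
            error=raw.get("error") or "",
        )


def past_retention(details, now=None, retention_days=30):
    """True if the site has probably dropped the job (30 days is the CSCS value).

    Assumption: the retention period is counted from the job's creation time.
    """
    now = now or datetime.now(timezone.utc)
    return now - details.created > timedelta(days=retention_days)


sample = ('{"id": "abc", "status": "CREATED", "error": "", '
          '"created": "2021-04-28T09:56:51.731Z", "updated": "2021-04-28T09:56:51.731Z"}')
job = JobDetails.from_json(sample)
```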

* Fetch a file present in the job's execution directory: **GET /api/jobs/<job_id>/<filename>**

For convenience, you can fetch the files that are present in the job's execution directory. Unicore places the output, error and exit code of a job in the job’s working directory. The filenames available are: stdout, stderr, UNICORE_SCRIPT_EXIT_CODE. If the job has pre commands or post commands, stdout and stderr are also available under the folders .UNICORE_PRE_0/ and .UNICORE_POST_0/ respectively. Any files created by your job can be fetched in the same way.

The proxy responds with the //raw content// of the requested file. These files are available as long as the site does not delete them (30 days for CSCS Piz Daint).

For all 3 endpoints, you need to provide your service account access token in the //Authorization// header of your request (Bearer <access_token>). Your service account must be linked to a Fenix account, otherwise your job will be rejected (HTTP 502).
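
The two GET endpoints and the Authorization header can be exercised as in this sketch, which builds the requests without sending them. `PROXY_URL` is a placeholder, the helper names are hypothetical, and `parse_exit_code` assumes that UNICORE_SCRIPT_EXIT_CODE contains only the integer exit code.

```python
import urllib.parse
import urllib.request

# Placeholder base URL (assumption); replace with the real address of the proxy.
PROXY_URL = "https://hpc-job-proxy.example.org"


def _authorized_get(path, access_token):
    """Build a GET request carrying the service account token."""
    return urllib.request.Request(
        url=PROXY_URL + path,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def build_details_request(job_id, access_token):
    """GET /api/jobs/<job_id>"""
    return _authorized_get(f"/api/jobs/{urllib.parse.quote(job_id, safe='')}", access_token)


def build_file_request(job_id, filename, access_token):
    """GET /api/jobs/<job_id>/<filename>; the filename may include a folder."""
    job = urllib.parse.quote(job_id, safe="")
    name = urllib.parse.quote(filename)  # keeps "/" so .UNICORE_PRE_0/stdout works
    return _authorized_get(f"/api/jobs/{job}/{name}", access_token)


def parse_exit_code(raw):
    """Turn the raw bytes of UNICORE_SCRIPT_EXIT_CODE into an int (assumed format)."""
    return int(raw.decode("ascii").strip())


# "job-123" and "my-service-token" are made-up values for illustration.
stdout_request = build_file_request("job-123", ".UNICORE_PRE_0/stdout", "my-service-token")
```

Each request can be sent with `urllib.request.urlopen(...)`; the response body of a file request is the raw file content.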

== Sample usage ==

A Jupyter Notebook provides sample code for the Application’s access to the HPC Job Proxy.

== Source code ==

The source code of the HPC Job Proxy is available on GitLab.

== Setting up access for your application ==

For your application to work, you need to set up the following.

Your application needs an EBRAINS IAM service account. Contact EBRAINS __support__ to create it. You will need to provide a username and an email address which is not linked to another EBRAINS account. The EBRAINS service account is mapped to the Fenix service account automatically, based on their linked email addresses, which must be identical.

Your application needs an EBRAINS IAM OIDC client. See the __instructions__.

Your application needs a Fenix project and a Fenix service account. See the __instructions__ to get a project. Then the Principal Investigator in your lab can request a Fenix service account via EBRAINS __support__.

Then, in your app, retrieve the access token linked to your service account user. This requires the OIDC client ID, the client secret, and the EBRAINS service account's username and password. See the __instructions__.

When requesting a job on behalf of an end-user, your app passes this access token in the "**user_info**" field of the payload, alongside the job request. Note that the access token expires (typically after one week), so your application must be able to generate a new one. See the __instructions__.
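
Token retrieval and renewal could be handled as in the sketch below. It assumes that EBRAINS IAM exposes a standard OAuth 2.0 token endpoint accepting the resource owner password grant, and that the reply contains the usual `access_token` and `expires_in` fields; follow the linked instructions for the authoritative procedure. `build_token_form` and `TokenCache` are hypothetical names, and the function that actually performs the HTTP call is left to the caller.

```python
import time
import urllib.parse


def build_token_form(client_id, client_secret, username, password):
    """Form body for an OAuth 2.0 password-grant token request (assumed flow)."""
    return urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
    }).encode("ascii")


class TokenCache:
    """Return a cached access token and renew it shortly before it expires."""

    def __init__(self, fetch, margin_s=60, clock=time.monotonic):
        # fetch: callable that performs the token request and returns a dict
        # with "access_token" and "expires_in" (seconds), as in OAuth 2.0 replies.
        self._fetch = fetch
        self._margin_s = margin_s
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            reply = self._fetch()
            self._token = reply["access_token"]
            self._expires_at = now + reply["expires_in"]
        return self._token
```

Calling `cache.get()` each time a job is submitted keeps the value placed in **user_info** and in the Authorization header valid without requesting a new token on every call.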

__Example__:

The EBRAINS Image Service uses the HPC Job Proxy.

* EBRAINS IAM OIDC Client: “img_svc”
* EBRAINS IAM service account: “ich019sa”
* Fenix project: “ich019”
* Fenix service account: “ich019sa”
* Email address of both service accounts: “platform.ich019sa@humanbrainproject.eu”

== List of services using the HPC Job Proxy ==

* EBRAINS image service
* Ilastik (ongoing)
* TVB cloud (ongoing)