Wiki source code of User documentation

Version 1.1 by evareill on 2021/04/28 11:57

== Introduction ==

The HPC Job Proxy (formerly known as the supercomputing proxy or Unicore job proxy) provides a simplified means for EBRAINS service providers to launch jobs on Fenix supercomputers on behalf of EBRAINS end-users.

The proxy offers a wrapper over the Unicore service which adds:

* Logging of the jobs run on behalf of the end-user
* Access to the stdout, stderr, and return status of the job for the end-user
* Verification of the end-user’s quotas before submitting jobs
* Update of the end-user’s quotas

In order to use the proxy, an EBRAINS service provider needs to:

* Get a project and service account at one or more Fenix sites
* Get an EBRAINS service account and an EBRAINS IAM OIDC client ID
* Map that Fenix service account to that EBRAINS service account

Why does each service provider need a Fenix service account? The reason is that the Principal Investigator who gets the Fenix service account is legally responsible for the jobs being run on the supercomputers and for not enabling the end-user to run unintended executables.

[[image:HPC Job Proxy diagram.jpg]]

== Use case ==

1. Bob wants to use an EBRAINS Application that requires supercomputing.
11. The Application authenticates Bob as an EBRAINS user.
11. He provides input which determines the supercomputing job to be run.
1. The Application authenticates itself with the HPC Job Proxy using its EBRAINS IAM service account, and sends it the following information:
11. a vanilla Unicore job definition,
11. the EBRAINS IAM username of the end user on whose behalf the job is to be executed,
11. a maximum amount of resources (managed by the EBRAINS Quota Manager) to be consumed by the job,
11. an optional callback URL for notifications regarding the job.
1. The HPC Job Proxy queries the EBRAINS Quota Manager about Bob and the indicated maximum resources.
11. The EBRAINS Quota Manager queries the EBRAINS IAM to identify the quota for the requested resources.
11. The EBRAINS Quota Manager checks its database for resources already consumed by Bob and determines whether Bob has enough quota to run the job.
11. The EBRAINS Quota Manager returns a validation to the Application, possibly indicating Bob’s quota status for those resources.
1. If Bob has enough quota, the HPC Job Proxy accepts the job request.
11. It logs Bob’s request.
11. It sends the job to the Unicore API using the same EBRAINS IAM token the Application identified itself with. This submits the job to Unicore impersonating the Application and ensures that the job submission is accounted towards the Application’s own Fenix service account.
1. When the job finishes (completes or fails), Unicore sends a notification to the HPC Job Proxy’s callback endpoint.
1. The HPC Job Proxy logs the job results, and pushes the actual cost of the job run for Bob to the EBRAINS Quota Manager.
1. The HPC Job Proxy notifies the Application of the results of the job.

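The quota steps of the use case (check before submission, update after completion) can be illustrated with a small sketch. This is not the Quota Manager's implementation: the `QuotaStatus` type, the function names, and the single numeric resource unit are all hypothetical, chosen only to make the decision rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaStatus:
    """Hypothetical view of one user's quota for one resource type."""
    allowed: float   # total quota granted to the user
    consumed: float  # resources already consumed by the user


def has_enough_quota(status: QuotaStatus, requested_max: float) -> bool:
    """Accept the job only if its declared maximum still fits in the quota."""
    return status.consumed + requested_max <= status.allowed


def record_actual_cost(status: QuotaStatus, actual_cost: float) -> QuotaStatus:
    """After the job finishes, charge the actual cost, not the declared maximum."""
    return QuotaStatus(status.allowed, status.consumed + actual_cost)


bob = QuotaStatus(allowed=100.0, consumed=80.0)
accepted = has_enough_quota(bob, requested_max=15.0)
rejected = has_enough_quota(bob, requested_max=25.0)
bob_after_job = record_actual_cost(bob, actual_cost=12.5)
```

Checking against the declared maximum and charging the actual cost afterwards mirrors the order of the steps above; how the real service handles concurrent jobs is not covered here.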

== Sample transaction diagram ==

[[image:ebrains-job-proxy-Job sequence.png]]

== API of the HPC Job Proxy ==

The API provides three endpoints, described below. The Swagger documentation for the API is available here.

* Submit a job: **POST /api/jobs/**

To submit a job to the proxy, you need to provide the following information as part of the POST JSON body:

* **job_def** - JSON: The Unicore job definition. For more information, please visit the [[__Unicore documentation__>>url:https://sourceforge.net/p/unicore/wiki/Job_Description/]].
* **site** - string: The Fenix site on which to run the job.
* **user_info** - string: The access token of the Application’s EBRAINS service account.

Valid Fenix site values are:

|**Fenix partner**|**Valid site value**
|BSC|BSC-MareNostrum
|CEA|irene
|Cineca|CINECA-MARCONI
|Cineca|CINECA-GALILEO
|CSCS|DAINT-CSCS
|JSC|FZJ_JURECA
|JSC|JUWELS
|JSC|JURON

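As a minimal sketch, the POST body described above could be assembled as follows. The helper name `build_job_request`, the example job definition, and the placeholder token are illustrative assumptions; only the three field names and the site values come from this page.

```python
import json

# Site values copied from the table of valid Fenix sites.
VALID_SITES = {
    "BSC-MareNostrum", "irene", "CINECA-MARCONI", "CINECA-GALILEO",
    "DAINT-CSCS", "FZJ_JURECA", "JUWELS", "JURON",
}


def build_job_request(job_def: dict, site: str, access_token: str) -> dict:
    """Build the JSON body for POST /api/jobs/ (hypothetical helper)."""
    if site not in VALID_SITES:
        raise ValueError(f"Unknown Fenix site: {site!r}")
    return {
        "job_def": job_def,         # Unicore job definition
        "site": site,               # where to run the job
        "user_info": access_token,  # service account access token
    }


# Illustrative job definition; see the Unicore documentation for real ones.
payload = build_job_request({"Executable": "/bin/date"}, "DAINT-CSCS", "<access_token>")
body = json.dumps(payload)
```

Validating the site locally is only a convenience to fail early; the proxy remains the authority on which values it accepts.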

* Fetch a job's details: **GET /api/jobs/<job_id>**

The proxy will query Unicore on the fly for the job’s latest details. Fenix sites retain the information of past jobs for a set number of days (30 days at CSCS). If the request is made after that period, the information returned is that which has been stored in the HPC Job Proxy. The full information retrieved and stored is the following:

{{info}}
{{code language="json"}}
{
  "id": "string",
  "description": "string",
  "dataset_id": "string",
  "duration": 0,
  "definition": {},
  "error": "string",
  "status": "CREATED",
  "pre_command_status": "CREATED",
  "post_command_status": "CREATED",
  "runtime": "string",
  "created": "2021-04-28T09:56:51.731Z",
  "updated": "2021-04-28T09:56:51.731Z"
}
{{/code}}
{{/info}}

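A client will typically want only a few of these fields. The sketch below parses a response shaped like the sample above; the function names are hypothetical, and the timestamp format is inferred from the sample values rather than from a specification.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as 2021-04-28T09:56:51.731Z (format assumed from the sample)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def summarize_job(raw_json: str) -> dict:
    """Extract the fields a caller usually inspects first (hypothetical helper)."""
    details = json.loads(raw_json)
    return {
        "id": details["id"],
        "status": details["status"],
        "error": details.get("error"),
        "created": parse_timestamp(details["created"]),
        "updated": parse_timestamp(details["updated"]),
    }


sample = """{
  "id": "string", "description": "string", "dataset_id": "string",
  "duration": 0, "definition": {}, "error": "string",
  "status": "CREATED", "pre_command_status": "CREATED",
  "post_command_status": "CREATED", "runtime": "string",
  "created": "2021-04-28T09:56:51.731Z",
  "updated": "2021-04-28T09:56:51.731Z"
}"""
summary = summarize_job(sample)
```

Only the `CREATED` status appears on this page, so the sketch does not attempt to enumerate or interpret other status values.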
* Fetch a job's file: **GET /api/jobs/<job_id>/<filename>**

For convenience, you can fetch the output files of a job (placed by Unicore in the job’s working directory). The filenames available are: stdout, stderr, UNICORE_SCRIPT_EXIT_CODE. If the job has pre commands or post commands, stdout and stderr are also available under the folders .UNICORE_PRE_0/ and .UNICORE_POST_0/ respectively.

The proxy will respond with the //raw content// of the requested file. These files are available as long as the site does not delete them (30 days for CSCS PIZ DAINT).

For all three endpoints, you need to provide your service account access token in the //Authorization// header of your request (Bearer <access_token>). Your service account must be linked to a Fenix account, otherwise your job will be rejected (HTTP 502).

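The file endpoint and the //Authorization// header can be combined in a small request builder. This sketch only constructs the request and does not send it. The base URL is a placeholder, not the real service address, and passing a pre or post command file as a sub-path such as `.UNICORE_PRE_0/stdout` is an assumption based on the folder names above.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder; replace with the actual address of the HPC Job Proxy.
BASE_URL = "https://hpc-job-proxy.example.org"


def job_file_request(job_id: str, filename: str, access_token: str,
                     base_url: str = BASE_URL) -> Request:
    """Build a GET request for /api/jobs/<job_id>/<filename> (hypothetical helper)."""
    # Slashes are kept in the filename so that folder paths stay intact.
    url = f"{base_url}/api/jobs/{quote(job_id, safe='')}/{quote(filename)}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return Request(url, headers=headers, method="GET")


req = job_file_request("abc123", ".UNICORE_PRE_0/stdout", "TOKEN")
```

The resulting object can be passed to `urllib.request.urlopen`, whose response body would then be the raw file content.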
== Sample usage ==

A Jupyter Notebook provides sample code for the Application’s access to the HPC Job Proxy.

== Source code ==

The source code of the HPC Job Proxy is available on GitLab.

== Setting up access for your application ==

For your application to work, you need to set up the following.

Your application needs an EBRAINS IAM service account. You should contact EBRAINS __support__ to create it. You will need to provide a username and an email address which is not linked to another EBRAINS account. The mapping between the EBRAINS service account and the Fenix service account is done automatically based on the linked email addresses, which need to be identical.

Your application needs an EBRAINS IAM OIDC client. See the __instructions__.

Your application needs a Fenix project and a Fenix service account. See the __instructions__ to get a project. Then the Principal Investigator in your lab can request a Fenix service account via EBRAINS __support__.

Then in your app, you should retrieve the access token linked to your service account user. This requires the OIDC client ID, the client secret, and the EBRAINS service account's username and password. See the __instructions__.

When requesting a job on behalf of an end user, your app passes this access token in the "**user_info**" field of the payload alongside the job request. Note that the access token expires (typically after one week), so your application should generate a new token before the current one expires. See the __instructions__.

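Because the access token expires, an application may want to cache it and renew it shortly before expiry. The sketch below assumes the token is obtained with the standard OAuth 2.0 password grant, which matches the four credentials listed above, but the linked instructions remain authoritative. `TokenCache`, `password_grant_form`, and the `fetch_token` callable are hypothetical names, and the token endpoint is deliberately left to the caller.

```python
import time
from urllib.parse import urlencode


def password_grant_form(client_id: str, client_secret: str,
                        username: str, password: str) -> str:
    """Form body for an OAuth 2.0 password grant request (assumed grant type)."""
    return urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
    })


class TokenCache:
    """Return a cached token and renew it shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=60.0, clock=time.time):
        # fetch_token() must return (access_token, lifetime_in_seconds).
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token
```

In use, `fetch_token` would POST the form from `password_grant_form` to the token endpoint named in the instructions and return the token with its lifetime; the value returned by `cache.get()` then goes into the "user_info" field of each job request.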

__Example__:

The EBRAINS Image Service uses the HPC Job Proxy.

* EBRAINS IAM OIDC Client: "img_svc"
* EBRAINS IAM service account: "ich019sa"
* Fenix project: "ich019"
* Fenix service account: "ich019sa"
* Email address of both service accounts: "platform.ich019sa@humanbrainproject.eu"

== List of services using the HPC Job Proxy ==

* EBRAINS Image Service
* Ilastik (ongoing)
* TVB cloud (ongoing)