Wiki source code of User documentation
                  Version 4.1 by alexisdurieux on 2022/01/27 14:42
              
| author | version | line-number | content | 
|---|---|---|---|
|  | 1.1 | 1 | == Introduction == | 
| 2 | |||
| 3 | The HPC Job Proxy (formerly known as the supercomputing proxy or Unicore job proxy) provides a simplified means for EBRAINS service providers to launch jobs on Fenix supercomputers on behalf of EBRAINS end-users. | ||
| 4 | |||
| 5 | The proxy offers a wrapper over the Unicore service which adds: | ||
| 6 | |||
| 7 | * Logging of the jobs run on behalf of the end-user | ||
| 8 | * Access to the stdout, stderr, and return status of the job for the end-user | ||
| 9 | * Verification of the end-user’s quotas before submitting jobs | ||
| 10 | * Update of the end-user’s quotas | ||
| 11 | |||
| 12 | In order to use the proxy, an EBRAINS service provider needs to: | ||
| 13 | |||
| 14 | * Get a project and service account at one or more Fenix sites | ||
| 15 | * Get an EBRAINS service account and an EBRAINS IAM OIDC client ID | ||
| 16 | * Map that Fenix service account to that EBRAINS service account | ||
| 17 | |||
| 18 | Why does each service provider need a Fenix service account? The Principal Investigator who obtains the Fenix service account is legally responsible for the jobs run on the supercomputers and for ensuring that end-users cannot run unintended executables. | ||
| 19 | |||
|  | 3.1 | 20 | [[image:Collabs.ebrains-unicore-job-proxy.User documentation.WebHome@ebrains-job-proxy-Job sequence.png]] | 
|  | 1.1 | 21 | |
| 22 | == Use case == | ||
| 23 | |||
| 24 | 1. Bob wants to use an EBRAINS Application that requires supercomputing. | ||
| 25 | 11. The Application authenticates Bob as an EBRAINS user. | ||
| 26 | 11. He provides input which determines the supercomputing job to be run. | ||
| 27 | 1. The Application authenticates itself using its EBRAINS IAM service account with the HPC Job Proxy, and sends it the following information: | ||
| 28 | 11. a vanilla Unicore job definition, | ||
| 29 | 11. the EBRAINS IAM username of the end user on whose behalf the job is to be executed, | ||
| 30 | 11. a maximum amount of resources (managed by the EBRAINS quota manager) to be consumed by the job, | ||
| 31 | 11. an optional callback URL for notifications regarding the job. | ||
| 32 | 1. The HPC Job Proxy queries the EBRAINS Quota Manager for Bob and the maximum resources indicated. | ||
| 33 | 11. The EBRAINS Quota Manager queries the EBRAINS IAM to identify the quota for the relevant resources requested. | ||
| 34 | 11. The EBRAINS Quota Manager checks its database for resources already consumed by Bob and determines whether Bob has enough quota to run the job. | ||
| 35 | 11. The EBRAINS Quota Manager returns a validation to the Application, possibly indicating Bob’s quota status for those resources. | ||
| 36 | 1. If Bob has enough quota, the HPC Job Proxy accepts the job request. | ||
| 37 | 11. It logs Bob’s request. | ||
| 38 | 11. It sends the job to the Unicore API using the same EBRAINS IAM token the Application identified itself with. The job is therefore submitted to Unicore under the Application's identity, which ensures that it is accounted against the Application’s own Fenix service account. | ||
| 39 | 1. When the job finishes (completes or fails), Unicore sends a notification to the HPC Job Proxy’s callback endpoint. | ||
| 40 | 1. The HPC Job Proxy logs the job results, and pushes the actual cost of the job run for Bob to the EBRAINS Quota Manager. | ||
| 41 | 1. The HPC Job Proxy notifies the Application of the results of the job. | ||
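The quota handling in steps 3 to 6 can be sketched as follows. This is only an illustration of the control flow, not the proxy's actual implementation: `QuotaManager`, `handle_job_request` and `handle_job_finished` are hypothetical names, and the in-memory dictionaries stand in for the EBRAINS Quota Manager and its database.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaManager:
    """Hypothetical in-memory stand-in for the EBRAINS Quota Manager."""
    quota: dict                                    # username -> total allowed resource units
    consumed: dict = field(default_factory=dict)   # username -> units already used

    def has_enough(self, username: str, max_resources: float) -> bool:
        # Step 3: compare what was already consumed plus the requested maximum
        # against the user's quota.
        used = self.consumed.get(username, 0.0)
        return used + max_resources <= self.quota.get(username, 0.0)

    def charge(self, username: str, actual_cost: float) -> None:
        # Step 6: record the actual cost once the job has finished.
        self.consumed[username] = self.consumed.get(username, 0.0) + actual_cost


def handle_job_request(qm, username, max_resources, submit):
    """Steps 3-4: forward the job only if the quota check passes.

    `submit` is a hypothetical callable that sends the job to Unicore and
    returns a job id. Returns None when the request is rejected.
    """
    if not qm.has_enough(username, max_resources):
        return None
    return submit()


def handle_job_finished(qm, username, actual_cost):
    """Steps 5-6: called when Unicore notifies the proxy that the job ended."""
    qm.charge(username, actual_cost)
```

Note that the check uses the requested maximum, while the amount charged afterwards is the actual cost of the run, as described in the steps above.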
| 42 | |||
| 43 | == Sample transaction diagram == | ||
| 44 | |||
| 45 | [[image:ebrains-job-proxy-Job sequence.png]] | ||
| 46 | |||
| 47 | |||
| 48 | == API of the HPC Job Proxy == | ||
| 49 | |||
| 50 | The API provides 3 endpoints described below. The Swagger documentation for the API is available here. | ||
| 51 | |||
| 52 | * Submit a job: **POST /api/jobs/** | ||
| 53 | |||
|  | 2.1 | 54 | To submit a job to the proxy, you need to provide the following information as part of the POST JSON body: | 
|  | 1.1 | 55 | |
| 56 | * **job_def** - JSON: The Unicore job definition. For more information, please visit the [[__Unicore documentation__>>url:https://sourceforge.net/p/unicore/wiki/Job_Description/]] | ||
| 57 | * **site** - string: The Fenix site on which to run the job. | ||
| 58 | * **user_info** - string: The Application’s access token user information. | ||
| 59 | |||
| 60 | Valid Fenix site values are: | ||
| 61 | |||
| 62 | |**Fenix partner**|**Valid site value** | ||
| 63 | |BSC|BSC-MareNostrum | ||
| 64 | |CEA|irene | ||
| 65 | |Cineca|CINECA-MARCONI | ||
| 66 | | “|CINECA-GALILEO | ||
| 67 | |CSCS|DAINT-CSCS | ||
| 68 | |JSC|FZJ_JURECA | ||
| 69 | | “|JUWELS | ||
| 70 | | “|((( | ||
| 71 | JURON | ||
| 72 | ))) | ||
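As a minimal sketch, the POST body for **/api/jobs/** could be assembled as shown below. The three field names and the site values come from the description and table above; the helper name `build_submission` and the sample job definition are illustrative assumptions, and the Unicore documentation remains the reference for the content of `job_def`.

```python
import json

# Site values copied from the table of valid Fenix sites above.
VALID_SITES = {
    "BSC-MareNostrum", "irene", "CINECA-MARCONI", "CINECA-GALILEO",
    "DAINT-CSCS", "FZJ_JURECA", "JUWELS", "JURON",
}


def build_submission(job_def: dict, site: str, user_info: str) -> str:
    """Return the JSON body for POST /api/jobs/ (hypothetical helper).

    job_def   -- the Unicore job definition
    site      -- one of the valid Fenix site values
    user_info -- the Application's access token user information
    """
    if site not in VALID_SITES:
        raise ValueError(f"unknown Fenix site: {site!r}")
    return json.dumps({"job_def": job_def, "site": site, "user_info": user_info})


# Illustrative job definition only; adapt it to your own executable.
example_body = build_submission(
    {"Executable": "/bin/date"}, "DAINT-CSCS", "<access_token>"
)
```

Validating the site on the client side is optional, but it catches typos before a request is sent.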
| 73 | |||
| 74 | * Fetch a job's details: **GET /api/jobs/<job_id>** | ||
| 75 | |||
| 76 | The proxy queries Unicore on the fly for the job’s latest details. Fenix sites retain information about past jobs for a set number of days (30 days at CSCS). If the request is made after that retention period, the proxy returns the information it has stored itself. The full information retrieved and stored is the following: | ||
| 77 | |||
| 78 | {{info}} | ||
| 79 | {{code language="json"}} | ||
| 80 | { | ||
| 81 | "id": "string", | ||
| 82 | "description": "string", | ||
| 83 | "dataset_id": "string", | ||
| 84 | "duration": 0, | ||
| 85 | "definition": {}, | ||
| 86 | "error": "string", | ||
| 87 | "status": "CREATED", | ||
| 88 | "pre_command_status": "CREATED", | ||
| 89 | "post_command_status": "CREATED", | ||
| 90 | "runtime": "string", | ||
| 91 | "created": "2021-04-28T09:56:51.731Z", | ||
| 92 | "updated": "2021-04-28T09:56:51.731Z" | ||
| 93 | } | ||
| 94 | {{/code}} | ||
| 95 | {{/info}} | ||
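A response of this shape can be read with the standard library alone. The sketch below assumes the timestamps always use the format shown above (UTC with a trailing `Z`); `parse_timestamp` and `summarize_job` are hypothetical helper names, not part of the proxy.

```python
import json
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() in older Python versions does not accept a
    # trailing "Z", so it is replaced by an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_job(raw: str) -> dict:
    """Extract a few fields from the JSON returned by GET /api/jobs/<job_id>."""
    job = json.loads(raw)
    return {
        "id": job["id"],
        "status": job["status"],
        "error": job.get("error") or None,
        "created": parse_timestamp(job["created"]),
        "updated": parse_timestamp(job["updated"]),
    }
```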
| 96 | |||
|  | 2.1 | 97 | * Fetch a file present in the job's execution directory: **GET /api/jobs/<job_id>/<filename>** | 
|  | 1.1 | 98 | |
|  | 2.1 | 99 | For convenience, you can fetch the files that are present in the job's execution directory. Unicore places the output, error output and exit code of a job in the job’s working directory. The filenames available are: stdout, stderr, UNICORE_SCRIPT_EXIT_CODE. If the job has pre commands or post commands, stdout and stderr are also available under the folders .UNICORE_PRE_0/ and .UNICORE_POST_0/ respectively. Any files that your job creates can be fetched in the same way. | 
|  | 1.1 | 100 | |
| 101 | The proxy will respond with the //raw content// of the requested file. These files are available as long as the site does not delete them (30 days for CSCS PIZ DAINT). | ||
| 102 | |||
| 103 | For all 3 endpoints, you need to provide your service account access token in the //Authorization// header of your request (Bearer <access_token>). Your service account must be linked to a Fenix account, otherwise your job will be rejected (HTTP 502). | ||
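The following sketch shows one way to attach the token when fetching a job's details or one of its files, using only the Python standard library. `BASE_URL` is a placeholder and not the proxy's real address, and both helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder only: replace with the actual address of the HPC Job Proxy.
BASE_URL = "https://hpc-job-proxy.example.org"


def job_details_request(job_id: str, access_token: str) -> Request:
    """Build a GET request for /api/jobs/<job_id>."""
    url = f"{BASE_URL}/api/jobs/{quote(job_id, safe='')}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def job_file_request(job_id: str, filename: str, access_token: str) -> Request:
    """Build a GET request for /api/jobs/<job_id>/<filename>.

    Slashes in `filename` are kept so that paths such as
    ".UNICORE_PRE_0/stdout" map onto the job's working directory.
    """
    url = f"{BASE_URL}/api/jobs/{quote(job_id, safe='')}/{quote(filename)}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

The resulting object can be passed to `urllib.request.urlopen`; for a file request, the response body is the raw content of the file.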
| 104 | |||
| 105 | == Sample usage == | ||
| 106 | |||
|  | 4.1 | 107 | [[A Jupyter Notebook provides sample code for the Application’s access to the HPC Job Proxy.>>https://lab.ch.ebrains.eu/user-redirect/lab/tree/shared/EBRAINS%20HPC%20job%20proxy/HPC_job_proxy_usage.ipynb]] | 
|  | 1.1 | 108 | |
| 109 | == Source code == | ||
| 110 | |||
| 111 | The source code of the HPC Job Proxy is available on GitLab. | ||
| 112 | |||
| 113 | == Setting up accesses for your application == | ||
| 114 | |||
| 115 | For your application to work, you need to set up the following accesses. | ||
| 116 | |||
| 117 | Your application needs an EBRAINS IAM service account. Contact EBRAINS __support__ to have it created. You will need to provide a username and an email address that is not linked to another EBRAINS account. The mapping between the EBRAINS service account and the Fenix service account is done automatically based on the email addresses linked to the two accounts, which must be identical. | ||
| 118 | |||
| 119 | Your application needs an EBRAINS IAM OIDC client. See the __instructions__. | ||
| 120 | |||
| 121 | Your application needs a Fenix project and a Fenix service account. See the __instructions__ to get a project. Then the Principal Investigator in your lab can request a Fenix service account via EBRAINS __support__. | ||
| 122 | |||
| 123 | Then, in your app, retrieve the access token linked to your service account user. This requires the OIDC client ID, the client secret, and the EBRAINS service account's username and password. See the __instructions__. | ||
| 124 | |||
| 125 | When requesting a job on behalf of an end user, your app passes this access token in the "**user_info**" field of the payload alongside the job request. Note that the access token expires (typically after one week), so your application must generate a new token once the current one is no longer valid. See the __instructions__. | ||
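Because the token expires, an application may want to cache it and renew it shortly before the end of its lifetime. The class below is a minimal sketch of that idea. `TokenCache` is a hypothetical name, and `fetch_token` is a callable you supply that performs the actual request to EBRAINS IAM and returns the token together with its lifetime in seconds; how that request is made is covered by the instructions referenced above and is not shown here.

```python
import time


class TokenCache:
    """Keep one access token and renew it shortly before it expires."""

    def __init__(self, fetch_token, margin_seconds=60.0, clock=time.time):
        # fetch_token: callable returning (access_token, lifetime_seconds).
        # margin_seconds: renew this long before the nominal expiry.
        # clock: injectable time source, which makes the class easy to test.
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Calling `get()` before each job submission then yields a token that can be placed both in the //Authorization// header and in the "**user_info**" field.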
| 126 | |||
| 127 | |||
| 128 | __Example__: | ||
| 129 | |||
| 130 | The EBRAINS Image Service uses the HPC Job Proxy. | ||
| 131 | |||
| 132 | * EBRAINS IAM OIDC Client: “img_svc” | ||
| 133 | * EBRAINS IAM service account: "ich019sa" | ||
| 134 | * Fenix project: “ich019” | ||
| 135 | * Fenix service account: “ich019sa” | ||
| 136 | * Email address of both service accounts: “platform.ich019sa@humanbrainproject.eu” | ||
| 137 | |||
| 138 | == List of services using the HPC Job Proxy == | ||
| 139 | |||
| 140 | * EBRAINS image service | ||
| 141 | * Ilastik (ongoing) | ||
| 142 | * TVB cloud (ongoing) |