Changes for page Announcements

Last modified by hbpadmin on 2025/01/09 12:12

From version 129.1
edited by mmorgan
on 2022/08/02 10:53
Change comment: There is no comment for this version
To version 142.1
edited by hbpadmin
on 2022/10/17 15:27
Change comment: There is no comment for this version

Summary

Details

Page properties
Author
... ... @@ -1,1 +1,1 @@
1 -XWiki.mmorgan
1 +XWiki.hbpadmin
Content
... ... @@ -1,18 +1,45 @@
1 -=== (% style="color:#95a5a6" %)//**Storage and Cloud service maintenance (2022-08-03)**//(%%) ===
1 +(% class="wikigeneratedid" %)
2 +=== **Drive is currently down.** ===
2 2  
3 -(% style="color:#95a5a6" %)//A maintenance operation at CSCS requires that HBP/EBRAINS services be stopped Wednesday August 3 morning. //(% style="color:#1abc9c" %)//Pending final confirmation on Tuesday August 2.//
4 +The Drive is currently down and not usable. This is due to infrastructure issues; service will be restored as soon as possible.
4 4  
5 -(% style="color:#95a5a6" %)//**__Timeline__**: all times CEST//
6 +=== **Unable to save files to Drive (Fixed)** ===
6 6  
7 -* (% style="color:#95a5a6" %)//**08:00**: Service providers shut down services running on OpenStack or OpenShift at CSCS//
8 -* (% style="color:#95a5a6" %)//**08:30**: Maintenance start by CSCS team//
9 -* (% style="color:#95a5a6" %)//**12:00**: Planned maintenance end by CSCS team. Service providers check that services have come back online. //(% style="color:#1abc9c" %)//Check this page for updates//
8 +The file system that the Drive runs on is currently full. Unfortunately, our detection of the file system being nearly full did not work. As a result, users cannot upload files or save changes to their files in the Drive. Due to another, unrelated issue, we cannot simply expand the file system, so we are moving the Drive data to a bigger volume. This is causing the delay. We will update this page as soon as the move is finished.
9 +\\Please note that, as most services run off the Drive, this also affects the Lab: you can run and even modify notebooks, but any changes you make will not be saved.
10 10  
11 -(% style="color:#95a5a6" %)//The storage back-end used by HBP/EBRAINS services has been causing some issues which have had repercussions on access to the object storage and OpenStack cloud service and thereby on HBP/EBRAINS services which run on this infrastructure. The issue has been identified and CSCS is ready to deploy a patch on the storage back-end. This will require that services running on OpenStack at CSCS be stopped for the duration of the maintenance.//
11 +NOTE: We transferred everything from the affected volume to a larger volume; the service is now usable again.
12 12  
13 -(% style="color:#95a5a6" %)//There is never a good time for maintenance. We’re heading into a few weeks when more users will be on vacation, and some of the service providers may also be away. Hopefully this will impact as few people as possible. We apologize in advance for any inconvenience the downtime may cause.//
13 +=== **No uploads allowed to Data-proxy (Fixed)** ===
14 14  
15 -(% class="wikigeneratedid" %)
15 +At the moment, uploads are not permitted because the data-proxy has exceeded its quota allowance. We are working to solve this as soon as possible.
16 +\\**Fixed: **The quota was increased; files can now be uploaded again.
17 +
18 +=== **Collaboratory Drive maintenance (2022-08-19) (Completed)** ===
19 +
20 +The Drive was meant to be taken down this afternoon for routine maintenance to increase the space available for Drive storage. That operation has had to be rescheduled due to technical issues on the storage infrastructure.
21 +
22 +=== **Intermittent issues with the Bucket (data-proxy) (Solved)** ===
23 +
24 +As reported in the main banner, the Bucket had been going down intermittently for short periods. This was resolved by the maintenance performed at CSCS on August 10. If you encounter any further issues related to the Bucket, please open a support ticket.
25 +
26 +=== **Storage and Cloud service maintenance (2022-08-10) (Completed)** ===
27 +
28 +This maintenance was shifted from August 3 to August 10 due to an incident on another server managed by the ETHZ central IT services.
29 +
30 +A maintenance operation at CSCS requires that some HBP/EBRAINS services be stopped Wednesday August 3 morning. The services affected are those using NFS storage volumes on the Castor cloud service. EBRAINS service providers that migrate their VMs to CEPH storage ahead of that date can keep their services running during the maintenance.
31 +
32 +**__Timeline__**: all times CEST
33 +
34 +* **08:00**: Service providers shut down services running on OpenStack or OpenShift at CSCS
35 +* **08:30**: Maintenance start by CSCS team
36 +* **12:00**: Planned maintenance end by CSCS team. Service providers check that services have come back online. (% style="color:#1abc9c" %)Check this page for updates
37 +* (% style="color:#1abc9c" %)15:20: Maintenance ended at 15:20 CEST.
38 +
39 +The storage back-end used by HBP/EBRAINS services has been causing some issues which have had repercussions on access to the object storage and OpenStack cloud service and thereby on HBP/EBRAINS services which run on this infrastructure. The issue has been identified and CSCS is ready to deploy a patch on the storage back-end. This will require that services running on OpenStack at CSCS be stopped for the duration of the maintenance.
40 +
41 +There is never a good time for maintenance. We’re heading into a few weeks when more users will be on vacation, and some of the service providers may also be away. Hopefully this will impact as few people as possible. We apologize in advance for any inconvenience the downtime may cause.
42 +
16 16  === **Infrastructure issues at CSCS (2022-08-01)** ===
17 17  
18 18  The infrastructure at CSCS on which EBRAINS services run failed over the weekend. August 1 was a bank holiday in Switzerland, where CSCS is located. The situation was recovered before 10:00 CEST on Tuesday August 2.