Changes for page Announcements

Last modified by hbpadmin on 2025/01/09 12:12


From version 197.1

edited by mmorgan
on 2023/04/06 09:54

Change comment: There is no comment for this version

To version 172.1

edited by mmorgan
on 2023/01/26 20:58

Change comment: There is no comment for this version


Summary

Page properties (1 modified, 0 added, 0 removed)

Details

Page properties

Content

@@ -1,87 +1,12 @@
  (% class="wikigeneratedid" %)
--=== **Collaboratory Drive and Lab work (2023-04-09)** ===
++=== **Collaboratory Office licensing issue (2023-01-24)** ===
--We are announcing downtime for the Collaboratory Drive this coming Sunday, April 9, from 8 AM CET, potentially lasting all day. This will allow us to activate a secondary Drive server, a mirror copy of the primary server, which we intend to use for backups only.
--
--During the coming days, we will also be tentatively deploying and exercising new policies in the Drive.
--
--1. We will limit or disable the creation of [[core dumps>>https://en.wikipedia.org/wiki/Core_dump]] in the Drive by the Lab. A significant number of large core dumps have recently been filling up the Drive, which caused the disk full situation earlier this week much sooner than it would otherwise have happened.
--1. We will limit the maximum file size that can be generated by the Lab, both in the Drive and in the local file system of the user's Lab container. This file size will be aligned with the maximum upload size limit to the Drive which is currently 1 GB. The reasons for this are:
--11. In the Drive: this aligns the policy for files created by the Lab with the policy for files uploaded to the Drive. The Drive is not intended for very large files. Collabs have a Bucket storage which is optimized for this purpose.
--11. In the local file storage of Lab containers: each Lab container is only accessible to a single user, but the storage resource is shared at the OpenShift level by all the pods running on the same worker node. A Lab container can therefore cause a disk full condition for other users' containers. The file size limit will reduce this risk. Users are reminded that stress testing any resources provided on the EBRAINS RI is strictly forbidden without explicit authorization from EBRAINS Technical Coordination.
--1. We will "move" (copy and delete) files larger than the 1 GB limit from the Drive to the Bucket of the same collab. Files that are "moved" will be deleted from the Drive and a file with the same name extended with "_moved.txt" will be put in its place. This file will inform users to check the Bucket for their file. Example: the file **//MyTalk.mp4//** will be deleted and replaced by a file **//MyTalk.mp4_moved.txt//**.
--
--The new policies are not technically releases but they will be announced in The Collaboratory [[Releases>>doc:Collabs.the-collaboratory.Releases.WebHome]] page.
--
--=== **Collaboratory Drive issue (2023-04-04) (Resolved)** ===
--
--Some users are still experiencing issues while trying to access the Drive. We are investigating the issue and will update this announcement as soon as we have more information.
--
--The issues on the drive have been resolved.
--
--We are still looking to recover files that were lost due to the disk full condition earlier this week. It appears that there were no files lost that were older than Thursday March 30. If you identify any such file, please contact EBRAINS Support.
--
--Update: We have looked into all our options for recovering lost data from the event earlier this week. We're sorry to have to announce that no more data will be recoverable. We are however going to reattempt a continuous backup strategy to limit the risk and potential extent of data loss.
--
--=== **Collaboratory Drive repair (2023-04-03) (Resolved)** ===
--
  (% class="wikigeneratedid" %)
--After resizing the Drive to address the issue mentioned in the previous announcement, we noticed that the drives of specific collabs have entered into an unstable state. Users may experience issues while trying to access these drives. We are currently repairing these issues and this page will be updated once fully repaired.
--
--(% class="wikigeneratedid" %)
--The Drives of the affected collabs have been repaired. Unfortunately, the repair process was not able to recover a few files. We are still looking into whether recovery is possible.
--
--=== **Collaboratory Drive has filled up (2023-04-03) (Resolved)** ===
--
--(% class="wikigeneratedid" %)
--We monitor the Drive usage to make sure that there is always enough space for users. It seems that a user has bypassed our limitations and created one or more extremely large files in the Drive, filling up the Drive storage space. We are in the process of adding space. We will also be looking to identify the user that filled up the Drive.
--
--(% class="wikigeneratedid" %)
--We have added disk space to the Drive. It seems the Drive was filled with a large number of core dumps generated over the past days. The reports we received suggested the Drive had enough spare space to run for 2 more weeks. We are investigating that issue.
--
--(% class="wikigeneratedid" %)
--We will look into limiting the capability of generating core dumps in the Lab and automatically deleting core dumps after a limited time (e.g. a week). This will be discussed in the TC Weekly call on April 4 at 15:00 CET.
--
--=== **OpenShift is down at CSCS (2023-03-23) (Resolved)** ===
--
--The OpenShift service at CSCS failed overnight. We are working to identify the issue and recover the service ASAP.
--
--The services affected are all those running on the OpenShift service at CSCS including:
--
--* Collaboratory Lab at CSCS (users can use the Lab running at JSC by visiting [[lab.de.ebrains.eu>>https://lab.de.ebrains.eu]]),
--* atlas viewers,
--* simulation services (NEST desktop, TVB, Brain Simulation Platform)
--
--The OpenShift server has been restarted.
--
--=== **Collaboratory Lab at JSC is down (2023-03-08) (Resolved)** ===
--
--The Collaboratory Lab servers are currently down at JSC. The servers initially went down due to a power outage. Power has since been restored at JSC, but the servers are not yet back up. More information will be provided soon. In the meantime, Lab users still have access to the servers at CSCS.
--
--Update: The servers at JSC have been restored.
--
--=== **Maintenance to OpenShift (2023-02-21) (Resolved)** ===
--
--Services that use the NFS filesystem on OpenShift will be down today from 17:15 CET for up to 30 minutes. This includes the Collaboratory Lab.
--
--=== **Collaboratory Office changing Docker image (2023-01-31) (Resolved)** ===
--
--We are changing the Docker image used for the Office service. The image was previously changed last week when we had a license issue.
--
--=== **Collaboratory Office licensing issue (2023-01-24) (Resolved)** ===
--
--(% class="wikigeneratedid" %)
  The Collaboratory Office service uses the OnlyOffice software. The license file that we were using has expired. This greatly restricts the number of users that can use the service in edit mode simultaneously.
  (% class="wikigeneratedid" %)
  Our team has been trying to contact OnlyOffice to resolve the issue as quickly as possible. Responsiveness at OnlyOffice this week has been challenging.
--(% class="wikigeneratedid" %)
--In the meantime, users can download, edit, and upload files, but this does not allow simultaneous edits by multiple users. Please check the history of files to make sure that you do not overwrite someone else's edits when uploading your updated file.
--
--(% class="wikigeneratedid" %)
--**Resolved**: it seems the issue has been resolved
--
  === **Collaboratory Lab at JSC is not working (2022-12-13) (Resolved)** ===
  The JSC cloud infrastructure is experiencing issues which cause services running there to become unreliable. As such, the Lab instance at JSC is not usable. We are in contact with the JSC team and monitoring the situation. In the meantime, please select the CSCS site when starting a new Lab session.
@@ -88,7 +88,7 @@
  **Resolved**: This issue was resolved by rebooting the worker node that was causing the issue.
--=== **Collaboratory Bucket is currently full** **(2022-11-03)(Done)** ===
++=== **Colloboratory Bucket is currently full** **(2022-11-03)(Done)** ===
  The quota for the Collaboratory Bucket is currently full. We are in the process of increasing this limit and will restore service as soon as possible. In the meantime, you still have read access to files that are stored in collab Buckets.
  \\Update: Quota has been increased and full service has been restored.