ericr

Member Since 12 Apr 2014
Offline Last Active May 21 2019 02:00 PM

Topics I've Started

Work to address MDS / ZombieLoad

17 May 2019 - 03:17 PM

Summary: All cloud servers will be restarted this weekend after 10 PM Eastern.

We are preparing restarts at every level to address MDS (Microarchitectural Data Sampling) / ZombieLoad.

Due to the nature of the exploit, fixes are required in the CPU/BIOS, the physical server kernel, QEMU, and the cloud servers' kernels.

We are still waiting on Dell for the CPU/BIOS update. However, we do have everything in place to do the cloud server restarts.

These restarts may take up to 15 minutes.
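
For anyone who wants to check a Linux server after the restart, here is a rough sketch (not our tooling, just an illustration) that reads the status patched kernels report in /sys/devices/system/cpu/vulnerabilities/mds. The function names and the short labels it returns are made up for this example. A server with a patched kernel but no CPU/BIOS update yet would be expected to show the "no microcode" wording.

```python
from pathlib import Path

# Standard sysfs location on Linux kernels that carry the MDS patches.
MDS_STATUS_PATH = Path("/sys/devices/system/cpu/vulnerabilities/mds")


def classify_mds_status(status: str) -> str:
    """Map the kernel's status line to a short label (labels are illustrative)."""
    s = status.strip()
    if s.startswith("Not affected"):
        return "not affected"
    if s.startswith("Mitigation"):
        return "mitigated"
    # The kernel clears CPU buffers, but the CPU microcode update is missing.
    if "no microcode" in s:
        return "kernel patched, microcode missing"
    if s.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def read_mds_status(path: Path = MDS_STATUS_PATH):
    """Return the raw status line, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    raw = read_mds_status()
    if raw is None:
        print("MDS status file not found; the kernel may predate the MDS patches")
    else:
        print(f"{raw} -> {classify_mds_status(raw)}")
```

If the file is missing, that usually just means the running kernel is older than the fix, which is one more reason the restart is needed.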

Server reboots to address kernel bug

09 May 2019 - 03:20 PM

CloudLinux has confirmed a kernel bug that is causing unexpected high load and downtime, and resolving it requires a reboot of the servers. We will be doing the restarts tonight after 10 PM EST.


Cloud hardware restarts

26 April 2019 - 05:13 PM

We are going to restart the underlying cloud hardware to address a possible bug that has caused downtime recently.

We expect no downtime during this work, as the actual hosting servers will be live-migrated to other cloud hardware during each restart. However, storage speed and site load times may slow down for short periods, both from the restart and from the live migration.

We will do the first restart at 8 PM EST on what is currently a storage-only node, to prepare for the larger reboots. Doing this first will let us provide an ETA for the main restarts.


Network Outage 1/17/2019 - 5:56 AM to 6:54 AM

17 January 2019 - 07:59 AM

We received notifications at 5:56 AM that everything had gone offline. We reached out to our facility, and they were already aware of the issue and working to resolve it.

For clarity, this was an issue upstream from us: our network and servers were online and operational but lost connectivity to the internet when the facility as a whole experienced a disruption.

The network came back online after 58 minutes, and we are awaiting a Root Cause Analysis [RCA] from our facility regarding the outage. As soon as we have the cause of the outage, we will make it available here.

This thread will be updated when we know more.


S4 temporary outage at 11:40 PM ET 10/13

13 October 2018 - 10:34 PM

We will have a short outage of s4, lasting one to three minutes, to complete a live migration.