
S4 temporary outage at 11:40 PM ET 10/13


7 replies to this topic

#1 ericr

ericr

    Staff

  • Staff Administrator
  • 219 posts
  • Gender:Male

Posted 13 October 2018 - 10:34 PM

We will have a short one-to-three-minute outage of S4 to complete a live migration.


#2 ericr

Posted 13 October 2018 - 10:43 PM

This stage is complete.  We will have another short outage to revert the move later this evening.


#3 ericr

Posted 13 October 2018 - 11:17 PM

We will be doing the second outage of the night at 12:20 AM ET.


#4 ericr

Posted 13 October 2018 - 11:22 PM

The work is complete for the evening.


#5 ericr

Posted 14 October 2018 - 12:19 AM

I am taking S4 down immediately to resolve an issue caused by the physical server.


#6 ericr

Posted 14 October 2018 - 12:24 AM

S4 is booted. Your sites should be back online shortly.



#7 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • 2,883 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 14 October 2018 - 12:52 AM

Because the S4 server has been migrated to different hardware that isn't experiencing issues, we do not anticipate further problems. The hardware that S4 was on previously is no longer providing services to any clients, and we're working with Dell to have it replaced under warranty.


Michael Denney - MDDHosting LLC - Providing Hosting since 2007
Scalable shared hosting plans in the cloud! Check them out!
Highly Available Cloud Shared, Reseller, and VPS
http://www.mddhosting.com/

#8 MikeDVB


Posted 14 October 2018 - 04:39 AM

A few clients have reached out asking why the system did not automatically fail over the S4 server to another piece of hardware and why we had to intervene manually.
 
I'm going to provide the same answer here as I did in the tickets so that we're being clear and transparent.
 
 

The failure of the host wasn't total: the server was online but degraded. The long and short of it is that the motherboard in the host system is failing in an interesting way. It is corrupting data read from 3 of the 24 memory modules in the server, all 3 of them among the 12 attached to CPU1. Not all VMs on the host were affected, and the host was not entirely down. S4 was most affected because the affected RAM was being used by that server. We use error-correcting code (ECC) memory, which can detect memory errors and correct them automatically, and we have verified that the RAM modules themselves are not the cause of the issues.
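
For anyone curious what "detect and correct" means in practice, here is a toy sketch of the idea in Python. It uses a Hamming(7,4) code, which is an assumption made for illustration only: server ECC memory typically protects 64-bit words with a wider code, and the function names `encode` and `decode` are hypothetical, not anything from our systems.

```python
# Toy Hamming(7,4) code: 4 data bits are stored with 3 parity bits so that
# any single flipped bit can be located and corrected on read.
# Bit positions are numbered 1..7; parity bits sit at positions 1, 2 and 4.

def encode(data):
    """Return the 7-bit codeword for a list of 4 data bits."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if clean."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate one bit corrupted on read
    print(decode(stored))
```

A code like this can only repair one flipped bit per word, which is why correction alone does not guarantee clean data when the corruption comes from somewhere other than an occasional bit flip.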

If the hardware had failed in a way that took the services 100% offline, they would have come back up on another machine automatically. Because the host server was still online, the automatic system did not migrate the guest servers. We intervened manually, as we monitor the servers closely: we were aware of the issue in under 60 seconds and began working on it.
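
To make the distinction concrete, here is a minimal sketch of that decision in Python. It is not our actual high-availability software; the `Host` fields, the `HostState` names, and both functions are hypothetical and only model the behaviour described above, where a host that still responds is never failed over automatically.

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"  # still responding, but with hardware faults
    OFFLINE = "offline"    # not responding at all


@dataclass
class Host:
    name: str
    responds_to_heartbeat: bool  # assumed liveness signal
    has_hardware_faults: bool    # e.g. corrupted memory reads


def classify(host: Host) -> HostState:
    """Liveness is checked first: an unresponsive host is simply offline."""
    if not host.responds_to_heartbeat:
        return HostState.OFFLINE
    if host.has_hardware_faults:
        return HostState.DEGRADED
    return HostState.HEALTHY


def automatic_failover_triggers(host: Host) -> bool:
    """Guests restart elsewhere automatically only on total host failure."""
    return classify(host) is HostState.OFFLINE


def needs_manual_intervention(host: Host) -> bool:
    """A degraded but online host is left to a human to evacuate."""
    return classify(host) is HostState.DEGRADED


if __name__ == "__main__":
    old_s4_host = Host("old-s4-host", responds_to_heartbeat=True,
                       has_hardware_faults=True)
    print(automatic_failover_triggers(old_s4_host))
    print(needs_manual_intervention(old_s4_host))
```

In this model the old S4 host lands in the degraded state, so nothing moves until someone steps in, which matches what happened.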

To avoid file system corruption and data loss, we elected to gracefully bring the service down and then bring it back up on another machine, which took more time than forcefully killing the machine and bringing it back online. More time, but less risk. While we could always forcefully kill the guests and bring them back online on another machine faster, killing a server risks data damage and corruption, particularly to MySQL data, so we try to avoid that whenever possible.

