Major Outage - 09/21/18 - 09/24/2018


  • This topic is locked
97 replies to this topic

#81 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • PipPipPipPipPip
  • 2,889 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 25 September 2018 - 09:16 PM

For those worried that something like this could happen again.

 

We have already enabled snapshots on our storage cluster.  We're doing one snapshot every hour and keeping them for 10 hours.

 

So from a hypothetical standpoint - let's say that this did manage to happen again.  We would simply mount a snapshot from before the incident - within the hour before - and boot everything back up.

 

Total downtime would be - ~5 minutes - for the whole network.  Would there be any data loss? Possibly anything written in the preceding hour or less - but nothing compared to the losses of a multi-day outage.
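As a rough illustration of the hourly / 10-hour retention described above, here is a minimal Python sketch of how a restore point could be picked and how much data would be at risk. It is not StorPool's API or our actual tooling. The function names, the assumption that snapshots land exactly on the hour, and the incident time are all hypothetical.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=1)  # one snapshot every hour (from the post)
SNAPSHOTS_KEPT = 10                     # kept for 10 hours (from the post)


def retained_snapshots(now, kept=SNAPSHOTS_KEPT):
    """Return the snapshot timestamps still retained at `now`, newest first.

    Assumption: snapshots are taken exactly on the hour.
    """
    latest = now.replace(minute=0, second=0, microsecond=0)
    return [latest - i * SNAPSHOT_INTERVAL for i in range(kept)]


def pick_restore_point(snapshots, incident_time):
    """Pick the newest snapshot taken strictly before the incident, or None."""
    candidates = [s for s in snapshots if s < incident_time]
    return max(candidates) if candidates else None


def data_at_risk(restore_point, incident_time):
    """Anything written between the restore point and the incident is lost."""
    return incident_time - restore_point


# Hypothetical incident at 8:40 PM, noticed at 9:16 PM the same evening.
now = datetime(2018, 9, 25, 21, 16)
incident = datetime(2018, 9, 25, 20, 40)
snapshots = retained_snapshots(now)
restore_point = pick_restore_point(snapshots, incident)
print(restore_point, data_at_risk(restore_point, incident))
```

With hourly snapshots the window of lost writes is always under one hour, which is where the "preceding hour or less" figure comes from.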

 

It would look literally like we just shut everything down and booted it back up.  No 'restorations', no lost emails, nothing.  There's a great chance almost nobody would even notice.

 

This is something that our storage vendor, StorPool, set up for us immediately upon seeing what had happened.  They actually apologized that it was not already set up and said that as a result of our disaster they are going to make sure that it is a default behavior that has to be actively disabled rather than the other way around.

 

Even with these snapshots and as powerful as they are - we are still going to overhaul our backup servers.  We have identified the issues with the present setup that caused restorations to be so slow and already have fixes for those issues planned for once we are fully online and all of our clients are taken care of.

 

Snapshots are a very powerful tool against data loss and corruption.  We actually used them a couple of times on our old storage platform, the Nimble CS500, to recover data on servers when clients made big mistakes themselves.


  • 1
Michael Denney - MDDHosting LLC - Providing Hosting since 2007
Scalable shared hosting plans in the cloud! Check them out!
Highly Available Cloud Shared, Reseller, and VPS
http://www.mddhosting.com/

#82 MikeDVB


Posted 25 September 2018 - 10:56 PM

R3 is still restoring, and is about done.

 

We are restoring the S5 server now and preparing the S4 server for restoration.


  • 0

#83 MikeDVB


Posted 25 September 2018 - 11:42 PM

R3 server is completed.  S5 is still restoring.  S4 should start restoring in about 3 hours or so.


  • 0

#84 MikeDVB


Posted 26 September 2018 - 12:32 AM

S5 is completed, S4 starting soon.


  • 0

#85 MikeDVB


Posted 26 September 2018 - 01:10 AM

My personal estimation is that we'll be fully done restoring data by 8 AM.


  • 0

#86 MikeDVB


Posted 26 September 2018 - 01:53 AM

Once all servers are fully restored we will be performing a quick reboot of them all.  This reboot should take ~30 seconds each - and we're going to be doing this to get the systems into a clear/fresh state after all of the massive restorations and data transfers.


  • 0

#87 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 219 posts
  • Gender:Male

Posted 26 September 2018 - 03:58 AM

We are restoring the 5 remaining S0 accounts straight from the old backup server while the S4 backup data is being staged on the SSD server.  S4 restores will begin in about 15 minutes.


  • 0

#88 ericr


Posted 26 September 2018 - 04:27 AM

I have started the restores of the s4 server.


  • 0

#89 ericr


Posted 26 September 2018 - 06:38 AM

The account restores on S5 are over halfway done.  They are a bit slower than we hoped, but we will be done before too long.


  • 0

#90 ericr


Posted 26 September 2018 - 07:27 AM

The restore speed has slowed noticeably; however, it will complete as soon as possible.

The issue is that the restores are now competing with active visitors instead of running on an empty server.


  • 0

#91 ericr


Posted 26 September 2018 - 08:00 AM

The last few accounts on s0 have been restored. We have 263 accounts left on S4 to restore.


  • 0

#92 ericr


Posted 26 September 2018 - 10:24 AM

We are down to 122 accounts on S4.
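For anyone wanting a rough finish time from the two counts posted so far (263 accounts left at 08:00 AM, 122 left at 10:24 AM), here is a minimal Python sketch of a straight-line estimate. The function name is hypothetical, and the big assumption is that the restore rate stays constant, which it may not on a busy server.

```python
from datetime import datetime, timedelta


def estimate_finish(t1, remaining1, t2, remaining2):
    """Linearly extrapolate when `remaining` reaches zero.

    Assumption: the restore rate between the two samples holds until the end.
    Returns None if no progress was made between the samples.
    """
    done = remaining1 - remaining2
    if done <= 0:
        return None
    rate_per_second = done / (t2 - t1).total_seconds()
    return t2 + timedelta(seconds=remaining2 / rate_per_second)


# Counts and times taken from the posts above.
eta = estimate_finish(
    datetime(2018, 9, 26, 8, 0), 263,
    datetime(2018, 9, 26, 10, 24), 122,
)
print(eta.strftime("%I:%M %p"))  # about 12:28 PM
```

That works out to 141 accounts in 144 minutes, so roughly another two hours for the remaining 122 if nothing slows down further.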


  • 0

#93 MikeDVB


Posted 26 September 2018 - 12:20 PM

Due to it being the middle of the day and the servers being busy - the S4 server is bogging down due to the restorations we are conducting.  Once they are done performance should go back to normal.


  • 0

#94 MikeDVB


Posted 26 September 2018 - 01:10 PM

Identified a setting that needed to be changed - Deferred Webserver Restarts.  The webserver was restarting on S4 every ~3 seconds - restarts are now 5 minutes apart and things are stable and MUCH faster.  Replicated this network wide [after verification by another admin first].
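To show why spacing out restarts helps so much, here is a minimal Python sketch of the general idea: restart requests that arrive inside the minimum interval are coalesced into one deferred restart instead of each triggering its own. This is not cPanel's implementation. The class, its method names, and the simulation are hypothetical, and only the ~3 second and 5 minute figures come from the post.

```python
class DeferredRestarter:
    """Coalesce restart requests so restarts happen at most once per interval."""

    def __init__(self, min_interval_s=300.0):
        self.min_interval_s = min_interval_s  # 5 minutes, per the post
        self.last_restart = None              # time of the last real restart
        self.pending = False                  # a request is waiting for its turn
        self.restart_count = 0

    def request(self, now):
        """Ask for a restart at time `now` (seconds). Returns True if it ran."""
        if self.last_restart is None or now - self.last_restart >= self.min_interval_s:
            self._restart(now)
            return True
        self.pending = True  # remember the request, run it later
        return False

    def tick(self, now):
        """Periodic check: run a deferred restart once the interval has passed."""
        if self.pending and now - self.last_restart >= self.min_interval_s:
            self._restart(now)
            return True
        return False

    def _restart(self, now):
        self.last_restart = now
        self.pending = False
        self.restart_count += 1


# Simulate a restore requesting a restart every 3 seconds for 10 minutes.
restarter = DeferredRestarter(min_interval_s=300.0)
requests = list(range(0, 600, 3))
for t in requests:
    restarter.request(t)
print(len(requests), "requests ->", restarter.restart_count, "restarts")
```

In this simulation 200 restart requests collapse into 2 actual restarts, which is the kind of reduction that takes the pressure off a server that is already busy with restores.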

 

S4 is still going to be unhappy until the restores are done but we're down to the last 4 accounts.


  • 0

#95 MikeDVB


Posted 26 September 2018 - 01:33 PM

Restores are 100% Completed

 

If your site is offline showing a cPanel error page:

 

  • Try connecting to your cPanel by adding "/cpanel" on to the end of your domain.  If you can sign in, this verifies your account was restored.
  • Check to see if you're using our nameservers - if you aren't, you'll need to get your IP from cPanel and update your third party DNS.
  • Make sure you're not just reloading the error page - hitting reload while viewing the error just reloads the error page.

If you are not using third party DNS and your site doesn't appear but you can get into cPanel - try clearing your browser cache and restarting your browser.  If that doesn't work try another browser.  If it loads for you on one browser but not another - that's a caching issue and not a server or network issue.
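If you use third party DNS and want to check whether your domain points at the IP shown in your cPanel, here is a minimal Python sketch. The function names are hypothetical, the domain and IP in the example are placeholders from the documentation ranges, and the result reflects whatever your local resolver has cached, so a recent DNS change may not show up right away.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def dns_points_at_server(hostname, expected_ip, resolver=resolve_ipv4):
    """True if `hostname` currently resolves to `expected_ip`.

    `resolver` is injectable so the check can be exercised without a network.
    A failed lookup is treated as "not pointing at the server".
    """
    try:
        addresses = resolver(hostname)
    except socket.gaierror:
        return False
    return expected_ip in addresses


# Example (placeholders): replace with your domain and the IP from cPanel.
# print(dns_points_at_server("example.com", "203.0.113.10"))
```

If this returns False while cPanel loads fine, the third party DNS record most likely still points at the old IP.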

 

If you are having any issues with your mail client - what we have seen work most often is removing the email account from the client and adding it back.  We haven't yet identified why this helps.  You can also add "/webmail" to the end of your domain to access your email if your mail client isn't working.

 

We do expect there to be a lot of little issues that we have to resolve so if you have issues and can't sort them please reach out in a ticket.

 

We are doing our best to keep up with support tickets.  I am sorry if it takes us longer to reply than normal but we are answering tickets in the order received and doing our best to fully resolve any issues and to offer good proper non-copy-and-pasted advice.


  • 2

#96 MikeDVB


Posted 26 September 2018 - 05:20 PM

We are aware that SpamAssassin is either not properly scoring spam - or not scoring it at all and have a ticket opened with cPanel on this.  As soon as this is sorted we'll fix it network wide.


  • 0

#97 MikeDVB


Posted 26 September 2018 - 09:26 PM

The SpamAssassin issue has been resolved.


  • 0

#98 MikeDVB


Posted 26 September 2018 - 09:28 PM

All servers are online and all accounts are restored!

We reached out to our storage platform vendor after the incident and we have worked with them to take steps to prevent an issue like this from happening again. Changes have also been implemented that will allow us to recover from a catastrophic event such as the one we just experienced as quickly as within a few minutes with little to no impact on the services themselves.

We are going to be conducting a thorough review of the events leading up to this incident and making changes to our policies and procedures based upon our findings.  How the incident was handled is also going to be reviewed and we are going to develop a new comprehensive backup and emergency response plan.

If you are still experiencing any issues at all or need help with anything please do not hesitate to reach out to us.  We are here to help and will do our best to assist you in recovering from this incident in any way that we can.

Thank you,


  • 0
Michael Denney - MDDHosting LLC - Providing Hosting since 2007
Scalable shared hosting plans in the cloud! Check them out!
Highly Available Cloud Shared, Reseller, and VPS
http://www.mddhosting.com/



