

MikeDVB

Member Since 27 Sep 2008
Offline Last Active Oct 26 2019 09:39 PM

#7291 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 25 September 2018 - 10:36 AM

We are now moving the SSDs from Slow Backup to Fast Backup to get started on restoring the S3 server.  This is step 2 of the process outlined here: https://forums.mddho...cussion/?p=7288


  • 1


#7279 Major Outage - 09/21/18+ - Client Discussion

Posted by MikeDVB on 25 September 2018 - 09:49 AM

Managed to get my S5 server website back online yesterday by moving to another host a couple days ago.  Had I moved when my gut told me to, as soon as all this happened, I could have been back online already on Sunday.  Lesson learned...  When things this serious go wrong, grab another host for a month right away and take shelter.  Or in my case...  Move.

If you could have moved - then we could have re-created your account and you could have restored in-place.  Whatever you moved to another provider could have been quickly brought back online with us.

 

The server has been online since Friday - the only thing we've been working on is copying our backup data over.  Any clients that have their own backups have been online for days already.


  • 3


#7246 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 10:26 PM

R2 is being copied from Slow Backup to Fast Backup in preparation for restoration.

 

R4 Server finished restoring.

 

R1 is almost done.

 

S2 is restoring now.


  • 2


#7240 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 09:06 PM

About half an hour left on copying S2 from Slow Backup.  R4 and R1 are restoring now.


  • 2


#7237 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 08:04 PM

Latest ETAs:

s1    Completed
p1    Completed
r1    In Progress
p2    Completed
s2    Tuesday, September 25, 2018 at 6:00:00 AM EDT
r2    Tuesday, September 25, 2018 at 2:00:00 PM EDT
s3    Tuesday, September 25, 2018 at 9:00:00 PM EDT
r3    Wednesday, September 26, 2018 at 5:00:00 AM EDT
s4    Wednesday, September 26, 2018 at 9:00:00 AM EDT
r4    In Progress
s5    Wednesday, September 26, 2018 at 3:00:00 PM EDT
s0    Wednesday, September 26, 2018 at 3:00:00 PM EDT

 

R4 is our smallest server [storage used] and we had enough time to get it copied over while we were waiting on the SSDs to be transferred.  The only reason we're doing these SSD transfers is because we're restoring servers as they finish copying to the SSDs.

 

I'll detail all of this more once we're online 100% and I have a chance.


  • 1


#7233 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 07:52 PM

We are over 50% restored network-wide - 51.4% to be exact.  We are still hoping to be at 100% by tomorrow, but the ETAs are calculations based upon transfer speeds and restoration rates.  As you've seen, they fluctuate, and we're still honestly hoping to be done far sooner than any of the ETAs say.


  • 1


#7230 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 07:06 PM

R1 is restoring now, should be done soon.

 

R4 got finished faster than expected and we should be restoring it soon.

 

S2 is also close to being ready to get restored.


  • 4


#7229 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 06:28 PM

We made some changes to the storage in both backup servers.  The fast backup server is now doing 4,770 GB/hour, up from 500 GB/hour.  The old backup server has been doing 318 GB/hour.  We're watching to see how much of an improvement the old backup server gets as a result of these changes too.

 

The bottleneck is getting the data from the old backup server to the fast one - but this process is substantially faster than restoring directly from the old backup server, which was only doing about 50 GB/hour.
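
To put those rates side by side, here is a rough sketch of the arithmetic.  The four rates are the ones quoted above; the 1,000 GB server size is a made-up example, and the sketch assumes the restore from the fast backup server runs at the 4,770 GB/hour figure and that the copy and the restore happen one after the other.  Real transfers fluctuate, so treat this as an illustration only.

```python
# Rates quoted in the post, in GB/hour.
FAST_BACKUP_BEFORE = 500
FAST_BACKUP_AFTER = 4770
SLOW_BACKUP_COPY = 318
SLOW_BACKUP_DIRECT_RESTORE = 50


def hours_to_move(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to move size_gb at a constant rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_hour


def staged_restore_hours(size_gb: float) -> float:
    """Copy from the old server to the fast one, then restore from the fast one.

    Assumption: the two stages run back to back and the restore stage
    runs at FAST_BACKUP_AFTER.
    """
    return (hours_to_move(size_gb, SLOW_BACKUP_COPY)
            + hours_to_move(size_gb, FAST_BACKUP_AFTER))


if __name__ == "__main__":
    size_gb = 1000  # hypothetical server size, not a figure from the post
    direct = hours_to_move(size_gb, SLOW_BACKUP_DIRECT_RESTORE)
    staged = staged_restore_hours(size_gb)
    print(f"direct from old backup server: {direct:.1f} h")
    print(f"staged via fast backup server: {staged:.1f} h")
```

Under these assumptions almost all of the staged time is the copy off the old backup server, which matches the bottleneck described above.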


  • 5


#7227 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 06:14 PM

We upgraded a storage component on the new Fast Backup server and we are seeing a huge increase in performance over what we were already seeing.  This should cut down the total restoration time - we will have updated ETAs once we have conducted some more transfers.


  • 5


#7221 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 05:51 PM

We are starting the restoration of the R1 server now.


  • 3


#7217 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 04:59 PM

P2 server is done.  ETA to starting R1 Restorations is ~20 minutes.


  • 2


#7206 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 03:30 PM

The copy of data for the S2 server from Slow Backup to the Intermediate Transfer Disks is about 75% completed.  Once it's 100% we'll move the intermediate disks over to Fast Backup and get it ready for restoration.

 

1 hour 30 minutes left preparing R1 for restoration from Fast Backup.


  • 2


#7172 Major Outage - 09/21/18 - 09/24/2018

Posted by MikeDVB on 24 September 2018 - 12:53 PM

I am working quickly to separate out client discussions in this thread into the new thread for this purpose -> https://forums.mddho...ent-discussion/


  • 1


#7171 Major Outage - 09/21/18+ - Client Discussion

Posted by MikeDVB on 24 September 2018 - 12:52 PM

I am splitting client discussions out of the main recovery thread so that it is easier to track status updates.


  • 1


#7158 Major Outage - 09/21/18+ - Client Discussion

Posted by MikeDVB on 24 September 2018 - 11:16 AM


Is there a reason why the restoration sequence was changed? S2 was moved to after P2, in contrast to the previous announcement. It's not a major difference in time, but nevertheless that's a detail that is a bit irritating in a sensitive situation like this, where people are losing money and/or customers.

We are simply doing things as quickly as we can.  We had a short lapse in the ability to move the disks from Slow Backup to Fast Backup and in this lapse we went ahead and copied the smallest server which is S2.   We did this in lieu of just sitting around doing nothing.

 

If P2 was the smallest server it still would have been copied at that point.

 

This is one of the reasons the ETAs are only estimates - because we are doing our best to predict how long it will take to copy and restore each server's backups and this changes based upon the actual data being copied.  For example a server with 1 TB of data usage and 150,000,000 files is going to take a LOT longer to copy than a server with 4 TB of data and 25,000,000 files.
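
A toy model shows why file count can matter more than total size.  This is only a sketch and not how the actual ETAs were calculated: it assumes a copy costs a fixed streaming rate plus a fixed overhead per file.  The 318 GB/hour default is borrowed from the rate reported for the old backup server elsewhere in this thread, while the 1 ms per-file overhead and the 1 TB = 1,000 GB conversion are purely assumptions for illustration.

```python
def estimated_copy_hours(size_gb: float,
                         file_count: int,
                         rate_gb_per_hour: float = 318.0,
                         per_file_seconds: float = 0.001) -> float:
    """Estimate copy time as streaming time plus a fixed cost per file.

    per_file_seconds is a hypothetical overhead (metadata, open/close);
    the real value depends on the storage and filesystem involved.
    """
    stream_hours = size_gb / rate_gb_per_hour
    overhead_hours = file_count * per_file_seconds / 3600
    return stream_hours + overhead_hours


if __name__ == "__main__":
    many_small = estimated_copy_hours(1000, 150_000_000)  # 1 TB, 150M files
    few_large = estimated_copy_hours(4000, 25_000_000)    # 4 TB, 25M files
    print(f"1 TB / 150,000,000 files: {many_small:.1f} h")
    print(f"4 TB /  25,000,000 files: {few_large:.1f} h")
```

With these assumed numbers the 1 TB server comes out slower than the 4 TB one, because the per-file overhead outweighs the difference in raw data.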

 

If you have issues with your site or account after the restoration you will need to open a support ticket.  While I wish we could keep up with individual issues here on these forums it's not feasible.  We have extra staff working on the helpdesk and we're doing our absolute best to keep up considering the ticket load.

 

If your site is restored and you are seeing a cPanel error page there is a good chance your account is not on the same IP as it was and you're using third party DNS.  If you log into your cPanel you can see your new site IP in the status bar or under 'Server Information'.  You'll want to update this at your DNS.  Originally we planned on trying to make sure everybody was put back on their original IPs but the work to do that would double the restoration time or more.
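
If you want to check whether your third party DNS is already pointing at the new IP, a small script along these lines can do it.  This is a minimal sketch, not an official tool: `example.com` and `203.0.113.10` are placeholders for your own domain and the IP shown in cPanel, the function names are made up for this example, and the answer comes from whatever resolver your machine uses, so cached records may lag behind a recent change.

```python
import socket
from typing import Callable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


def dns_points_at(hostname: str,
                  expected_ip: str,
                  resolver: Callable[[str], List[str]] = resolve_ipv4) -> bool:
    """True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolver(hostname)


if __name__ == "__main__":
    domain = "example.com"      # placeholder: your own domain
    new_site_ip = "203.0.113.10"  # placeholder: the IP shown in cPanel
    if dns_points_at(domain, new_site_ip):
        print(f"{domain} already resolves to {new_site_ip}")
    else:
        print(f"{domain} resolves to {resolve_ipv4(domain)}, not {new_site_ip}")
```

The resolver is passed in as a parameter so the comparison can be tried without touching the network.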


  • 1