MDDHosting Forums

[Completed] Fresh Backups of Echo / Cypress / Fresco



In our opinion, keeping backups of the servers and your data is extremely important for protecting your data against hardware failure and other forms of data loss and corruption. We experienced some issues with our backup server yesterday afternoon during scheduled maintenance on the backup system, and as a result we must take fresh backups of all servers to protect your data.

 

This fresh backup will need to copy every single bit of data off of each server to the backup server, which is a very disk-intensive process and will certainly cause some performance hits tonight while the processes run. Given the intermittent issues we've had over the last couple of weeks, we considered putting this off for a week or two; however, it's a risk that we simply do not wish to take.

 

The process on each server will begin at these times:

Cypress Server - 10:30 PM EST (GMT-5) (Estimated 8 hour run-time)

Echo Server - 11:30 PM EST (GMT-5) (Estimated 6 hour run-time)

Fresco Server - 12:30 AM EST (GMT-5) (Estimated 6 hour run-time)
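
For customers outside the US, the schedule above can be converted to UTC and the expected finish times worked out. The following is a minimal sketch, assuming a fixed GMT-5 offset as stated in the schedule; the calendar date is an arbitrary placeholder (not the actual maintenance date), and the names `SCHEDULE` and `backup_window` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# EST as a fixed GMT-5 offset, matching the announcement.
EST = timezone(timedelta(hours=-5), "EST")

# Arbitrary Sunday used as a placeholder; NOT the actual maintenance date.
SUNDAY = datetime(2000, 1, 2, tzinfo=EST)

# Start time and estimated run-time in hours, taken from the schedule.
SCHEDULE = {
    "Cypress": (SUNDAY.replace(hour=22, minute=30), 8),
    "Echo": (SUNDAY.replace(hour=23, minute=30), 6),
    # 12:30 AM falls on the following calendar day.
    "Fresco": (SUNDAY + timedelta(days=1, minutes=30), 6),
}


def backup_window(start, hours):
    """Return the estimated end time in EST and the window in UTC."""
    end = start + timedelta(hours=hours)
    return {
        "end_est": end,
        "start_utc": start.astimezone(timezone.utc),
        "end_utc": end.astimezone(timezone.utc),
    }


for server, (start, hours) in SCHEDULE.items():
    w = backup_window(start, hours)
    print(f"{server}: ends {w['end_est']:%I:%M %p} EST, "
          f"{w['start_utc']:%H:%M} to {w['end_utc']:%H:%M} UTC")
```

If the estimates hold, all three runs end between 5:30 AM and 6:30 AM EST, which corresponds to 10:30 to 11:30 UTC.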

 

If we do not run this process tonight, the only other viable option would be waiting 7 days until next Sunday, which would leave your data unprotected and not backed up for a solid week, during which any hardware failure or other major system issue could cause your data to be lost without recourse.

 

We understand any frustration this may cause and we apologize. We hope that the impact of the backup process will be minimal, since server usage is at its lowest on Sunday evenings, and we expect the process to be finished by the start of normal business hours in the US on Monday morning.

 

We do realize that not all of our customers are based in the United States and that this backup process may cause issues for some of our clients whose peak times are opposite those in the US. Ultimately, however, we need to perform these backups when overall server usage is at its lowest, and that is during the off-peak time for the US.

 

If you have any questions, let us know.


Here is an exact copy of the email message that has been dispatched to all customers.

Over the past couple of weeks we've had some performance issues with our servers, mostly during the overnight period, due to a couple of factors. I want to detail why these issues have happened and what we're doing to resolve them, and to go over some scheduled maintenance for this evening. We always do our best to be as transparent and honest as possible, and as such we're not going to keep any details from you.

 

As you may already know, we run CloudLinux on our servers to help keep things stable and to prevent the over-usage of some accounts from affecting everybody else. The system is generally very solid and does a wonderful job of doing what it's supposed to do. Recently, on the advice of CloudLinux, we upgraded to a newer version of the system that was supposed to offer increased performance and reliability. Instead, we ran into a couple of serious issues with the new version of the software that were exacerbated by our R1Soft backup system, which can be very intensive all on its own.

 

During normal use, the new version of CloudLinux we were running on the servers performed well and did indeed give some performance gains, especially when it came to hard drive access speeds on the server. The issues arose when our R1Soft backup system was run to back up your data and protect you from hardware failure and other causes of data loss. This backup process tends to be very intensive, as it needs to scan every bit on the disk to look for and record any changes in order to keep an up-to-date backup of your data. When this backup process ran, the servers locked up without any warning, forcing us not only to reboot the systems but also to abort a backup process that had already been running for an hour or more. We have downgraded our CloudLinux installations across our servers to a version we operated with for a very long time without issues, and we don't anticipate having any major issues with the backup systems from this point forward.

 

We have always kept backups of our servers and we use those backups fairly regularly to fix issues for customers, such as when somebody accidentally deletes a file or drops the wrong database or database table. Because this process is very intensive, it does cause a few minutes of extremely slow performance every night when it runs; however, this usually clears up in 2 to 3 minutes. We realize that this can be annoying at best; however, it's the cost of keeping up-to-date off-server copies of all data and databases.

 

Due to all of the issues we've faced while performing backups over the last week, we feel that we cannot rely upon the quality of the backups currently stored for these systems, and as such we need to perform fresh backups of all servers to ensure that, if the need does arise to restore data from the backups, the data restored is accurate and problem-free. We evaluated when to run these fresh backups so as to cause the least impact on service performance for our customers, and we determined that Sunday nights are the best time for such intensive processes. The next decision was whether we should wait a week or two to perform this fresh backup or go ahead and do the backup as soon as possible.

 

We have chosen to perform the seed backups tonight, as we feel a week without reliable backups is far too long; hardware failure is not something that can be expected or planned for. We do run redundant disk arrays in our servers to help protect against drive failure (up to 2 drives can fail per server without data loss); however, running redundant arrays is not a substitute for reliable daily backups of the data. We expect this process to take between 6 and 8 hours per server. Once we've performed this fresh backup, the backup system should not cause any major performance degradation in the future beyond the expected two to three minutes per night that it takes for the backup system to spin up on each server and get started.

 

You may be frustrated with the issues we've faced on a couple of our servers over the last two weeks, and we're right there with you on that frustration. We have posted a thread on the forum with the information contained in this email as well as some additional details about the backup process for this evening. If you have any direct questions you would like to ask, you are welcome to respond to this email or to visit the forums and publicly post your thoughts / questions / comments / suggestions as well.

 

Here is a link to the forum thread, for your convenience:

http://forums.mddhosting.com/topic/463-scheduled-fresh-backups-of-echo-cypress-fresco/

 

Thank you,


Great advance notice and detailed explanation! I like the fact that we know what's coming up, and why when possible.

 

For those of us with a VPS on a different server (i.e. Atlantis), I'm assuming the above problems with CloudLinux and the slight possibility that the backups could be corrupted don't apply?


That is correct. Our VPS servers are being backed up by R1Soft nightly and have not experienced any issues at all with backup integrity. We hope that this is the last time we have to do a fresh seed backup for quite some time. Usage across the servers tonight should be low enough that the backup process won't cause any issues, but we wanted to give notice just in case it does cause any performance problems.

Another big thanks for MDD being proactive in taking action.

 

One minor clarification - both messages and the email mentioned Sunday night, but they also say 'tonight', which of course is Saturday in the US. Just need to let my users know which night.

 

Thanks,

 

Tom


Thanks for letting us know what is going on. My site has been slow the last couple of days and I was wondering why. Will we be able to edit our sites during this time, or is that advised against?

 

Also, is it today 4/2 or 4/3? It's only Saturday for me hehe


MDDhosting, always stepping ahead and never behind when it comes to service and proactive technical actions, as to these recent and current issues of server lag, etc.

Thank you MDDhosting for such a wonderful approach to our company and all of your hosted customers as well. It sounds like there are many of us out there that are happy with your services.


Cypress did just now go unresponsive for a minute; however, it was due to the backup system doing the intensive first backup while a user was gzipping an extremely large file. We've cancelled the gzip temporarily and the server will take a minute or two to catch back up and get back to normal.

 

We are watching the servers closely to try and make this as smooth as possible.


Cypress has run into some issues with disk I/O during the full backup due to user-generated backups. We've been temporarily suspending those processes while this full backup runs.

 

Echo has finished and Fresco will be done in about 2 hours. Cypress looks to be taking quite a bit longer due to the large amount of data to be backed up (around 1.2 TB).
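
To put the Cypress figure in perspective, here is a rough back-of-the-envelope calculation of the sustained throughput needed to copy about 1.2 TB within the originally estimated 8 hours. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes), ignores compression and protocol overhead, and the function name is made up for illustration.

```python
def required_throughput_mb_s(data_tb, hours):
    """Average MB/s needed to move data_tb terabytes in the given hours."""
    total_bytes = data_tb * 1e12      # decimal terabytes (assumption)
    seconds = hours * 3600
    return total_bytes / seconds / 1e6


rate = required_throughput_mb_s(1.2, 8)
print(f"{rate:.1f} MB/s")
```

That works out to a little over 40 MB/s sustained for the whole run, on top of normal site traffic on the same disks.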


As we've been watching Cypress we see that the log processing on the server as well as the daily system crons were previously interfering with the backup process. Tonight we stopped them manually while this process runs to make the server responsive again and will manually process these later today once the backup process is finished.

Just a note, because it might be related seeing as this is the most recent action taken on the server I'm on. We've been extraordinarily slow for about an hour now. Cypress server. 8+ seconds to load a page on my sites.

 

P.s.

Ticket sent.


I can confirm that my site on Cypress is experiencing the same issue. CloudFlare cache is kicking in because my site is timing out.


Indeed, the server is having some issues right now. It looks like R1Soft decided to verify its full backup from yesterday, which I've never seen it do before. R1Soft has been less than helpful, but ultimately we have three choices:

1. Let it finish and have accurate verified backups of the server and your data.

2. Cancel it now, after it's been running so long, have it run tomorrow and cause issues again tomorrow.

3. Cancel the backup and then there is no backup should there be hardware failure or another unexpected form of data loss.

 

We worked to prioritize the web server and MySQL and de-prioritize the backup system, but it seems that there is still an underlying disk performance issue at work that is causing problems. Try as we might to get things to stabilize, some sites are operating normally while others are simply not operating well or at all.
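
For readers curious what de-prioritizing a process can look like on a Linux server, here is a minimal sketch. It is not the procedure used on Cypress, which the post does not describe; it only builds the standard `renice` and `ionice` command lines for a given process ID. The PID and the function name are hypothetical, and whether the idle I/O class has any effect depends on the kernel's I/O scheduler honoring I/O priorities.

```python
import shlex


def deprioritize_commands(pid):
    """Build commands that lower CPU and disk priority for one process."""
    return [
        # Lowest CPU scheduling priority (niceness 19).
        ["renice", "-n", "19", "-p", str(pid)],
        # Idle I/O class (3): use the disk only when nothing else needs it.
        ["ionice", "-c", "3", "-p", str(pid)],
    ]


# Hypothetical PID of a backup agent process, for illustration only.
for cmd in deprioritize_commands(12345):
    print(shlex.join(cmd))
```

The sketch only prints the commands; actually applying them would mean running each one with sufficient privileges, for example through `subprocess.run`.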

 

We estimate that once the backup verification finishes, things should speed up and get back to normal. If not, we're going to have to continue investigating to see if there is another underlying issue causing performance degradation. The backup verification itself should have finished in less than 10 hours; however, it has been running for 15 hours and reports 17 minutes remaining.

 

We're actually going to be making big changes to the semi-dedicated platform as we're not happy with the current server's ability to keep up with intensive disk situations such as long-term backups.

 

I'm also working on bringing online a VPS temporarily where I can stage any accounts that are under 10 GB of disk usage that wish to be moved off of Cypress until the changes to the semi-dedicated offerings are completed.

 

I'm going to be posting a very detailed message as to why Cypress has been facing issues and what we plan on doing to fix the issue permanently. All customers on Cypress will be getting an email with the information as well as a link to the thread I'm going to post once it's ready.


I would love a temporary VPS until we can find a permanent fix -- assuming it's fully managed. My Linux skills are C+.
