MDDHosting Forums

[Resolved] Echo Server Repair



Quick recovery from these issues is also possible for any hosting. Set expectations low and the results will follow.

Ever had something not go exactly how you planned? It's not that anybody is setting expectations low - it's that this is the real world. Our backup system is supposed to be able to bring a server back online from a restoration point in under 10 hours; however, it's performing slower than it should.

 

Our expectations of the backup system were very high. Unfortunately, in its current configuration it is excellent for small restorations such as individual accounts or databases, but it really suffers when it comes to restoring a whole server.

 

Because our expectations are high, we've conferred with our hardware vendors and R1Soft and come up with an action plan to ensure that if we ever have to restore a server from a bare-metal restoration point again, it *will* happen quickly.

 

As stated previously, we don't need to see the details of the script used to perform this exploit.
If you haven't been notified, then it wasn't you. Out of respect for client privacy, I'm not going to go into detail as to who it was.

 

Has the account owner been notified of their out-of-date "script" through which the exploit was injected? Can you divulge what the script and version are?
It's a WordPress script - it could be the script itself or one of the plugins (enabled or disabled) that was exploited. WordPress itself is updated regularly and I suggest keeping it up to date and only running well-known and trusted plugins. I also recommend removing any unused plugins - while they're not called directly by WordPress, an exploit scanner can find them and execute them.

 

What has been done, or is going to be done, with that account?
I don't see how this is a valid question - if there is an issue with your account, I don't think you'd want me telling our entire client base what we're doing with it.

 

Was the account owner even aware of the injection, or did they learn of it only after access to Echo was removed altogether (i.e., all sites inaccessible)?
Again, I don't see how this is a pertinent question. I understand your curiosity, but I'm not going to discuss direct communications with a single client with the entire client base - at least not without their permission.

 

Are the other MDD servers configured the same as ECHO?
Yes - the thing to keep in mind is that this exploit didn't rely upon poor configuration or security. It was a system core exploit - something that has to be patched by the operating system developer (Red Hat). Patches like these come out fairly regularly and we apply them as soon as they are released. We actually pay for a system called Ksplice so that we can install these system-level updates without having to reboot. If you are with another provider that isn't using a system similar to Ksplice and isn't rebooting their servers monthly, chances are there are numerous unpatched system exploits just waiting to be taken advantage of.
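For anyone who wants to check this on their own server, a rough sketch, assuming the standard Ksplice Uptrack command-line tools are installed (output will vary by machine):

# uptrack-show          (lists the rebootless updates already applied to the running kernel)
# uptrack-upgrade -y    (installs any newly released updates without a reboot)
# uptrack-uname -r      (reports the effective kernel version including Ksplice updates; plain uname -r still shows the version that was booted)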

When will sites be online?

The answer:

The timeframe hasn't changed as of yet - we're still looking at approximately 2 PM tomorrow (Sunday). As I said previously, however, this ETA is not set in stone, as the backup system has been speeding up and slowing down due to fragmentation of the backup files on the backup server. Unfortunately the backup system is on an ext3 filesystem right now, which cannot be defragmented - once this restoration is completed we're moving to XFS, which allows online defragmentation.
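For those curious about the technical side, a rough sketch of what online defragmentation on XFS looks like; the device and mount point below are placeholders only:

# xfs_db -r -c frag /dev/sdb1    (read-only report of the filesystem's fragmentation factor)
# xfs_fsr -v /backup             (reorganizes fragmented files while the filesystem stays mounted and in use)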



I was not intending for details of the particular client to be divulged. The questions were asked from the standpoint of the problem possibly being reintroduced. I understand that a mitigation method will be put in place for the OS. Having said that, in review of your reply, the compromised WordPress script will resume, is that correct?

 

I was interested in where the source was, as it is part of the root cause of the exploit. I would think you'd disable that account until the owner updates it, etc., before moving forward. That is the pertinence of the questions. I run WordPress and I'd like to know if I was the idiot responsible and what needed to be done to fix it on behalf of others. :( I know I'm not the person, since I was not notified.

 

I hope that clears up my questions, and I'm not intending for privacy to be ignored due to the event.


I was not intending for details of the particular client to be divulged. The questions were asked from the standpoint of the problem possibly being reintroduced. I understand that a mitigation method will be put in place for the OS. Having said that, in review of your reply, the compromised WordPress script will resume, is that correct?

The situation has been handled - it doesn't make sense for us to roll back the server just to allow it to become compromised again; do give us at least a little credit. :)

 

I was interested in where the source was, as it is part of the root cause of the exploit. I would think you'd disable that account until the owner updates it, etc., before moving forward. That is the pertinence of the questions. I run WordPress and I'd like to know if I was the idiot responsible and what needed to be done to fix it on behalf of others. :( I know I'm not the person, since I was not notified.
The issue that allowed the code to be uploaded and run is separate from the code that actually caused the damage. Like I said, people's sites get exploited every day on every provider due to not keeping scripts updated or not keeping a secure password and rotating it regularly. If the system core hadn't had the bug that allowed the damage to happen, all that would have occurred is that the file would have been uploaded and it simply wouldn't have done anything (i.e. no damage).

 

I hope that clears up my questions, and I'm not intending for privacy to be ignored due to the event.
I understand what you were asking now and I've done my best to answer your question in detail without getting too technical.

Thanks for the answer, Michael. I never have anything to complain about regarding the service and your help when we run into some kind of problem, but when all of this is over we need to be sure this kind of breakdown doesn't happen again. Speaking for myself, I need some tips and will need some help or an explanation of how to make a backup and restore the data to another, secondary server. My business can't take one more of these, and many others are in the same situation. I'm not an expert in hosting and DNS stuff - it's not my field, and on that front I trust MDD with eyes closed. I hope we can get out of this problem as soon as possible and look forward to ways to prevent these kinds of breakdowns. Thanks for all your HARD work and understanding.


We'll be more than happy to help you with setting up your own backup plans and own contingency plans once this is all resolved. Just open a ticket once everything is back online.


Ksplice has just sent us an email letting us know that they've pushed out a patch for the issue that caused the downtime. I'm quoting the email below:

Subject: [Ksplice][RHEL 5 Updates] New updates available via Ksplice (CVE-2010-3081)

Message:

Synopsis: CVE-2010-3081 can now be patched using Ksplice

CVEs: CVE-2010-3081

Systems running Red Hat Enterprise Linux 5 and CentOS 5 can now use Ksplice to patch against CVE-2010-3081.

Ksplice is now providing an update for the high-profile security vulnerability CVE-2010-3081. Ksplice does not normally publish rebootless updates for RHEL or CentOS before Red Hat has finished releasing a new kernel, but in this case, due to the high profile of this security vulnerability, the fact that other distributions have successfully provided this update, and our communications with the Red Hat security team, we are now making this update available for customers to install.

Please note that the mitigation steps described at , while effective against one public exploit for CVE-2010-3081, do not actually correct this vulnerability. A modified version of this exploit is effective even against machines that have used the published Red Hat mitigation approach. The only known effective solution to CVE-2010-3081 is to update the kernel.

INSTALLING THE UPDATES

We recommend that all Ksplice Uptrack RHEL 5 and CentOS 5 users install these updates. You can install these updates by running:

# uptrack-upgrade -y

DESCRIPTION

* CVE-2010-3081: Privilege escalation through stack underflow in compat.

A flaw was found in the 32-bit compatibility layer for 64-bit systems. User-space memory was allocated insecurely when translating system call inputs to 64-bit. A stack pointer underflow could occur when using the "compat_alloc_user_space" method with an arbitrary length input, as in getsockopt.

We've patched all servers against this vulnerability.


If we would like to upload files to our account how may we do this? I can't access any site from a web browser, and the ping times out.

 

Accessing your account is not possible until the server has been brought back online. Once service is restored, you will be able to access your account like normal.


When is it anticipated that the server will be up so we can update information? Also, what is happening to email during this time?

 

Nothing has changed from what was stated above. The estimated time that things will be back is still about 2pm tomorrow, or twenty and a half hours from now.

 

As for email, most mail servers are configured to attempt to resend email for up to 72 hours, so most of your messages should come through.
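The exact window depends on the sending server's configuration - as one illustration, on a Postfix sender that limit lives in the maximal_queue_lifetime setting (this is only an example of where such a retry limit is configured, not a description of any particular sender):

# postconf maximal_queue_lifetime               (shows how long Postfix keeps retrying undeliverable mail; the stock default is 5 days)
# postconf -e 'maximal_queue_lifetime = 3d'     (an administrator could shorten that window to 3 days)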


The Echo server is back online and operational ahead of schedule (by just under 12 hours). We do again sincerely apologize for this outage and appreciate your understanding as we did what was necessary to protect your account security and data integrity.

 

The server is going to be extra busy for the next 24 to 48 hours as it gets slammed with mail that was sent to accounts on the server while it was offline and as other systems play catch-up. If you have any issues or questions, feel free to post them here or to open a support ticket.

 

Thank you,


I'd like to thank Michael and the MDD Hosting team for the way in which they handled this situation. Yes, it was inconvenient, and probably tough for some folks, but Michael and Scott were open with us throughout the entire process. The pressure on these folks must have been tremendous, but they continued to respond to difficult questions in a calm, professional manner.

 

I have no doubt that Michael and his team will be working with the backup vendor to ensure the process, should it happen again in the future, will be smoother and faster.



We actually already have plans in motion to ensure that any future restorations (large or small) complete much faster. We obviously hope to never have to use the system like this again, but it is nice to know that should something happen that is outside our control (a system-level exploit or an act of God, for example), we will be able to restore client data either to the same hardware or to new hardware.

 

Being open and honest is company policy here at MDDHosting - many providers would have tried to cover the issue up or to spin it while we simply tell it like it is. I even went so far as to make sure that no posts in this thread were removed or censored in any way. We understand that our clients were (and likely still are) frustrated over this incident and if they wish to share their frustration, that's perfectly acceptable.

 

I've sent out an email to everyone with some suggestions on keeping your own backups just in case, as that's definitely a good step to take no matter where you are hosted, who your provider is (even if it's us), and what they promise.

 

If you have any questions, comments, or concerns about the outage, the restoration process, or anything else related to this issue by all means please feel free to let us know.

 

Edit: Clarification - I did actually "censor" one post but I didn't censor the spirit of the post, I just removed the direct links to the code used to exploit the server for security purposes. :)


While I'm not personally involved in this incident, it is very refreshing to see the transparent and professional way in which it was handled. Thanks to Mike and the MDD team for their continued efforts - please do take time to refresh, for both your health and sanity.

 

You mention you have "plans in motion". Something I, as a client, would appreciate seeing is a message to all clients in, say, 30 days that includes:

 

1) What steps have been put in place (actually executed on) as well as any related future steps to address the restore time.

 

2) A reminder to set up (and execute on schedule) a client-level personal backup. A link to the documented process/steps would be a nice touch.

 

3) Partnering with a 3rd party (or building it yourself) for an automated offsite client-level backup service. I'd imagine this would be an optional service at a monthly fee. Something in the spirit of siteautobackup.com or backupalicious.com.

 

4) Automated server wide checks that can be run to audit and report when client apps/scripts/plugins are out of date. Thinking something like oldscriptfinder.com, but where the messaging is delivered directly to the client once a day/week/month depending on the critical nature of the out of date script. Perhaps offer incentive such as a monthly discount if a client's audits are clear of any outdated scripts :-)


1) What steps have been put in place (actually executed on) as well as any related future steps to address the restore time.
I wouldn't think most of our clients will really care about many of the technical details of what is being done. I'll more than likely post them here on the forum and offer a link for those who are interested in further details.

 

2) A reminder to set up (and execute on schedule) a client-level personal backup. A link to the documented process/steps would be a nice touch.
It all really depends on how you want to go about doing things - whether you want to back up to your home computer or another web host, whether you just want to back up your databases daily and your files weekly, etc. Ultimately it's your responsibility to make sure that you have your own copy of your data. We always recommend that you run your own backups and we'll be more than happy to help anybody unsure how to do this on their own.

 

3) Partnering with a 3rd party (or building it yourself) for an automated offsite client-level backup service. I'd imagine this would be an optional service at a monthly fee. Something in the spirit of siteautobackup.com or backupalicious.com.
Services such as these exist, but the issue is that the vast majority of people choose their hosting based upon price, and most aren't going to be willing to pay additional fees to have their own backups. There are services out there that will automate your backups for you, and there are services such as bqinternet.com where you can sign up for an FTP or rsync backup account and then use a simple script such as the one found here to back up your account to that external storage.
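As a starting point, a minimal sketch of such a backup script, assuming an rsync/SSH-capable storage account; the database name, credentials, hostname, and paths are placeholders you would replace with your own:

#!/bin/bash
# Nightly backup sketch: dump the database, archive the site files,
# then copy both to external storage over rsync/SSH.
DATE=$(date +%F)
mkdir -p /home/youruser/backups
mysqldump -u dbuser -p'dbpass' yourdatabase > /home/youruser/backups/db-$DATE.sql
tar -czf /home/youruser/backups/files-$DATE.tar.gz /home/youruser/public_html
rsync -avz /home/youruser/backups/ backupuser@backup.example.com:/backups/yoursite/

Run it from cron once a day and prune old archives so the remote account doesn't fill up.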

 

4) Automated server wide checks that can be run to audit and report when client apps/scripts/plugins are out of date. Thinking something like oldscriptfinder.com, but where the messaging is delivered directly to the client once a day/week/month depending on the critical nature of the out of date script. Perhaps offer incentive such as a monthly discount if a client's audits are clear of any outdated scripts :-)

That's an option we've considered in the past, but ultimately script security is the responsibility of the end user. If every client on our servers is OK with us raising pricing by $1/month per account, we could very easily add all of the above. Alternatively, our clients can pay nothing additional to us and make sure to set up their own backup plans and keep their scripts up to date.

 

There are always things we could offer that we don't, which some would see as a good idea. It's a balancing act at the provider level to offer quality, reliable service with high performance at a good price point.

 

From this situation, the only thing that really went wrong that we could have possibly had any control over was the speed at which the backup was restored. As I've said, plans are currently being executed to ensure that any such restoration in the future will happen much faster.

 

I've seen many providers go through similar situations, be it a system-level exploit or a hardware failure where data was lost, and I can't think of any of them that were actually able to fully recover a copy of the server in its entirety. I'm not saying that it hasn't happened - but it's not common.

 

Now don't get me wrong - we are always evaluating new options and ways to expand our services and what we provide with our plans, but not everything that sounds like a good idea can feasibly be implemented without raising prices.

 

We always do our best to keep our client base apprised of our improvements and changes that we make to the way we operate and features and services that we provide. Once we're done making changes to the way our backup systems operate we'll definitely let everybody know what was changed and what the benefits are.


With regard to email - do you have any comments or ideas as to how incoming and outgoing mail was handled during the server downtime and restore? I have personally sent some messages as a test (from laptop to PC, different addresses) that have not been received. They were held in my outbox until the connection was established during or after the restore and were then sent - however, they were not received.

 

I have a couple of clients who are anxious about emails that they should have received during this time - for one, an insurance agent, there are legal ramifications, and therefore they need a bit more information as to how to address this. I am going to advise them to send out an eblast to their correspondents asking them to resend all email from Thursday on; however, I would like to be able to respond to their queries with some background.

 

thanks
