MDDHosting Forums

[Resolved] Echo Server Repair



What's going on?

Read this thread starting at the beginning. Everything is explained in detail.

 

How much longer do we have to wait? We need a solution now; it's been 24 hours offline or more, and like many others we have clients under our services.

Everything is explained above.

 

How can I use my backups to activate provisional hosting and forward the domain without changing all my clients' DNS?

As mentioned in the thread above, if you have your backups, MDD will install them on another server and configure the DNS. If you don't have backups, you will have to wait for the restore at MDD to finish.


Michael and co., I'm understanding you to say that a load-balanced multi-server or full VPS configuration would not have prevented the present outage, correct?

 

Can you clarify why only ECHO was affected? Is it simply because the attacker happened to have targeted only that machine, or are there differences (that you can at least reference here w/o putting those machines + clients at risk...) in the way those machines are configured?

 

Please advise. TIA.


If you want a load-balanced, fail-over setup that wouldn't be affected by this type of issue, you'd be paying $75+/month just for a shared account. Even then, whether it would help in an exploitation situation really depends on how the data is replicated across the balancer. It would save your day in the event of a hardware failure, but in a system-level root kernel exploit it would be helpless to prevent the damage.

 

This was a zero-day kernel exploit - I'll be honest that we *probably* could have just restored the defaced/deleted data, gone from there, and had much less downtime, but my question is this: what happens if we take this shortcut and then the server is re-exploited through a hidden back-door and your data is not only lost but stolen?

 

It's not a risk we're willing to take - we're restoring the server back to a point in time before the attack happened to be sure that the server is secure and we're going to mitigate the exploit before bringing the public network online.

 

If you have your own backup of your account, open a ticket and we'll restore it to another server and get you back online very quickly. If you don't, you're going to have to wait for the server to be restored from the backup.

 

Our backup system is supposed to be able to do a full server restore in 5 to 10 hours; however, due to unforeseen circumstances it's taking substantially longer. We've done everything we can to speed up the process, but there is only so much that can be done at this point.

 

If you do have any further questions you're welcome to post them.

 

I've forwarded my domains while repairs are underway but just had a thought along the lines of the above. Could you have implemented the quick fix you mention above on an extra drive and then simply swapped drives once the backup transfer was complete? It would mean the risk of a backdoor, but only for as long as it took to finish the backup. This would mean hugely reduced downtime. It would also likely mean much more labour time for MDD, I realize, but I'm starting to notice how much money we're losing on our ad campaigns and game sales, as well as the fact that our visitors are starting to post on competing sites in search of a replacement for the resources we were providing.

 

Should I be feeling entitled to at least an MDD credit or is this simply the way of the net? I hate to put the pressure on but business is business after all. :(


I can't believe the company doesn't have any contingency plan for this kind of situation. What happened to the "99.9% uptime guarantee"? I can understand a few hours of downtime, but 24 to 48 hours is a killer. I've already lost some of my clients, and the rest are really upset. I'm thankful it's the weekend so the situation isn't worse. Waiting 24 to 48 hours is not a solution for all this. What is going to happen to all the data being sent to the down server, like mail? Don't tell me it's lost, or I'm going to have a lot more problems... I know this is a hard time for all of us, and I try to understand, but how are we going to be sure this isn't happening again? Security is a main issue here; the company sells reliability and security, and at this point we have neither. So do I have to have two servers in order to provide myself with 99.99% uptime?

Thanks for making this such a transparent process Mike.

You and your team must be exhausted, and I am sure you are doing the best you can to handle the situation.

 

Thanks for keeping me/us in the loop..

 

BTW: I am sure some clients here are getting pissed off, but there is no point in putting extra pressure on MDD. This can happen to any host.


How much longer? Another day? Is it 10% done? 50%?

The timeframe hasn't changed as of yet - we're still looking at approximately 2 PM tomorrow (Sunday). As I said previously, however, this ETA is not set in stone, as the backup system has been speeding up and slowing down due to fragmentation of the backup files on the backup server. Unfortunately the backup system is on an EXT3 file system right now, which cannot be defragmented - we're moving to XFS, which allows online defragmentation, once this restoration is completed.
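For anyone wondering what "online defragmentation" means in practice, here's a minimal sketch - purely illustrative, not our actual tooling, and the mount point is a hypothetical placeholder - that runs an XFS defragmentation pass with the standard xfs_fsr utility while the filesystem stays mounted. ext3 has no equivalent, which is exactly the limitation described above.

#!/usr/bin/env python3
"""Illustrative sketch: run an online defragmentation pass on an XFS mount.

Assumes the xfs_fsr utility from the XFS userspace tools is installed and that
the script runs with sufficient privileges; the mount point is hypothetical.
"""
import subprocess

MOUNT_POINT = "/backup"  # hypothetical mount point for the backup volume


def filesystem_type(mount_point: str) -> str:
    """Return the filesystem type for a mount point, as listed in /proc/mounts."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _device, mnt, fstype, *_rest = line.split()
            if mnt == mount_point:
                return fstype
    raise ValueError(f"{mount_point} is not mounted")


if __name__ == "__main__":
    fstype = filesystem_type(MOUNT_POINT)
    if fstype == "xfs":
        # xfs_fsr reorganizes fragmented files while the filesystem stays mounted.
        subprocess.run(["xfs_fsr", "-v", MOUNT_POINT], check=True)
    else:
        # ext3 has no online defragmenter - the limitation described above.
        print(f"{MOUNT_POINT} is {fstype}; online defragmentation is not available.")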

 

What's going on???? How much longer do we have to wait? We need a solution now; it's been 24 hours offline or more, and like many others we have clients under our services. How can I use my backups to activate provisional hosting and forward the domain without changing all my clients' DNS?

You have clients under your services with us just as we have clients under our services with us - we're all in the same boat here. We're going to have it online as quickly as we can but unfortunately there's nothing more we can do to speed this up.

 

I still have no website and all of my emails have now disappeared from my mail client... please tell me that these will also be restored once the Echo server comes back online?!

Yes - any emails that were on the server up to the restoration point we're restoring to will come back.

 

Michael and co., I'm understanding you to say that a load-balanced multi-server or full VPS configuration would not have prevented the present outage, correct?

Not necessarily - it really depends on how it was configured. Chances are, in a load-balanced situation where an attacker gained root access to one server, the other one would quickly mirror those changes over and we'd be in the same position.
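To illustrate why - this is a deliberately simplified sketch, not how any real balancer is configured, and the paths are hypothetical - a naive mirror copies whatever changed on the primary, defacements and deletions included:

#!/usr/bin/env python3
"""Illustrative sketch: naive replication copies a compromise along with everything else.

The source and destination below are hypothetical placeholders.
"""
import subprocess

PRIMARY = "/var/www/"          # hypothetical document root on the primary server
STANDBY = "standby:/var/www/"  # hypothetical rsync destination on the standby server

# --delete makes the standby match the primary exactly: if an attacker deletes
# or replaces files on the primary, the standby follows suit on the next pass.
subprocess.run(["rsync", "-a", "--delete", PRIMARY, STANDBY], check=True)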

 

Can you clarify why only ECHO was affected? Is it simply because the attacker happened to have targeted only that machine, or are there differences (that you can at least reference here w/o putting those machines + clients at risk...) in the way those machines are configured?
The script that was used was uploaded to an account where a user failed to keep their script updated and secure. The attacker used a script exploit to place the file on the server and then used that same exploit to execute the script that did the damage.

 

The exploit that did the damage was itself a zero-day exploit that was reported only hours before it hit us; there was no known mitigation and no patch when we were hit by this. It's not something we could have prevented, and it could have happened to any provider.

 

The thing you should all keep in mind is that while we do understand your frustration - things could be much worse. I don't know HOW many times I've seen, "Sorry, we lost your data due to XYZ and we don't have a backup of it. If you have your own backup file we'll be happy to restore your account." While the restoration process may be going much slower than we'd like - at least we have a backup copy of your data.

 

I've forwarded my domains while repairs are underway but just had a thought along the lines of the above. Could you have implemented the quick fix you mention above on a extra drive and then simply swapped drives once the backup transfer was complete? It would mean the risk of a backdoor but only so long as it would take to finish the backup. This would mean a hugely reduced downtime. It would also likely mean much more labour time for MDD I realize but I'm starting to notice how much money we're losing with regards to our ad campaigns and game sales as well as the fact that I'm starting to see our visitors posting on competing sites in search of a replacement for the resources we were providing.

It's not about labor - I can assure you we've been working more and harder over the last day than we have over the last year just doing our best to get things back online. Not only did we have to fully investigate the original issue and find a way to mitigate it but we also had to work on restoring the data.

 

We do have a backup of your data - it's just taking a long time to restore. Let me be a little clearer:

  • We've not lost your data, it's still safe.
  • It could be worse, your data could have been lost and many providers would have said, "Sorry, it's gone."
  • If you have your own off-provider backups - we'll be more than happy to restore them to another server.
  • No matter who you host with (be it us or anybody else) you should always keep your own off-provider backups - see the sketch just after this list.
  • R1Soft backups do not work in a way that we can simply "swap a drive" and have everybody back online.
  • We're absolutely not going to risk bringing a compromised server online; while the attacker didn't steal any data the first time around, we're not going to risk having any of your data stolen this time around.
  • It's not smart to store personal information such as credit card numbers in a shared environment, but that doesn't stop people from doing it. Just one of the many reasons we're going with a full restore from before the exploit instead of allowing the possibility of re-exploitation.
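As mentioned in the list above, here is a minimal sketch of keeping your own off-provider copy. It's illustrative only, not an official MDD script - the login and destination are placeholders, and databases would need a separate dump step:

#!/usr/bin/env python3
"""Illustrative sketch: pull an off-provider copy of a hosting account's files.

The remote login and local destination are hypothetical placeholders.
"""
import datetime
import pathlib
import subprocess

REMOTE = "user@host.example.com:public_html/"     # hypothetical account login and path
DEST_ROOT = pathlib.Path("/srv/offsite-backups")  # hypothetical local destination

# Keep each day's copy in its own dated directory so one bad sync
# can't overwrite your only good backup.
dest = DEST_ROOT / datetime.date.today().isoformat()
dest.mkdir(parents=True, exist_ok=True)

# rsync over SSH copies only changed files on subsequent runs.
subprocess.run(["rsync", "-az", "-e", "ssh", REMOTE, str(dest)], check=True)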

 

Should I be feeling entitled to at least an MDD credit or is this simply the way of the net? I hate to put the pressure on but business is business after all. :(

Sure, you can have a one-month credit if that's what you desire - officially that's what our Terms of Service allow.

 

 

I can't believe the company doesn't have any contingency plan for this kind of situation.

We do have a contingency plan - we're restoring your data from a backup server. It could be worse - we could be like most other companies with no contingency plan and simply tell you, "Sorry, your data is gone forever - do you have a backup?" We're not doing that; we're restoring your data - please be patient (I realize it's hard and frustrating; you have to imagine how we feel). The original situation was not something that could have been prevented, as there are still no OS patches for this issue and the mitigation code (which does actually disable some OS functionality) was not released until after the Echo server was hit.

 

 

What happened to the "99.9% uptime guarantee"?
It's a monetary guarantee - you're welcome to request the one month credit that our terms of service state you would qualify for.

 

I can understand a few hours of downtime, but 24 to 48 hours is a killer. I've already lost some of my clients, and the rest are really upset.
Read through this thread - I'm sure we're going to lose some clients too, and I'm sure more than a few are really upset (including you). We're in the same position; however, we likely have more clients to lose and to upset due to this outage. Now don't get me wrong - I'm not trying to put you or your clients down... I'm just saying that we're in a bad situation as well.

 

I'm thankful it's the weekend so the situation isn't worse. Waiting 24 to 48 hours is not a solution for all this. What is going to happen to all the data being sent to the down server, like mail? Don't tell me it's lost, or I'm going to have a lot more problems... I know this is a hard time for all of us, and I try to understand, but how are we going to be sure this isn't happening again? Security is a main issue here; the company sells reliability and security, and at this point we have neither. So do I have to have two servers in order to provide myself with 99.99% uptime?

Or you could simply maintain your own backups of the accounts - we could have had you restored and back online in under an hour. It's a small price to pay if you can't stand to have any downtime at all. Again keep in mind that it could be worse.

 

 

Thanks for making this such a transparent process Mike.

You and your team must be exhausted, and I am sure you are doing the best you can to handle the situation.

 

Thanks for keeping me/us in the loop..

 

BTW: I am sure some clients here are getting pissed off, but there is no point in putting extra pressure on MDD. This can happen to any host.

Indeed it could have - I'm sure there are other providers that were hit by the same exploit. As I said, it was an unpatched exploit that nobody knew about until just before we were hit with it. You can think of it as standing in an open field on a clear and sunny day and suddenly being struck by lightning. Nobody saw it coming and there was nothing that could have been done to prevent it. We're restoring the server back to before that happened, but it's taking longer than we'd like.

 

 

We are all feeling the pain here in one form or another. This is just nerve-wracking.

 

Please provide us with a current update.

 

The natives are getting restless!!!

There's nothing to update - the process is still running and it's still on track for the ETA provided earlier.

This may be an obvious answer, but like most I'm tired, frazzled, and worried as well, and I need confirmation that the restoration will take care of it. Members and clients reported viewing the hacker's message prior to the server going offline, so I know my files were infiltrated - will this be taken care of by the restoration? Or do I need to be concerned about going through and scrubbing files on my domains again? With the amount I have, I dread the thought.

Why so far back? I'm going to be sick now. :(

 

Can we restore to another point once the server is back online?

 

Please tell me this can be done.

Some files were modified in the backup on the 14th from what we could determine - we're able to restore individual files/accounts/etc. back to a newer date; however, we weren't going to restore the entire server to a newer date so as not to risk a potential back-door.

 

If you have your own backup of your account we'll be happy to restore it to another server and get you online now.

Edited by MikeDVB
Corrected date, typo'd it.

This sounds like our accounts were hosted on a Windows-based computer. The whole secure feel of Linux is that it is a real multi-user environment, and with each user's files set with the proper permissions it should be completely impossible for one user to alter another's files.

But you just told us that some user with an account on the server didn't keep their scripts up to date, and so now we are all paying a heavy price in an extremely long downtime, at least in computing terms.

For any exploit to work the way you posted, that file would have to have been running with privileges other than those of the user who executed it.

This, as best I can tell, would mean that clients are not jail-shelled and that files are running under a common user account such as Apache.

Just trying to understand what happened and why.


This may be an obvious answer, but like most I'm tired, frazzled, and worried as well, and I need confirmation that the restoration will take care of it. Members and clients reported viewing the hacker's message prior to the server going offline, so I know my files were infiltrated - will this be taken care of by the restoration? Or do I need to be concerned about going through and scrubbing files on my domains again? With the amount I have, I dread the thought.

We're restoring the server back to a time before any files were modified and the system was compromised.

 

This sounds like our accounts were hosted on a Windows-based computer. The whole secure feel of Linux is that it is a real multi-user environment, and with each user's files set with the proper permissions it should be completely impossible for one user to alter another's files.

But you just told us that some user with an account on the server didn't keep their scripts up to date, and so now we are all paying a heavy price in an extremely long downtime, at least in computing terms.

To be honest, almost nobody keeps their scripts up to date, and we see *accounts* compromised every day because of it - that's nothing new. What is new in this situation is a bug in the 2.6 Linux kernel (the latest) that was exploited before there was a chance to develop a patch. I'm sorry if you've seen Linux as fool-proof and secure on all levels no matter what; software is designed and written by humans and is therefore going to be flawed.

 

For any exploit to work the way you posted, that file would have to have been running with privileges other than those of the user who executed it.
Which is exactly what the exploit did - it found a buffer underrun in the 64-bit/32-bit conversion layer that allows 32-bit executables to run on a 64-bit server.

 

I've already linked to the details previously in this thread but in case you missed them, here they are again:

https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2010-3081

https://access.redhat.com/kb/docs/DOC-40265
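For readers unfamiliar with the 32-bit compatibility path involved, here's a small illustrative sketch - unrelated to the exploit code itself - that reports whether a Linux executable is a 32-bit or 64-bit ELF binary, i.e. whether it would go through that compatibility layer on a 64-bit kernel. The default path is just an example.

#!/usr/bin/env python3
"""Illustrative sketch: tell 32-bit from 64-bit ELF executables."""
import sys


def elf_class(path: str) -> str:
    """Return '32-bit', '64-bit', or a note that the file isn't an ELF binary."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not an ELF binary"
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown ELF class")


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/bin/ls"
    print(f"{target}: {elf_class(target)}")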

 

This, as best I can tell, would mean that clients are not jail-shelled and that files are running under a common user account such as Apache.

Just trying to understand what happened and why.

Very few clients have jailshell; however, they wouldn't have needed jailshell for this exploit to have worked as it did. Files run under the user's own account and not under "apache" or "nobody".

 

It's like I said - this was an exploit at the OS level (as per the bug report and the security advisory from Red Hat, the vendor of the system kernel) and ultimately there is *nothing* we could have done to prevent it.

 

A simple example: you buy a brand-new car that's supposed to run for 100,000 miles, and an axle breaks after 50 miles. It's a manufacturing fault in the AXLE and not *your* fault or the dealership's fault that it happened. The same idea applies here - this was a bug in the operating system core itself - there's nothing that we or our hardware provider could have done to prevent it.

 

Hopefully this clears some of your questions up - if not let me know.

 

Edit: If you are just now coming into this thread, please read it from the beginning and see if your questions have already been answered.


@supernix I can attest to the fact that MDDHosting is very definitely a Linux system...really hope you were kidding there.

 

Also, it's an exploit for crying out loud! When you get privilege escalation due to a flaw in the Linux kernel there isn't much you can't do. Again, the magnitude of this attack was the result of a kernel exploit, not a run-of-the-mill scripting vulnerability.

 

...and yes, my personal site and the site that I admin are down like everyone else's. Going to the office in a few minutes to see if I can grab a cPanel backup I made of the admin'd site so that it can get back online sooner rather than later.


I was dumb enough when I started hosting in 2000 to use the IIS server and Windows 2000. Bad mistake as it was slapped around like a cheap ****** and owned by hackers when they were having the hacker wars back then. The only way to keep them out was to pull the plug.

And that is why I never host anything with Windows ever again.


I was dumb enough when I started hosting in 2000 to use the IIS server and Windows 2000. Bad mistake as it was slapped around like a cheap ****** and owned by hackers when they were having the hacker wars back then. The only way to keep them out was to pull the plug.

And that is why I never host anything with Windows ever again.

Linux can be the same way - it's all about whether you know what you're doing and how to properly secure your system. Even a secured system can have a system-level exploit (that you can't secure against) that can cause a major issue. It's fairly rare but they do happen.

 

If you just bring a server online with Linux, Apache, MySQL, and PHP without doing any work to secure it - it's going to end up compromised in short order, just like an unsecured Windows server.


I think you should roll some of your responses here into the OP for people when you have the time.

 

I have some follow up questions to some of your replies.

 

 

 

As stated previously, we don't need to see the details of the script used to perform this exploit.

 

Has the account owner been notified of their out-of-date "script" through which the exploit was injected? Can you divulge what the script and version are?

 

What has been done, or is going to be done, with that account?

 

Was the account owner even aware of the injection, or did they learn of it only after access to Echo was removed altogether (i.e., all sites inaccessible)?

 

 

Are the other MDD servers configured the same as ECHO?

 

 

 

I am glad that MDD can commit to saying no data was lost.

 

I am glad that MDD has committed to a restore date of confidence (based on file changes), followed by the potential for newer, specific restores on a case-by-case basis.

 

I am glad that the communication is flowing given this technical, time-dependent resolution path.

Edited by MikeDVB
Removed extremely long quote to shorten the post and reduce duplicate content in the thread.
