Michael D. (posted May 23, 2014): The Echo server has gone intermittent on us, and upon logging into the console it's apparent that the file system has become corrupted. We are issuing a reboot and will need to initiate a file system check. The file system check could take up to 2 hours; however, we will be providing updates via this thread as it progresses.
gplurn (posted May 23, 2014): Thanks for this notice, Mike. Does that mean my sites and email will be down for 2 hours?
Michael D. (posted May 23, 2014): The server will be offline until the FSCK is done - there's no way to run a file system check with the server online. I suspect a silently failed SSD is the cause.
sstamour (posted May 23, 2014): Thanks for letting me know. I look forward to updates.
gplurn (posted May 23, 2014): Interesting. I work with audio, and from what I understand SSDs are more reliable when being read than when being written to.
Michael D. (posted May 23, 2014): The server is back online and is catching up on the flood of requests. I did disable SSD caching, so it will take a little longer than usual, but it should be back to normal in ~15 minutes.
nategarrett (posted May 23, 2014): 100% uptime is impossible as hardware errors are simply part of the game and inevitable. What I love about you guys is the communication and how quickly you get things back up and running! Makes it a lot easier to let my clients know what's going on. Thank you!!
cziv (posted May 23, 2014): [Quote: "The server is back online and is working on catching up due to the flood of requests. I did disable SSD caching so it will take a little longer than normal but it should be back to normal in ~15 minutes."] I think the databases are offline, since I get "error establishing connection".
gplurn (posted May 23, 2014): [Quote: "What I love about you guys is the communication and how quickly you get things back up and running! Makes it a lot easier to let my clients know what's going on. Thank you!!"] Seconded. Honesty saves the day, every time. <3
cbonde (posted May 23, 2014): Just realized this was the same date; my time needed to be set on the forum since I just joined. Is there no backup plan when servers go down?
linkiepoo (posted May 23, 2014): Thanks, Mike. It's nice that when the server is down I can come to the forums and find out straight away what has happened.
Michael D. (posted May 23, 2014): [Quote: "When is echo forecast to be back up? Is there no back up plan when servers go down at all?"] It is back online now, but it will take time for it to stabilize. Because the SSD caching was the cause of the corruption, we had to disable it. That said, can you elaborate on what you expect when you ask about a 'back up plan'? We do have backups of all servers, but restoring the backups isn't something that would be instantaneous.
juliebolddogge (posted May 23, 2014): Thanks, Mike! I noticed an issue about three minutes before your post... came right here to the forum and there was the update. Like others, I really appreciate the openness. Thank you!
cbonde (posted May 23, 2014): At my business we have redundancy built in to get the content back up on another server when something like this happens. I wanted to know whether, any time something goes down, I am going to have to wait until you fix whatever was down, be it drives, power supply or whatever the problem might be. I am new here and was wondering. If I let one of my servers at work stay down for an hour I would be looking for a new job.
Michael D. (posted May 23, 2014): [Quote: "At my business we have redundancy built in to get the content back up on another server when something like this happens. Was wanting to know if I am going to have to wait any time anything goes down until you fix whatever was down, be it drives, power supply or whatever the problem might be. I am new here and was wondering. If I let one of my servers at work stay down for an hour I would be looking for a new job."]
There's no way we could provide such redundancy for a whole server at the price point of $7.50/month - such redundancy costs on the order of a couple hundred dollars per month [for a single account/user/site/need]. We do have spare drives, spare power supplies, motherboards, etc., but the problem is that when the file system becomes corrupted it's not as simple as just swapping a drive - a file system check is required and there's no way around that. It has to scan the whole storage system, look for damage, and repair it. The file system being scanned and repaired has to be offline during this time, so while the server is technically 'on', it's not serving requests.
To have the sort of redundancy you seem to be expecting would require us to literally double our infrastructure and have half of it sitting idle, and we'd also need automation in place to ensure the backup hardware [online, doing nothing] stayed 100% in sync with the active hardware. At the price point of regular shared hosting we simply can't provide high availability hosting in the sense that you seem to be expecting - there are services out there that do offer this, but they're on the order of $100/mo+ for a single user/site/account. I think your expectations and reality are simply a little out of sync in this case, but let me know if I can clarify further.
Michael D. (posted May 23, 2014): The facility is pulling both solid state drives used for caching and is swapping in a known-good drive for us so we can re-enable the caching. This will not incur any downtime. Once the new drive is installed we will re-enable caching and server performance should return to normal or very near it. We will be testing both removed drives to find the failed one, then replacing it with a new drive and restoring the original caching configuration. This should all be transparent, except that once caching is re-enabled the server will respond faster.
cbonde (posted May 23, 2014): As I said, I am new to this whole thing and decided to try this on the better shared plan. So do any of your plans have redundancy built in, with the expectation of getting a site back up within an hour or so? And I am paying 46.50 per quarter, not 7.50 a month. I guess I should have asked more questions when I signed up, but you had glowing reviews. I didn't expect to have it go down so soon after joining.
cbonde (posted May 23, 2014): Ok finally able to log in now. Thanks!
Michael D. (posted May 23, 2014):
[Quote: "As I said I am new to this whole thing and decided to try this on a shared with the better shared plain."]
Sure - I'm not trying to be condescending but just trying to be transparent/straightforward so that you know exactly what to expect.
[Quote: "So do any of your plans have redundancy built in with expectations of getting a site back up with an hour or so?"]
The server is back online, and has been for a little while now. It was brought back online as quickly as we could - we obviously don't want things offline any more than any one of our customers does [in short - not at all].
[Quote: "And I am paying 46.50 per quarter not 7.50- a month----"]
Even that is nowhere near the cost of high availability hosting for critical sites. If you really do need high availability with load-balancing [i.e. dual hardware to handle requests if one piece of hardware fails] I'd suggest looking into it - most cannot or choose not to afford it.
[Quote: "I guess I should have asked more questions when I signed up but you had glowing reviews."]
Hardware failure can happen to any provider - even High Availability services can and do have downtime. Google has been down [and they have numerous data center facilities and tons of redundancy], Amazon AWS has been down for days at a time, CloudFlare has had downtime - it happens. We can't ever promise there won't be issues, but what we do promise is that we'll get them resolved as quickly as possible and will keep you informed/updated along the way as to the progress, what happened, why it happened, and what we're doing about it [i.e. what we're doing in this thread].
[Quote: "I didnt expect to have it go down so soon into joining."]
Nobody "expects" downtime but it's a fact of life - have an online presence long enough and you're bound to experience some. By comparison we have customers that have been on the server you're on for well over a year without any downtime - the difference between them and you is simply that they've been with us longer. If you have any other questions or concerns at all just let us know - we're happy to answer any and all of your questions.
Michael D. (posted May 23, 2014): The server is back to 'normal' or very near it at this point. We are swapping in spare solid state drives now to re-enable caching.
gplurn (posted May 24, 2014): I'm having trouble connecting to sites again. Any developments?
Michael D. (posted May 24, 2014): The server is online and 100% operational - please do open a support ticket.
Michael D. (posted May 24, 2014): SSD caching is re-enabled, but it will take some time for the cache to rebuild/refill. We will be performing extensive testing on the pulled SSDs early next week and will be replacing the failed drive with a good one.