MDDHosting Forums

[Resolved] Cypress Unexpected Downtime



The server was rebooted within seconds of the alerts from our internal monitoring and took about 8 minutes to get back up to speed. We're still investigating the cause of the outage and will post an update.

The network graph is particularly interesting:

http://www.mddhosting.com/support/serverstatus.php

http://www.screen-shot.net/2011-03-11_1617.png


That graph isn't always accurate. It's only there for those who are curious and shouldn't be used to diagnose issues :) For comparison, here's a graph of our network over the same period of time:

http://www.screen-shot.net/2011-03-11_1616.png

We're working on making these nicer, more accurate graphs publicly available, but I can't promise when or if that will happen.

 

As for the crash: the server went from around 8 GB of free RAM (which is a lot, more than many providers have in total in their servers) to 0, and the server started killing processes to get some RAM back. This happened so quickly that logging stopped before anything useful for diagnosis could be written to disk, and we were forced to perform a reboot.

 

For the next 48 hours we're running additional internal monitoring at a very short interval (around 5 seconds), so if this happens again we'll have useful information to diagnose what happened.
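Because logging stopped before anything useful reached the disk, short-interval monitoring only helps if each sample is pushed to disk as soon as it is taken. Below is a minimal Python sketch of that idea. It assumes a Linux server exposing `/proc/meminfo`. The function names, log format and log path are hypothetical and do not describe MDDHosting's actual monitoring.

```python
import os
import time

# Hypothetical sketch: sample free memory every INTERVAL_S seconds for
# DURATION_S seconds, appending each sample to a log file and fsync'ing
# after every write so samples already taken survive a sudden crash.
INTERVAL_S = 5            # "around 5 seconds"
DURATION_S = 48 * 3600    # "for the next 48 hours"


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def format_sample(ts, info):
    """One log line: epoch seconds, MemFree and MemAvailable in MB."""
    free_mb = info.get("MemFree", 0) // 1024
    # MemAvailable only exists on Linux 3.14+; fall back to MemFree otherwise.
    avail_mb = info.get("MemAvailable", info.get("MemFree", 0)) // 1024
    return f"{int(ts)} free_mb={free_mb} avail_mb={avail_mb}\n"


def run(log_path, interval=INTERVAL_S, duration=DURATION_S):
    """Sample /proc/meminfo until the deadline, syncing each line to disk."""
    deadline = time.monotonic() + duration
    with open(log_path, "a") as log:
        while time.monotonic() < deadline:
            with open("/proc/meminfo") as f:
                info = parse_meminfo(f.read())
            log.write(format_sample(time.time(), info))
            log.flush()               # move the line from Python's buffer to the OS
            os.fsync(log.fileno())    # ask the OS to commit it to disk now
            time.sleep(interval)

# Usage (hypothetical path): run("/var/log/memwatch.log")
```

Calling `fsync` every 5 seconds costs a little I/O, but it is what keeps the last samples before a crash from being lost in a write buffer. Keep in mind that the monitor itself can also be killed once memory runs out, so the log shows the run-up to the event rather than the final moment.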

 

At this point the server is back online and we're going to mark this as closed. If you have any questions at all, do feel free to ask them.


Hmm... maybe cacti.mddhosting.com and cypress.mddhosting.com are the same box? If logging stopped, that would explain why the graph shows some dead air during the outage.

 

Thanks for digging into this one Michael.


No, they're not on the same server. Cacti just doesn't seem to be very reliable as we have it configured. Both Cacti and the graph I included query the switch for traffic details, so the data source is the same.
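Since both graphs draw on the same switch data, the differences most likely come from how each tool turns raw readings into a traffic rate. As an illustration only: tools like Cacti typically read a port's octet counter (for example the SNMP `ifInOctets` counter, which is 32 bits) at each poll and divide the change by the polling interval. The sketch below assumes that setup. The function name is hypothetical, and this is not a diagnosis of MDDHosting's configuration.

```python
# Hypothetical sketch: convert two readings of a switch port's octet counter
# into an average rate in bits per second, allowing for one counter wrap.
COUNTER32_MAX = 2**32


def rate_bps(prev_octets, curr_octets, interval_s, counter_max=COUNTER32_MAX):
    """Average bits/s between two counter samples taken interval_s apart.

    If the counter appears to go backwards, assume it wrapped exactly once.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:  # counter passed counter_max and restarted at 0
        delta += counter_max
    return delta * 8 / interval_s
```

The single-wrap assumption is the weak spot. A 32-bit octet counter wraps after 2^32 bytes, which takes only about 34 seconds at a sustained 1 Gbit/s. With Cacti's default 5-minute polling, the counter can wrap several times between samples and the computed rate becomes meaningless. The 64-bit `ifHCInOctets`/`ifHCOutOctets` counters avoid this problem.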

 

We don't use it ourselves. It's just there to display a nice traffic graph for customers who are curious. We may pull the graph off that page until we have a more reliable solution in place.
