Michael D. Posted August 13, 2010
Hello, We're going to be rebooting the Boreas server momentarily and the Atlantis server later this evening (approximately 10 PM EST, GMT-5). Each server should be down for no longer than 10 minutes. We're booting into a newer kernel to fix a kernel routing table issue that is causing additional latency and additional CPU strain on the VPS nodes. The MDDHosting.com front end is on the Boreas server, so it will be offline for approximately 10 minutes.
Michael D. Posted August 13, 2010
The Boreas server has been rebooted with no issues. The Atlantis server is still scheduled for 10 PM EST (GMT-5).
Michael D. Posted August 14, 2010
Atlantis is currently rebooting; we expect total downtime to be 5 to 10 minutes.
Michael D. Posted August 14, 2010
The server is coming back online. Due to the nature of a VPS node, it's going to take a few minutes for each VPS on the node to boot back up. Once this is completed we'll close out this maintenance window.
Michael D. Posted August 14, 2010
We've tested all of the VPS on the node and have verified they're online and operating normally. Sorry for the delay in posting this update. This maintenance window is now officially closed.
Michael D. Posted August 16, 2010
We're still experiencing some kernel routing table issues with the Boreas server and we're working with our software vendors to resolve them. We had to reboot the node just now as the routing issues were causing timeouts and extreme latency. We've booted into a newer kernel suggested by our software vendor and we're actively monitoring the server at this point.
Michael D. Posted August 17, 2010
We're rebooting the node for what we hope is the last time. After working with our software vendors, it looks as though the server is simply seeing so much traffic that the kernel's default networking settings for TCP traffic and its default buffers were too small. We've adjusted these settings quite a bit to allow the server to cache more routes and to clear expired routes more aggressively, and we've optimized the TCP subsystem quite a bit. Once this reboot is completed we're going to monitor the server closely for 72 hours. If any further issues are going to occur, they will happen within the first 72 hours; however, we don't anticipate any.
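For readers curious what this kind of tuning can look like, here is a minimal Python sketch that renders a `sysctl.conf`-style fragment and maps each key to its file under `/proc/sys`. It is an illustration only: the post does not say which settings or values were changed, so the keys chosen here (standard Linux sysctls for the route cache and TCP buffers on kernels of that era) and every number are assumptions, and the helper names `proc_path` and `render_sysctl_conf` are hypothetical.

```python
from pathlib import PurePosixPath

# ASSUMPTION: illustrative keys and values only. The post does not list the
# settings that were actually adjusted on the server.
EXAMPLE_TUNABLES = {
    "net.ipv4.route.max_size": "4194304",       # allow more cached routes
    "net.ipv4.route.gc_thresh": "262144",       # when route garbage collection starts
    "net.ipv4.route.gc_timeout": "60",          # seconds before an unused route expires
    "net.core.rmem_max": "16777216",            # max socket receive buffer (bytes)
    "net.core.wmem_max": "16777216",            # max socket send buffer (bytes)
    "net.ipv4.tcp_rmem": "4096 87380 16777216",  # min, default, max TCP receive buffer
    "net.ipv4.tcp_wmem": "4096 65536 16777216",  # min, default, max TCP send buffer
}


def proc_path(key: str) -> str:
    """Map a dotted sysctl key to the matching file under /proc/sys."""
    return str(PurePosixPath("/proc/sys", *key.split(".")))


def render_sysctl_conf(tunables: dict) -> str:
    """Render 'key = value' lines, sorted by key, in sysctl.conf format."""
    lines = [f"{key} = {value}" for key, value in sorted(tunables.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment; nothing is written to the system.
    print(render_sysctl_conf(EXAMPLE_TUNABLES), end="")
```

The sketch only prints text. Applying settings like these on a real server would be done by an administrator with the `sysctl` tool, and suitable values depend on the kernel version and the traffic the server sees.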
Michael D. Posted August 17, 2010
The reboot has been completed and we're going to actively monitor the server for a minimum of 72 hours.
Michael D. Posted August 17, 2010
All was well until about an hour ago, when the same issue recurred. We're going to attempt to resolve it without a reboot. If we are forced to reboot, we'll schedule it for later this evening during an off-peak time. If you have any questions feel free to respond to this thread or to open a support ticket.
Michael D. Posted August 17, 2010
We're scheduling a reboot of the Boreas server only, for 10 PM EST (GMT-5), to boot into an updated kernel. The reboot will take approximately 5 minutes. As before, we're going to closely monitor the server for 72 hours after this reboot to ensure the original root cause has been resolved. If you have any questions feel free to respond to this thread or to open a support ticket.