Michael D. Posted February 24, 2016
First and foremost, I want to apologize for the unreliability of our network as of late. While we own and operate all of our servers and switches, we do not run the entire network at the facility, and we rely upon the facility to provide the connectivity we pay for.

We have been with Handy Networks in Denver, Colorado for many years now. Until recently the network has always performed very well. There have been instances where a huge DDoS attack took us offline, and in a few cases the attacks were large enough to affect the whole facility, but by and large things have been stable over the years. Lately things have been less stable, and while this has affected you, it has affected us as well.

On 02/12/16 there was a fiber cut affecting the transport provider, Level3, between our two locations. Unfortunately our new equipment was on the side of the transport that does not have direct internet access [it passes over the transport before leaving to the internet]. While this transport is supposed to be physically diverse across two routes, it is clear that something is wrong when a single fiber cut takes both routes down. Handy Networks is still working with Level3 to determine why a single fiber cut took down both allegedly diverse routes, but I am not sure we'll ever have a real answer from L3 on that. Handy Networks is in the process of installing a secondary transport provider with a diverse physical link and network, but such things take time - at this point I'm hoping it will be available within 4 weeks. Understand that this isn't within our control and is entirely up to our facility to handle.

On 02/24/16 we lost all network connectivity, including connectivity to HandyNetworks.com itself, from 1:04 PM to about 1:17 PM. We experienced another outage from 3:08 PM to about 3:16 PM.

At this point I do not have an official RFO or any details beyond that it was a networking issue at our facility outside of our control. As soon as I have further details I will make them available.
Michael D. Posted February 24, 2016
There will be scheduled networking interruptions this evening to address this. More information directly from our upstream provider, Handy Networks [times are Mountain Time]:

Emergency Network Maintenance Windows: Feb 24, 2016 @ 9:00PM - 1:00AM

We will be conducting emergency network maintenance this evening from February 24 @ 9:00PM - 1:00AM to address the underlying condition that has caused the two periods of packet loss and latency that were experienced earlier today. During this time, you can expect to have several other periods of packet loss and latency.

Unfortunately this is entirely outside of our control. I will do my best to get the details of what needs to be changed and why, as well as what the actual cause of the issues is, but I do not have that information to provide at this time.
SarisIsop Posted February 24, 2016
Thank you for keeping us informed and for your honesty about what is going on.
Michael D. Posted February 24, 2016
Absolutely - I am sorry that I even had to write this post and that our customers have experienced issues. I'm doing everything within my power to ensure things remain stable moving forward after this maintenance window, but ultimately we do rely on our providers just like our customers rely on us.
mcfrye Posted February 24, 2016
What time zone are the times for the emergency maintenance?
Michael D. Posted February 24, 2016
Mountain Time - it will be 11 PM to 3 AM ET.
Rhody401 Posted February 25, 2016
Thanks for sharing the info. This is a minor inconvenience at a good time of day, and it's good that they are addressing the issue instead of ignoring it. I have only been a customer for 5 days, but so far I am very impressed with the stellar customer service.
AMC4x4 Posted February 25, 2016
It's this kind of transparency and accountability that keeps me here, guys. Keep doing what you're doing. I know I'm not on one of your more expensive plans, but you've always treated me as a valued customer, and I really appreciate it. Just wanted to say thanks.
Vilandra Posted February 25, 2016
I can't tell you how much I appreciate you keeping us informed like this. Thank you for all you do!
ericr Posted February 26, 2016
I am adding the outage for February 26, 2016 to this thread. At this time our datacenter is investigating switch issues at the new location. When I have further updates I will update this thread.
ericr Posted February 26, 2016
I am also looking into the secondary faults that are occurring on the servers where they are pingable but unable to display web pages.
ericr Posted February 26, 2016
I have located the cause of the current issues. The network failure in the datacenter has included our connection to the SAN that we are using for our high-speed storage. I am awaiting updates from the datacenter.
ericr Posted February 26, 2016
They are still working on the core switch at the location. I will update as soon as I can.
ericr Posted February 26, 2016
We have isolated the fault and are working to resolve the issue with the SAN. I am not able to provide an ETA as this is not a scheduled or planned fault. When I can provide an ETA I will gladly provide one on this thread.
ericr Posted February 26, 2016
I want to note that, tentatively, all servers are up. I am standing by for the cause of the failure of the SAN links.
SarisIsop Posted February 26, 2016
I'm back online.
Tindell Posted February 26, 2016
I'm also back online. Thank you for the continued updates.
ericr Posted February 26, 2016
We may need to reboot some or all of the servers to repair the underlying filesystem due to damage caused by the outage. We will update this thread prior to doing so.
ericr Posted February 26, 2016
S3 and S4 need emergency work to repair the filesystems so they can function. I am working on S3 right now.
ericr Posted February 26, 2016
S3's fsck is running. I had some unexpected issues getting the server to boot back up. It should be completed within 10 minutes. I am rebooting S4 to start its fsck.
ericr Posted February 26, 2016
S3 is already done and booting.
ericr Posted February 26, 2016
S4 is online. If you have issues at this point, please open a ticket so that we can investigate.
Laimonas Posted February 26, 2016
I would also appreciate info on S1 today's outage reasons for 2 hours.
Michael D. Posted February 26, 2016
Quote: "I would also appreciate info on S1 today's outage reasons for 2 hours."

I believe the network disruption to our storage platform is the cause of the outage on S1. We're currently working with our storage vendor as well as our facility to determine exactly how this happened so we can prevent it. We are using a fully dual-redundant network where any part of it can fail and everything remains online. This failure was outside of our network border; however, it still caused connectivity issues, and that's why we're still investigating. Once I know why it happened I will post it here. I do not, as of yet, have that diagnosis.