Michael D. Posted June 9, 2016

We were alerted, within about a minute, to connectivity issues between our network and Europe. Upon investigation we found that a transit provider, Telia, is having issues. We've removed Telia from our bandwidth mix and have reached out to them concerning the matter.

It will take a few minutes for things to stabilize, but if you continue having issues please open a ticket and provide a traceroute to your site as well as your IP address [ https://www.mddhosting.com/whatismyip.php/ https://www.google.com/#safe=off&q=what+is+my+ip ].

We apologize for the trouble this has caused you. Hopefully we'll get a cause analysis from Telia, or at least some details as to why it happened.
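For anyone unsure how to gather the traceroute requested above, here is a minimal Python sketch, not an official MDDHosting tool. It assumes the standard `tracert` (Windows) or `traceroute` (Linux/macOS) utility is installed; the function names and the `example.com` host are placeholders made up for illustration, so substitute your own domain.

```python
import platform
import subprocess


def build_traceroute_command(host, system=None):
    """Return the traceroute command line for the given OS name.

    `system` defaults to the current platform; it is a parameter only so
    the choice of command can be checked without running anything.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows ships tracert; -d skips reverse DNS lookups.
        return ["tracert", "-d", host]
    # Linux, macOS and the BSDs use traceroute; -n prints numeric addresses.
    return ["traceroute", "-n", host]


def run_traceroute(host, timeout=120):
    """Run the traceroute and return its text output for pasting into a ticket."""
    result = subprocess.run(
        build_traceroute_command(host),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout or result.stderr


# Example (hypothetical host, replace with your own site):
#   print(run_traceroute("example.com"))
```

Running the same command by hand in a terminal works just as well; the script only saves you from remembering which command your operating system uses.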
Michael D. Posted June 9, 2016

We determined that both Telia and Hurricane Electric are experiencing major issues. While you may not be able to reach us or your site, the majority of the world can. Our networking department is in contact with both transit providers.
Michael D. Posted June 9, 2016

The long and short of it is that a major internet backbone provider had a major outage, which caused connectivity issues for some customers. This failure was well outside of our border [in New York, apparently] and well outside of our direct control.

Our networking department did reach out to Telia and HE. Quoted below is the information we received from our networking department concerning the matter:

We have experienced a loss of connectivity due to a major outage with our carrier Telia, which they have confirmed. The issue was amplified by the fact that another provider, HE, also uses Telia. On our request after they received confirmation from Telia, HE de-peered with Telia until the issue is resolved.

We received these responses from Telia:

"Dear Customer, This is part of a larger outage that is currently being investigated. Kind Regards, Luis Nuñez Customer Care, Data & Infrastructure"

Then:

"We are currently experiencing issue with a backbone router in New York."
Michael D. Posted June 9, 2016

We have received an update from Telia:

Dear Customer, Our core team has resolved an issue on our backbone causing U.S. customers packet loss. The root cause analysis will follow and we expect no further packet loss due to this issue.

We do not expect there to be further issues and are now marking this as resolved.
Michael D. Posted June 10, 2016

Telia released their official RFO [Reason For Outage], and here are the details:

Dear Customer,

This is a Reason for Outage Report with details regarding the case you have opened with TeliaSonera International Carrier.

Country: United States
TeliaSonera Case Reference: 00563796
Network Impact: Packet loss on the Telia Carrier U.S. backbone.
Case Opened: 6/9/2016 7:00 PM (After the issue had begun)
Case Ready for Service: 6/9/2016 7:23 PM

Reason for Outage: Incorrect ISIS metric and multiple commits while turning up new Telia Carrier backbone links in Dallas caused a loop of reconverging BGP and ISIS protocols. This put a very high CPU load on our U.S. routers and caused some trans-Atlantic congestion.

Actions Taken: The nyk-bb1 inner-core router (New York) was the first router to show real problems when we received alarms indicating packet loss on transit-Atlantic traffic together with high CPU utilization. The router was taken out of service. Further investigation revealed that the root cause was too many commits by Implementation while turning up new backbone links on dls-b22 (Dallas), along with an incorrect metric. The configuration in dls-b22 was rolled back to alleviate the problem and nyk-bb1 has been put back in service. This resolution is permanent and the will be no further loss related to this issue

Additional Information: Telia Implementation team is making significant changes to their way of working to mitigate this from happening in the future.

Please note that all the time stamps given above are in UTC unless otherwise stated.

Please bear in mind that this was a major issue with the internet itself and one of its larger backbone providers. This was not within our power to detect, prevent, or resolve. We do apologize for any trouble this outage caused you.
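For reference, the two case timestamps in Telia's RFO above can be compared directly. The short Python sketch below only does that arithmetic; the variable names are illustrative, and the dates are read as month/day/year, which matches the June 9 incident. Since the RFO notes that the case was opened after the issue had begun, the result is a lower bound on how long the problem lasted, not the full outage duration.

```python
from datetime import datetime

# Timestamps copied from the RFO (stated to be UTC), read as month/day/year.
FMT = "%m/%d/%Y %I:%M %p"
case_opened = datetime.strptime("6/9/2016 7:00 PM", FMT)
ready_for_service = datetime.strptime("6/9/2016 7:23 PM", FMT)

# Elapsed time between the case being opened and service being restored.
minutes = (ready_for_service - case_opened).total_seconds() / 60
print(f"Case opened to ready for service: {minutes:.0f} minutes")
```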
Michael D. Posted June 21, 2016

CloudFlare has a detailed blog post on the issues with Telia Carrier, the same ones that affected us and our customers. We found it very interesting, so I'm linking to it here: https://blog.cloudflare.com/a-post-mortem-on-this-mornings-incident/

It seems it was human error at Telia that caused these issues: http://www.theregister.co.uk/2016/06/20/telia_engineer_blamed_massive_net_outage/