MDDHosting Forums

Connectivity issues over Telia and Hurricane Electric [Transit Providers]



Within about a minute of the problem starting, we were alerted to connectivity issues between our network and Europe.

 

Upon investigation we found that a transit provider, Telia, is having issues.

 

We've removed Telia from our bandwidth mix and have reached out to them about the matter. It will take a few minutes for things to stabilize. If you continue to have issues, please open a ticket and include a traceroute to your site as well as your IP address, which you can look up at https://www.mddhosting.com/whatismyip.php/ or https://www.google.com/#safe=off&q=what+is+my+ip .

 

We apologize for the trouble this has caused you. We hope to receive a root cause analysis from Telia, or at least some details as to why it happened.


We determined that both Telia and Hurricane Electric are experiencing major issues.

 

While you may not be able to reach us or your site, the majority of the world can.

 

Our networking department is in contact with both transit providers.


In short, a major internet backbone provider suffered an outage that caused connectivity issues for some customers. The failure occurred well outside our network's border [in New York, apparently] and well outside our direct control.

 

Our networking department reached out to both Telia and HE. The information they received concerning the matter is quoted below.

We have experienced a loss of connectivity due to a major outage with our carrier Telia, which they have confirmed. The issue was amplified by the fact that another of our providers, HE, also uses Telia. At our request, and after receiving confirmation from Telia, HE de-peered from Telia until the issue is resolved.

 

We received these responses from Telia:

 

"Dear Customer,

 

This is part of a larger outage that is currently being investigated.

 

Kind Regards,

Luis Nuñez

Customer Care, Data & Infrastructure"

 

Then:

"We are currently experiencing issue with a backbone router in New York."


We have received an update from Telia:

Dear Customer,

 

Our core team has resolved an issue on our backbone causing U.S. customers packet loss. The root cause analysis will follow and we expect no further packet loss due to this issue.

We do not expect there to be further issues and are now marking this as resolved.


Telia has released its official RFO [Reason For Outage]. Here are the details:

Dear Customer,

 

This is a Reason for Outage Report with details regarding the case you have opened with TeliaSonera International Carrier.

 

Country: United States

TeliaSonera Case Reference: 00563796

Network Impact: Packet loss on the Telia Carrier U.S. backbone.

Case Opened: 6/9/2016 7:00 PM (After the issue had begun)

Case Ready for Service: 6/9/2016 7:23 PM

 

Reason for Outage: Incorrect ISIS metric and multiple commits while turning up new Telia Carrier backbone links in Dallas caused a loop of reconverging BGP and ISIS protocols. This put a very high CPU load on our U.S. routers and caused some trans-Atlantic congestion.

Actions Taken: The nyk-bb1 inner-core router (New York) was the first router to show real problems when we received alarms indicating packet loss on trans-Atlantic traffic together with high CPU utilization. The router was taken out of service. Further investigation revealed that the root cause was too many commits by Implementation while turning up new backbone links on dls-b22 (Dallas), along with an incorrect metric. The configuration in dls-b22 was rolled back to alleviate the problem and nyk-bb1 has been put back in service. This resolution is permanent and there will be no further loss related to this issue.

Additional Information: Telia Implementation team is making significant changes to their way of working to mitigate this from happening in the future.

 

 

Please note that all the time stamps given above are in UTC unless otherwise stated.

Please bear in mind that this was a major issue with the internet itself, at one of its larger backbone providers. This was not within our power to detect, prevent, or resolve.

 

We do apologize for any trouble this outage caused you.


  • 2 weeks later...

CloudFlare has published a detailed blog post on the issues with Telia Carrier, the same ones that affected us and our customers.

 

We found it very interesting, so I'm linking to it here:

https://blog.cloudflare.com/a-post-mortem-on-this-mornings-incident/

 

It seems it was human error at Telia that caused these issues...

http://www.theregister.co.uk/2016/06/20/telia_engineer_blamed_massive_net_outage/
