
Network connectivity issues. 08/12/2019

network service issues RFO

7 replies to this topic

#1 Tim

Tim

    Member

  • Staff
  • 70 posts
  • Gender:Male

Posted 12 August 2019 - 01:16 PM

Hello, 

 

Our data center facility is experiencing networking issues. This is outside our direct control, as all of our own equipment is up. The facility is aware of the issue and is actively working to resolve it, and once they provide details about the outage we will let everyone know. The outage also affected our support system and forums, so I apologize for any delay in this reply.

 

Once I have more details and the RFO (Reason For Outage) I will post them here. 


  • 0

#2 Tim

Tim

    Member

  • Staff
  • 70 posts
  • Gender:Male

Posted 12 August 2019 - 01:18 PM

As of the time of this post I am able to access our services and your websites again. 


  • 0

#3 SarisIsop

SarisIsop

    Advancing Member

  • Members
  • 154 posts
  • Gender:Not Telling

Posted 12 August 2019 - 01:19 PM

My sites are back on-line.

 

Thank you.


  • 0

#4 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • 2,888 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 12 August 2019 - 01:52 PM

It is my current understanding that this issue was due to an unusual hardware failure in a core piece of networking equipment at the facility. This piece of hardware failed in such a way that it stopped servicing requests without actually going 'offline', somewhat like an operating system crash/panic.

 

This piece of equipment is redundant: there is another identical piece of hardware doing the same job that should pick up the slack. At this point I do not know why redundancy didn't prevent the outage. It could be due to the nature of the failure, in that the gear stayed online but wasn't actually working, but that's speculation on my part.
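
To illustrate why a device that is "online but not working" can slip past redundancy, here is a minimal Python sketch. It is purely hypothetical: the `Router` class, its fields, and both selection functions are invented for illustration and say nothing about how the Juniper gear or the facility's failover actually works.

```python
from dataclasses import dataclass


@dataclass
class Router:
    """Hypothetical model of one half of a redundant pair."""
    name: str
    link_up: bool      # the device still answers at the link/heartbeat level
    forwarding: bool   # traffic sent through the device actually arrives


def pick_active_by_link(primary: Router, secondary: Router) -> Router:
    """Naive failover: stay on the primary for as long as its link is up."""
    return primary if primary.link_up else secondary


def pick_active_by_probe(primary: Router, secondary: Router) -> Router:
    """Failover driven by an end-to-end check that traffic is forwarded."""
    return primary if primary.forwarding else secondary


# A primary that is hung: still "up" but not servicing requests.
hung_primary = Router("primary", link_up=True, forwarding=False)
healthy_secondary = Router("secondary", link_up=True, forwarding=True)

print(pick_active_by_link(hung_primary, healthy_secondary).name)
print(pick_active_by_probe(hung_primary, healthy_secondary).name)
```

With a link-only check the hung primary stays selected, while a check that tests forwarding moves to the secondary. Whether anything like this applied to the actual outage is not known.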

 

As it stands, everything is back online, but we have lost the redundancy of this core piece of hardware until the issue is fully resolved. It is suspected that this is a bug in the operating system running on the core networking equipment, and the facility is working with Juniper Emergency Support both to investigate the cause of the issue and to ensure it doesn't happen again.

 

Here is a snippet of the kernel/operating system log from the failed piece of networking hardware:

Aug 12 17:29:05  dist3.denver2 /kernel: BAD_PAGE_FAULT: pid 1972 (fxpc), uid 0: pc 0x0 got a read fault at 0x0, x86 fault flags = 0x4
Aug 12 17:29:05  dist3.denver2 /kernel: Trapframe Register Dump:
Aug 12 17:29:05  dist3.denver2 /kernel: eax: 20dba498 ecx: 000000ff edx: 20dba494 ebx: 20dba468
Aug 12 17:29:05  dist3.denver2 /kernel: esp: af97de6c ebp: af97de98 esi: 21054b98 edi: 00000000
Aug 12 17:29:05  dist3.denver2 /kernel: eip: 00000000 eflags: 00010202
Aug 12 17:29:05  dist3.denver2 /kernel: cs: 0033 ss: 003b ds: 003b es: 003b
Aug 12 17:29:05  dist3.denver2 /kernel: fs: b0b5003b trapno: 0000000c err: 00000004
Aug 12 17:29:05  dist3.denver2 /kernel: PC address 0x0 is inaccessible, PDE = 0x0, ****** = 0x0
Aug 12 17:29:05  dist3.denver2 /kernel: BAD_PAGE_FAULT: pid 1972 (fxpc), uid 0: pc 0x0 got a read fault at 0x0, x86 fault flags = 0x4
Aug 12 17:29:05  dist3.denver2 /kernel: Trapframe Register Dump:
Aug 12 17:29:05  dist3.denver2 /kernel: eax: 20dba498 ecx: 000000ff edx: 20dba494 ebx: 20dba468
Aug 12 17:29:05  dist3.denver2 /kernel: esp: af97de6c ebp: af97de98 esi: 21054b98 edi: 00000000
Aug 12 17:29:05  dist3.denver2 /kernel: eip: 00000000 eflags: 00010202
Aug 12 17:29:05  dist3.denver2 /kernel: cs: 0033 ss: 003b ds: 003b es: 003b
Aug 12 17:29:05  dist3.denver2 /kernel: fs: b0b5003b trapno: 0000000c err: 00000004
Aug 12 17:29:05  dist3.denver2 /kernel: PC address 0x0 is inaccessible, PDE = 0x0, ****** = 0x0
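
For anyone curious how to read the first line of that log, below is a rough Python sketch that pulls the fields out of a `BAD_PAGE_FAULT` line and decodes the fault flags using the standard x86 page-fault error-code bits. The regular expression and the function names are my own assumptions written against the line shown above, not anything provided by Juniper or the facility.

```python
import re

# Standard x86 page-fault error-code bits (low three bits only).
PRESENT = 0x1  # 0: page not present, 1: protection violation
WRITE = 0x2    # 0: read access,      1: write access
USER = 0x4     # 0: supervisor mode,  1: user mode

# Assumed pattern, written to match the log line quoted in this post.
PATTERN = re.compile(
    r"BAD_PAGE_FAULT: pid (?P<pid>\d+) \((?P<proc>[^)]+)\).*"
    r"pc (?P<pc>0x[0-9a-f]+) got a (?P<kind>\w+) fault at (?P<addr>0x[0-9a-f]+), "
    r"x86 fault flags = (?P<flags>0x[0-9a-f]+)"
)


def decode_fault_flags(flags: int) -> dict:
    """Translate the low bits of an x86 page-fault error code into words."""
    return {
        "cause": "protection violation" if flags & PRESENT else "page not present",
        "access": "write" if flags & WRITE else "read",
        "mode": "user" if flags & USER else "supervisor",
    }


def parse_fault(line: str):
    """Return the parsed fields of a BAD_PAGE_FAULT line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["decoded"] = decode_fault_flags(int(info["flags"], 16))
    return info


LINE = (
    "Aug 12 17:29:05  dist3.denver2 /kernel: BAD_PAGE_FAULT: pid 1972 (fxpc), "
    "uid 0: pc 0x0 got a read fault at 0x0, x86 fault flags = 0x4"
)

print(parse_fault(LINE))
```

Flags of 0x4 decode to a user-mode read of a page that is not present, and the program counter and fault address are both 0x0, which is consistent with the fxpc process jumping to a null address. That reading only describes the symptom in the log; it does not identify the root cause.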

Once the Reason For Outage [RFO] is available from our facility we will make it available.


  • 1
Michael Denney - MDDHosting LLC - Providing Hosting since 2007
Scalable shared hosting plans in the cloud! Check them out!
Highly Available Cloud Shared, Reseller, and VPS
http://www.mddhosting.com/

#5 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • 2,888 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 12 August 2019 - 04:54 PM

In speaking with the senior network engineer at Handy Networks, our upstream facility, I learned that this issue affected both redundant pieces of hardware responsible for routing traffic. The primary crashed, then the secondary took over and subsequently crashed as well. This cycle kept repeating while they were working to determine the cause, which explains why things would show as online for a minute or two and then go back down.

 

The mode of failure is definitely unusual and I still personally believe it to be a bug in the Juniper OS.

 

Juniper and Handy Networks are still working to trace the cause, and I expect to have an RFO within 72 hours.


  • 1

#6 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • 2,888 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 13 August 2019 - 12:34 PM

Our upstream facility has scheduled a maintenance window tonight from 11 PM to 4 AM Eastern Time.
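
If you are outside the Eastern time zone, the short Python sketch below converts that window to UTC. It assumes Eastern Time in mid-August means EDT (UTC-4) and that "tonight" means the window opens on 13 August 2019 and closes on the morning of the 14th; the variable and function names are only illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Time in August is EDT, a fixed offset of UTC-4.
EDT = timezone(timedelta(hours=-4), "EDT")

# Assumption: the window opens on the night of 13 August 2019.
window_start = datetime(2019, 8, 13, 23, 0, tzinfo=EDT)
window_end = datetime(2019, 8, 14, 4, 0, tzinfo=EDT)


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return moment.astimezone(timezone.utc)


print(to_utc(window_start).isoformat())  # 2019-08-14T03:00:00+00:00
print(to_utc(window_end).isoformat())    # 2019-08-14T08:00:00+00:00
print(window_end - window_start)         # 5:00:00
```

To get your own local time, replace `timezone.utc` with your own offset.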

 

They expect we may see a couple of instances of downtime of up to 15 minutes each, but they will strive to keep any downtime to a minimum.

 

For full details you can read their status at https://helpdesk.han...wsItem/View/276


  • 0

#7 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • 2,888 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 15 August 2019 - 08:39 AM

I was waiting on the RFO before updating this, but I haven't seen one yet, so I at least wanted to post that the maintenance on the 13th went well and that we are fully redundant once again.

 

Juniper is still investigating the cause, but from my conversations with the networking department at our upstream provider, a filter has been put in place that should prevent the issue from recurring.

 

Once I have the RFO I will make it available.


  • 0

#8 Tim

Tim

    Member

  • Staff
  • 70 posts
  • Gender:Male

Posted Today, 09:26 AM

Hello all,

 

Here are further details about the issue from our data center provider.

https://helpdesk.han.../network-status


  • 0




