
R1, S1, P1, SpamExperts Scanners outage


27 replies to this topic

#1 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 02:19 AM

I have identified an outage affecting all servers in the new datacenter.  I have escalated to our datacenter provider and will update as soon as possible.


  • 0

#2 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 02:21 AM

The datacenter has confirmed that they are aware of the outage and are working to resolve the issues.


  • 0

#3 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 02:28 AM

I have adjusted the scope to include the inbound and outbound scanners as they are also in this facility.


  • 0

#4 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 02:47 AM

The datacenter is in contact with the transit provider to identify any network connectivity problems. They are also on-site reviewing the network gear for issues.


  • 0

#5 frankacter

frankacter

    Member

  • Clients
  • PipPip
  • 46 posts
  • Gender:Male

Posted 12 February 2016 - 02:52 AM

FYI, the link to the public report for P1 on the status page is linking to the MDD support page instead of the Pingdom URL.

Also, none of the speedtest links work for any of the servers.


  • 0

#6 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 02:58 AM

We are aware of the speed test link issues.  I will investigate the P1 reporting issue at a later point.


  • 0

#7 AMGill

AMGill

    Newbie

  • Members
  • Pip
  • 2 posts

Posted 12 February 2016 - 03:10 AM

An hour down... any news on an ETA?  Bummer.


  • 0

#8 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 03:23 AM

I do not have an ETA at this time.  I am standing by while the datacenter works as fast as they can to locate and fix the fault.


  • 0

#9 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 03:53 AM

The datacenter is not able to provide any further updates at this time.  I am continuing to press them for information.


  • 0

#10 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 03:58 AM

I have been provided with more information.  The fault is located in Level 3's network in Denver. Level 3 is working to correct the fault at this time.


  • 0

#11 AMGill

AMGill

    Newbie

  • Members
  • Pip
  • 2 posts

Posted 12 February 2016 - 04:00 AM

Wow, this is a long one.  I sure hope we don't have to change servers again... I don't think I could handle another move.

 

Thanks for the updates, as I have clients waiting for answers. I know you are doing what you can.


  • 0

#12 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 04:02 AM

We are showing the link back up.  I will continue to monitor to ensure the link is stable.


  • 0

#13 andamira

andamira

    Newbie

  • Members
  • Pip
  • 5 posts
  • Gender:Male

Posted 12 February 2016 - 04:04 AM

It looks like pages in R1 are reachable again (for now). Thank you for the updates.


  • 0

#14 Vask

Vask

    Newbie

  • Members
  • Pip
  • 1 posts

Posted 12 February 2016 - 04:06 AM

So, for two hours the problem was at a different datacenter than was initially thought?

 

A simple tracert was showing where the network problem was:

 

Tracing route to ************* [173.248.188.176]
over a maximum of 30 hops:
 
  1    <1 ms     1 ms     1 ms  10.0.0.1
  2     1 ms     1 ms     1 ms  192.168.1.1
  3    45 ms    39 ms    39 ms  80.107.108.110
  4   599 ms    52 ms    53 ms  athe-crsb-hera-gsra-1.backbone.otenet.net [79.128.224.217]
  5    77 ms    53 ms    47 ms  ten0-1-0-0-crs01.ath.oteglobe.gr [62.75.3.1]
  6    91 ms    91 ms    91 ms  62.75.4.162
  7   320 ms    96 ms   139 ms  40ge1-3.core1.lon2.he.net [195.66.224.21]
  8     *      177 ms   158 ms  100ge1-1.core1.nyc4.he.net [72.52.92.166]
  9   182 ms   176 ms   186 ms  100ge7-2.core1.chi1.he.net [184.105.223.161]
 10   250 ms   208 ms   199 ms  10ge15-2.core1.den1.he.net [184.105.81.82]
 11   199 ms   198 ms     *     handy-networks-llc.gigabitethernet2-11.core1.den1.he.net [216.66.78.126]
 12     *        *        *     Request timed out.
 13     *        *        *     Request timed out.
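
For anyone who wants to pull the last responding hop out of a trace like the one above, here is a minimal Python sketch. It assumes Windows tracert-style output, and the name last_responding_hop is just an illustrative helper, not part of any tool. Keep in mind that the last hop to answer only shows where replies stop; some routers simply do not answer probes, so it is a hint rather than proof of where the fault is.

```python
import re

# Sample lines copied from hops 10-13 of the trace above.
TRACE = """\
 10   250 ms   208 ms   199 ms  10ge15-2.core1.den1.he.net [184.105.81.82]
 11   199 ms   198 ms     *     handy-networks-llc.gigabitethernet2-11.core1.den1.he.net [216.66.78.126]
 12     *        *        *     Request timed out.
 13     *        *        *     Request timed out.
"""

# A hop line starts with the hop number, followed by the rest of the line.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# Three probe columns, each either a time such as "<1 ms" / "250 ms" or "*".
PROBES_RE = re.compile(r"^(?:(?:<?\d+ ms|\*)\s+){3}")


def last_responding_hop(trace_text):
    """Return (hop_number, host_text) for the last hop that answered, or None."""
    last = None
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header or blank line
        hop, rest = int(match.group(1)), match.group(2)
        if "Request timed out" in rest:
            continue  # no probe was answered at this hop
        host = PROBES_RE.sub("", rest).strip()
        last = (hop, host)
    return last


print(last_responding_hop(TRACE))
```

On the sample lines this reports hop 11, the handy-networks-llc hop in Denver, which is the last one that replied before the timeouts began.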

  • 0

#15 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 04:08 AM

And they are back offline.  I am notifying the datacenter.


  • 0

#16 kix766

kix766

    Newbie

  • Members
  • Pip
  • 12 posts

Posted 12 February 2016 - 04:10 AM

Wow, this is a long one.  I sure hope we don't have to change servers again... I don't think I could handle another move.

 

Thanks for the updates, as I have clients waiting for answers. I know you are doing what you can.

 

My thoughts exactly.


  • 0

#17 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 04:11 AM

The fault was with the second DC location.  The datacenter has links from the main datacenter to the second datacenter as well as direct internet links.  The currently reported fault is with Level 3 and its connection to that datacenter in Denver.


  • 0

#18 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • PipPipPipPipPip
  • 2,900 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 12 February 2016 - 04:14 AM

Wow, this is a long one.  I sure hope we don't have to change servers again... I don't think I could handle another move.
 
Thanks for the updates, as I have clients waiting for answers. I know you are doing what you can.

The migrations were to move us to the new hardware/infrastructure and then some secondary migrations to move from CentOS7 to CentOS6.  Everybody is on CentOS6 now and there is nothing wrong with our new hardware, servers, or network.  The issue is outside of our border and we have zero control over it.
 
At this point we're at the mercy of our facility and they are at the mercy of their transit providers.  This isn't affecting just us - it's affecting everybody in the facility, which is tens of thousands if not hundreds of thousands of users - us included.
 
I do apologize for this outage and will most certainly pass on the Reason For Outage, or RFO, once it is available from our upstream provider.
 
Here are the status updates from them [not very descriptive but will give you an idea of the level of information we've had available to us]:
 

Update - 02:06AM MDT: 
Connectivity to our DTC location has been restored, but we are still working with Level3 to ensure that the problem has been completely resolved. 
==========
Update  - 01:50AM MDT:
Level3 has identified an issue in the Denver Metro area, and is working to resolve it. We will continue to provide updates as we receive them. 
==========
Update - 12:41AM MDT:
We are in contact with our transit provider to identify any network connectivity problems. We are also on-site reviewing our network gear for issues.
==========
We have been alerted of connectivity issues at our Denver Tech Center location. We are working on the issue as quickly as possible and will update here.

We are 2 hours ahead of MDT.
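
To line the facility's timestamps up with the post times in this thread, here is a small Python sketch. It assumes only the 2-hour offset stated above; to_forum_time is a made-up helper name for illustration, and it ignores date rollover since all of these updates fall in the same early-morning window.

```python
from datetime import datetime, timedelta

# Assumption taken from the post: forum time is 2 hours ahead of the facility's time.
FORUM_OFFSET = timedelta(hours=2)


def to_forum_time(provider_time):
    """Convert a facility timestamp such as '02:06AM' to forum time."""
    parsed = datetime.strptime(provider_time, "%I:%M%p")
    return (parsed + FORUM_OFFSET).strftime("%I:%M %p")


# Timestamps from the three facility updates quoted above.
for stamp in ("12:41AM", "01:50AM", "02:06AM"):
    print(stamp, "->", to_forum_time(stamp))
```

The converted times can then be compared directly against the "Posted" times on the updates in this thread.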
  • 0
Michael Denney - MDDHosting LLC - Providing Hosting since 2007
Scalable shared hosting plans in the cloud! Check them out!
Highly Available Cloud Shared, Reseller, and VPS
http://www.mddhosting.com/

#19 ericr

ericr

    Staff

  • Staff Administrator
  • PipPipPip
  • 224 posts
  • Gender:Male

Posted 12 February 2016 - 04:14 AM

The current status of the work is that connectivity to the DTC location has been restored, but they are still working with Level3 to ensure that the problem has been completely resolved. 


  • 0

#20 MikeDVB

MikeDVB

    Forum Administrator

  • Staff Administrator
  • PipPipPipPipPip
  • 2,900 posts
  • Gender:Male
  • Location:Central Indiana, USA

Posted 12 February 2016 - 04:23 AM

That said, the network is online and operational, and has been since before my last update to this thread.

 

None of our networking gear or servers had any issues.  The best analogy I can make is that there was an accident on the highway between us and the internet - on a portion of the road that is not within our control.  This stopped traffic from entering/leaving until the issue was resolved.

 

That said, I am certainly going to get with the facility concerning this, as *one* provider out of several dropping/having issues should not result in a total lack of connectivity.  This does defeat the whole purpose of having multiple transit providers available to us.


  • 0



