

MikeDVB

Member Since 27 Sep 2008
Offline Last Active Oct 28 2020 11:38 AM

#6338 R2 Unplanned Outage - 02/16/17 - 7:30 AM - 8:45 AM ET

Posted by MikeDVB on 16 February 2017 - 08:49 AM

Everything is back online but MySQL is repairing any tables that may have been getting written to when the restart happened.


  • 1


#6319 Recent Network-Wide Connectivity Disruptions

Posted by MikeDVB on 11 January 2017 - 10:52 AM

I didn't stumble on this thread until I created a support ticket. How do I get email notification of maintenance work in advance?

https://forums.mddho...ues-and-events/


  • 1


#6313 SupportedNS.com - Google Phishing Warning - Temporary URLs Disabled Network-Wide

Posted by MikeDVB on 07 January 2017 - 03:52 PM

Unfortunately spammers, fraudsters, and phishers are constantly working to find new ways to perpetuate their activities.
 
For a long time we've allowed temporary URLs of the format http://server-name.com/~account-username/.
 
What this means is that if you wanted to test loading content from your site while avoiding DNS - you could just put in your server name, such as 's3.supportedns.com' followed by '/~your-cpanel-username' and see your site.  It's not a perfect solution because many scripts, such as WordPress, look at the URL requested and will give a 404 on a temporary URL if not configured for the temporary URL.
 
Due to a recent phishing attack on the S3 server - where an account was used via this temporary URL to host a phishing page - the entire server hostname has been temporarily blacklisted by Google.  Here is what you may currently see when trying to access your cPanel, WHM, or Webmail on the S3 server:
[Image: temporary-url-phishing.png]
 
If you do see this you can safely click "Ignore this warning" in the lower right hand corner of the page to proceed to your cPanel, WHM, or Webmail.
 
We are presently working to identify which account is hosting the phishing content; however, without a direct report containing the URL we're searching for a needle in a stack of needles.  Since we have not yet been able to identify the account with the bad content, we've disabled temporary URL access on this server.  This will cause the original listing to fail and ultimately Google will de-list the server hostname.
 
This isn't the first time this has happened and I'm sure it won't be the last.  Unfortunately the convenience of temporary URLs does not outweigh the ability of our customers to reliably and consistently access their cPanel, WHM, and Webmail interfaces.  As a result - we've chosen to disable temporary URLs network-wide.
 
The best way to gain temporary access to an account is to park the subdomain of an active domain on the account.  For example if you run "mysite.com" you can park "accountname.mysite.com" onto the account and then load it via that subdomain.  If you need assistance with this our technical support department will be happy to assist you.
 
To view the site on its actual domain prior to DNS propagation you can also use the HOSTS file method.  It is admittedly a bit more complicated but it does work well when properly implemented.  Not only will the HOSTS file method result in your site displaying properly [as it's not a different URL from normal] but it also avoids the issues the normal temporary URL structure faces.  Directions on using the HOSTS file can be found on our blog here.
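As a rough illustration of the HOSTS file method, the sketch below builds the single line you would add to your hosts file.  The IP address 203.0.113.10 is a documentation placeholder, "mysite.com" is the example domain used earlier in this post, and the function name is made up for illustration - your real server IP will differ.

```python
def hosts_entry(server_ip: str, domain: str, include_www: bool = True) -> str:
    """Return one hosts-file line mapping a domain to a server IP."""
    names = [domain]
    if include_www and not domain.startswith("www."):
        names.append("www." + domain)
    # hosts-file format: IP address, whitespace, then one or more hostnames
    return server_ip + " " + " ".join(names)


if __name__ == "__main__":
    # Placeholder IP; substitute the IP of the server hosting your account.
    print(hosts_entry("203.0.113.10", "mysite.com"))
```

The resulting line goes into /etc/hosts on Linux and macOS or C:\Windows\System32\drivers\etc\hosts on Windows.  Remember to remove it once DNS has propagated.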
 
If you have any questions about any of this feel free to either reply here or to open a support ticket.
 
Thank you!
  • 1


#6260 [Zayo Transit Outage] Intermittent Connectivity for Some Customers Tonight -...

Posted by MikeDVB on 30 October 2016 - 07:00 PM

At approximately 6:30 PM we were alerted to an outage affecting the S1 server.  Upon investigation we found the server to be online and operational although it didn't appear to be receiving very many requests from the internet.

 

On further investigation we determined that there were networking issues affecting one of our transit providers - Zayo.

 

Thankfully this didn't affect everybody - only those whose traffic naturally traversed Zayo's network to or from us.  Now that we've removed this provider from our bandwidth mix, all traffic is flowing over our other transit providers and there are no more outages or issues.

 

If Zayo provides an official RFO [Reason for Outage] we'll make it available here.

 

It's worth noting that all of our servers and networking equipment were online and operational without issue during this entire incident.  The issue was outside of our network border and we had to engage our networking team at the facility to remove the failing transit provider from our bandwidth mix.

 

If you have any questions or concerns do please let us know.


  • 1


#6212 Let's Encrypt installation

Posted by MikeDVB on 26 July 2016 - 09:54 AM

 

Could you tell me why I am not able to install a Let's Encrypt certificate here? I get the following error.

There was a problem processing your request

  • Error issuing certificate
  • Failed to issue certificate
  • The Let's Encrypt HTTP challenge failed: acme error 'urn:acme:error:unauthorized': Invalid response from http://www.notionplu...PMpYLFfsTHAmFY:"<!DOCTYPE html> <html class="no-js css-menubar" lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatib"

 

Generally what that means is that you have a .htaccess rule blocking access or otherwise your script is handling it / interfering.

 

The way Let's Encrypt works is:

1. The plugin contacts the Let's Encrypt servers and requests the certificate.

2. The Let's Encrypt servers send back details to the plugin telling it to create a verification file [the ".well-known/acme-challenge/etc...." link above].

3. Let's Encrypt then connects to the URL and attempts to load the file.  If it can - it knows you have control over the domain and issues the certificate.

4. If it can't load the file - it cannot verify you control the domain and the installation will fail.

 

In short - you need to look at your content and be sure that the file Let's Encrypt is trying to access is accessible.
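To make the check above a little more concrete, here is a minimal sketch of what "accessible" means for the verification file.  It assumes the standard ACME HTTP challenge layout, where the file lives under /.well-known/acme-challenge/ and its body is the token followed by a dot and a key thumbprint.  The function names, domain, and token are hypothetical, and this is not the plugin's actual code.

```python
def challenge_url(domain: str, token: str) -> str:
    """Build the URL Let's Encrypt would try to load for a given token."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def looks_like_valid_response(body: str, token: str) -> bool:
    """Return True if the body looks like a challenge file, not a web page."""
    text = body.strip()
    # An HTML page here usually means a rewrite rule or the site's own
    # script answered the request instead of the static verification file.
    if text.lower().startswith(("<!doctype", "<html")):
        return False
    # Assumed format: "<token>.<thumbprint>"
    return text.startswith(token + ".")


if __name__ == "__main__":
    print(challenge_url("example.com", "abc123"))
    print(looks_like_valid_response("<!DOCTYPE html> <html>", "abc123"))
```

The error quoted above shows an HTML page coming back from the challenge URL, which is the case the second function rejects.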

 

If you need further assistance with this you'll need to open a support ticket.


  • 1


#6162 Connectivity issues over Telia and Hurricane Electric [Transit Providers]

Posted by MikeDVB on 10 June 2016 - 01:38 PM

Telia released their official RFO [Reason For Outage] and here are those details:

Dear Customer,

This is a Reason for Outage Report with details regarding the case you have opened with TeliaSonera International Carrier.

Country: United States
TeliaSonera Case Reference: 00563796
Network Impact: Packet loss on the Telia Carrier U.S. backbone.
Case Opened: 6/9/2016 7:00 PM (After the issue had begun)
Case Ready for Service: 6/9/2016 7:23 PM

Reason for Outage: Incorrect ISIS metric and multiple commits while turning up new Telia Carrier backbone links in Dallas caused a loop of reconverging BGP and ISIS protocols. This put a very high CPU load on our U.S. routers and caused some trans-Atlantic congestion.
Actions Taken: The nyk-bb1 inner-core router (New York) was the first router to show real problems when we received alarms indicating packet loss on trans-Atlantic traffic together with high CPU utilization. The router was taken out of service. Further investigation revealed that the root cause was too many commits by Implementation while turning up new backbone links on dls-b22 (Dallas), along with an incorrect metric. The configuration in dls-b22 was rolled back to alleviate the problem and nyk-bb1 has been put back in service. This resolution is permanent and there will be no further loss related to this issue.
Additional Information: Telia Implementation team is making significant changes to their way of working to mitigate this from happening in the future.


Please note that all the time stamps given above are in UTC unless otherwise stated.

Please bear in mind that this was a major issue with the internet itself and one of its larger backbone providers. This was not within our power to detect, prevent, or resolve.

We do apologize for any trouble this outage caused you.
  • 1


#6099 R1, R3, S2, S3, S4 - Blank PHP Pages, MySQL Issues, Etc - Fix in Progress

Posted by MikeDVB on 05 May 2016 - 06:17 PM

The issue affecting these servers is the same one that hit our P1 server this morning:

http://forums.mddhos...t-php-versions/

 

Thankfully we already know what the fix is and are applying it to the affected servers.  The fix is applied account-by-account and it takes a couple of seconds per account.

 

If you are experiencing this issue now - it will be resolved in the very near future as we are already in the process of applying it.

 

While it is applying - we are investigating ways to speed up the process.


  • 1


#5916 Jasmine, Kobold, SR2, SR3 Reseller Account Transfers - Beginning 9 PM ET 03/0...

Posted by MikeDVB on 03 March 2016 - 06:32 PM

All four servers: SR2, SR3, Jasmine, and Kobold have been migrated to their respective target servers.

 

As far as those that had MX entries disappear - the cause in every case so far is that the domain was set to 'LOCAL' or 'AUTO (Local)' via cPanel -> MX Entry.

 

In short people are editing their DNS zone and updating MX records or doing it via the MX Tool and not setting it to 'remote' when using remote mail.

 

We've written a script to check for this and correct it but unfortunately we can't go back into the past and fix it.  Moving forward it shouldn't happen again.
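For anybody curious what such a check might look like, here is a simplified sketch.  It is not our actual script - the function names and the hostnames in the example are hypothetical, and it assumes the routing setting is simply "local" or "remote" and that you already have the list of MX hosts for the domain.

```python
def expected_mail_routing(mx_hosts, server_names):
    """Return 'local' if any MX host is this server, otherwise 'remote'."""
    local = {name.lower().rstrip(".") for name in server_names}
    for host in mx_hosts:
        if host.lower().rstrip(".") in local:
            return "local"
    return "remote"


def is_misconfigured(configured, mx_hosts, server_names):
    """True when the configured routing disagrees with the MX records."""
    return configured.lower() != expected_mail_routing(mx_hosts, server_names)


if __name__ == "__main__":
    # MX points at an outside mail host but routing is still set to local.
    print(is_misconfigured("local", ["mx.mailhost.example."], ["server.example.com"]))
```

A domain flagged this way is using remote mail while the server still believes it should handle the mail itself, which is the situation described above.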

 

This has affected a small portion of our customers - but I am sorry for those that it has affected.

 

Open a ticket if you're having issues and we'll address your issues as quickly as possible.


  • 1


#5546 SR1, SD1, VPS1 Outage

Posted by MikeDVB on 22 April 2015 - 05:50 PM

Yup - servers coming online.

 

It looks like a critical piece of software wasn't properly licensed.  I do believe I can place the fault for this matter squarely on my shoulders as it looks like I didn't install the licenses properly.  I do personally apologize for that.  Thankfully the process is now well documented and this issue will not recur.

 

Due to the nature of the issue the servers were shut down gracefully so no data loss or damage is expected.  It would be as though it were simply a long reboot.


  • 2


#5375 SD1, SR1, VPS1 Outage on 10/29/2014 @ 2:04 AM

Posted by MikeDVB on 29 October 2014 - 12:37 PM

Hello,

 

First and foremost I want to apologize for this outage.  We were alerted at 2:04 AM to this issue by our internal monitoring and have been working on restoring since that time.  All hands were brought on deck and we were able to solve an obscure and undocumented issue restoring all services successfully after approximately 7 hours.  I wanted to go over the issue in a bit more detail and you will find that below.

We use Logical Volume Management to provide distinct storage to each piece of hardware.  The volumes are then thinly provisioned so that they can share the same overall pool of storage without using more space than they actually need allowing maximum efficiency of the storage.

Today at 2:04 AM ET the storage system for the SD1, SR1, and VPS1 servers went read-only with file system errors.  All server administrators were alerted and brought on duty to investigate and resolve the issue.  Upon initial investigation we determined that due to a single configuration error the servers were not giving back free space to the pool resulting in them growing and never shrinking.  We do monitor the storage but we were not correctly watching this metric.

All initial research on this specific issue indicated that the data was irreparably destroyed and we determined we needed to begin restoring our backups from 10:30 PM [3.5 hours before the issue] to hardware.  We brought up extra hardware and began restoring backups immediately.  While restorations were in progress we continued to work at recovering the data on the original storage.  It took us about 7 hours to successfully repair the data at which time we checked the restoration and it showed at least 3 more hours remaining.  

We then brought up the servers with the repaired storage and SD1 and SR1 came online immediately.  VPS1 needed file system checks but due to the storage being solid state these were completed within minutes and VPS1 was brought back online.

We have identified several changes that will prevent this from happening again.
* We will be properly monitoring for storage pool free space to catch this issue before it becomes a problem.
* We will be converting the storage to a state that is more easily managed.
* We are looking at high availability storage that will prevent this issue.
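As a rough sketch of the first item, monitoring a thin pool comes down to comparing how full the shared pool is against warning and critical thresholds.  The pool names, the 80/90 thresholds, and the function names below are all assumptions for illustration - this is not our monitoring configuration, and the percentages would come from whatever your storage layer reports for pool usage.

```python
def pool_status(data_percent: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Classify thin-pool usage (percent of pool space allocated)."""
    if data_percent >= crit:
        return "critical"
    if data_percent >= warn:
        return "warning"
    return "ok"


def check_pools(usage: dict) -> dict:
    """Map each pool name to its status given a {name: percent_used} dict."""
    return {name: pool_status(percent) for name, percent in usage.items()}


if __name__ == "__main__":
    # Hypothetical pool names and usage figures.
    for name, status in check_pools({"pool_a": 42.0, "pool_b": 93.5}).items():
        print(name, status)
```

The point is to watch the pool itself and not only the file systems on top of it, since a thinly provisioned volume can look like it has free space while the pool underneath is nearly full.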

 

You may find it unusual that we were not keeping the forums up to date as this is something we do when there is an outage.  I normally handle the forum updates personally but due to the critical nature of this outage I was focused on resolving the issue.  In the future should an issue arise where I am focused on restoring the service I will bring an additional staff member in for the sole purpose of keeping customers updated on changes as they happen if I'm not able to do it myself.

 

Outages such as these are extremely rare for us and we do apologize again for any trouble it may have caused you.


  • 1


#5374 SR1 and SD1 Down

Posted by MikeDVB on 29 October 2014 - 10:58 AM

Anybody who opened a ticket got a direct response with cause, explanation, what we did to resolve the issue, and what we're doing to prevent it.

 

I will be making a formal post here on the forums within 48 hours outlining all of this detail for anybody interested but I need some time to draft the formal report and make sure we've covered all of our bases on preventing it from happening again.

 

If you want the details early open a ticket regarding the matter and I'll be happy to send the non-formal version over.


  • 1


#5004 Facility-Wide Disruption of Connectivity

Posted by MikeDVB on 01 March 2014 - 02:11 AM

Sorry - it's been a long and arduous day and I forgot to post an update.  The issue was resolved for us and our customers when we dropped HE and Telia.  The facility then resolved the underlying issue before HE and Telia were turned back on.
 
I will be posting the official Reason For Outage [RFO] once the facility has it available, likely on Tuesday or Wednesday.
  • 1


#4999 WordPress wp-login Brute Force - Kobold Server Update: Boreas and Jasmine

Posted by MikeDVB on 28 February 2014 - 02:38 PM

All blocks flushed - modified the detection method to help eliminate false positives.


  • 1


#4998 WordPress wp-login Brute Force - Kobold Server Update: Boreas and Jasmine

Posted by MikeDVB on 28 February 2014 - 02:20 PM

We are blocking IPs that have POSTed data to wp-login.php but have *not* requested any CSS files.  Users accessing the WordPress log-in page will have requested and received a CSS file; bots do not request anything else - they simply post data to wp-login.php and look for a success/fail result.
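The rule described above can be sketched roughly as follows.  This is an illustration only, not the detection we actually run: it assumes access logs in the common Apache format, and the function name, regular expression, and sample IPs are made up for the example.

```python
import re

# Matches the start of a common-format access log line:
# client IP, two fields, [timestamp], then "METHOD /path ..."
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def suspicious_ips(log_lines):
    """Return IPs that POSTed to wp-login.php but never fetched a CSS file."""
    posted_login = set()
    fetched_css = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines we cannot parse
        ip = match.group("ip")
        path = match.group("path").split("?", 1)[0]  # drop the query string
        if match.group("method") == "POST" and path.endswith("/wp-login.php"):
            posted_login.add(ip)
        elif match.group("method") == "GET" and path.endswith(".css"):
            fetched_css.add(ip)
    return posted_login - fetched_css


if __name__ == "__main__":
    sample = [
        '198.51.100.7 - - [28/Feb/2014:14:00:00 -0500] "POST /wp-login.php HTTP/1.1" 200 1234',
        '203.0.113.5 - - [28/Feb/2014:14:00:01 -0500] "GET /wp-admin/css/login.min.css?ver=3.8 HTTP/1.1" 200 900',
        '203.0.113.5 - - [28/Feb/2014:14:00:05 -0500] "POST /wp-login.php HTTP/1.1" 302 0',
    ]
    print(suspicious_ips(sample))
```

A real visitor whose browser already has the CSS cached would look like a bot to this check, which is one way false positives can occur.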

 

It is possible there will be some false positives so if you are unable to reach your account/server simply open a ticket with your IP [ http://www.mddhostin.../whatismyip.php ] and we'll remove the block.


  • 1


#4992 Facility-Wide Disruption of Connectivity

Posted by MikeDVB on 28 February 2014 - 11:34 AM

THANK YOU EVERYBODY THAT SENT IN A TRACEROUTE.

 

We were able to identify the issue down to the Hurricane Electric and Telia transit providers.  We turned off those providers and all traffic is now routing over alternate unaffected routes - i.e. everybody should be back online at this point.
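For those wondering how a pile of traceroutes narrows things down, here is a simplified sketch of the idea: look at the last hop that responded in each trace and tally which network it belongs to.  The hostname suffixes, the sample hop names, and the function names are assumptions made for illustration - this is not the tooling we used.

```python
from collections import Counter

# Assumed mapping from router hostname suffix to provider name.
PROVIDER_SUFFIXES = {
    "he.net": "Hurricane Electric",
    "telia.net": "Telia",
}


def last_provider(hops):
    """Return the provider of the last responding hop.

    `hops` is a list of hostnames, with None for hops that timed out.
    """
    for host in reversed(hops):
        if host is None:
            continue  # timed-out hop, keep looking backwards
        name = host.lower()
        for suffix, provider in PROVIDER_SUFFIXES.items():
            if name == suffix or name.endswith("." + suffix):
                return provider
        return "other"
    return "none"


def tally(traceroutes):
    """Count how many traces die inside each provider's network."""
    return Counter(last_provider(hops) for hops in traceroutes)


if __name__ == "__main__":
    # Hypothetical traces; None marks a hop that did not respond.
    traces = [
        ["gw.isp.example", "core1.city.he.net", None, None],
        ["gw.other.example", "edge.city.telia.net", None],
        ["gw.third.example", "router.fine.example"],
    ]
    print(tally(traces))
```

When most of the failing traces stop inside the same one or two networks, those are the providers to pull from the mix first.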

 

Please understand that, due to the nature of this issue, further downtime/connectivity issues are possible as there is no way for me to predict what changes will have to be made to fully resolve it.  Turning off these transit providers is a band-aid over the symptoms and not a solution to the problem.  The goal was to bring everybody back online as quickly as possible.

 

We couldn't have isolated this issue without the traceroutes - thank you.

 

That said - if you are still experiencing connectivity issues do please email maintenance@ with a traceroute.


  • 3