MDDHosting Forums

S1 / R1 Servers - Network Device Updates - ~9 PM ET Jan 12, 2016.



Hello!

 

One of the great features of our new infrastructure is that we can migrate a server from one piece of hardware to another seamlessly. This lets us perform maintenance without downtime.

 

Unfortunately, the network cards in the new hardware were not updated to the latest firmware, which causes some odd latency issues every once in a great while. We've been monitoring both servers on a second-by-second basis since we resolved the LiteSpeed issue, and both have been stable.

 

The problem arises when we want to migrate a server seamlessly: the host becomes unresponsive for 30 to 60 seconds while the network interface crashes and restarts.

 

We will be taking the S1 and R1 servers down tonight for approximately 5 minutes (as long as it takes to shut down and boot back up) to update the network firmware. After that, we should be able to perform future maintenance without any scheduled downtime whatsoever.

 

We expect to begin around 9 PM ET, and the downtime should be no more than 5 minutes. We apologize for the growing pains we are experiencing with this new hardware: it's a completely different setup from what we have run for years, and we're running into small edge cases that we didn't anticipate.

 

If you have any questions about this, let us know. We'll keep it as seamless as possible, and there is nothing you need to do.

For those who want more detail, this is the issue we're currently facing with the S1 server:

 

 

Message from syslogd@s1 at Jan 16 17:17:27 ...
kernel:BUG: soft lockup - CPU#44 stuck for 23s! [imap-login:1021336]

Message from syslogd@s1 at Jan 16 17:17:27 ...
kernel:BUG: soft lockup - CPU#11 stuck for 22s! [migration/11:159]

Message from syslogd@s1 at Jan 16 17:17:27 ...
kernel:BUG: soft lockup - CPU#1 stuck for 144s! [mysqld:807721]

Message from syslogd@s1 at Jan 16 17:17:27 ...
kernel:BUG: soft lockup - CPU#17 stuck for 140s! [mysqld:4000]

Message from syslogd@s1 at Jan 16 17:17:27 ...
kernel:BUG: soft lockup - CPU#22 stuck for 133s! [migration/22:214]

Message from syslogd@s1 at Jan 16 17:17:27 ...
kernel:BUG: soft lockup - CPU#19 stuck for 151s! [mysqld:4374]
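To make the log excerpt above easier to read at a glance, here is a minimal Python (3.9+) sketch that pulls the CPU number, stall time and process out of each soft-lockup line and reports the longest stall per command. It assumes the exact message format shown above; the names `parse_lockups` and `worst_by_command` are purely illustrative and are not part of any tool we actually run.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Pattern for kernel soft-lockup reports like the ones quoted above:
#   kernel:BUG: soft lockup - CPU#44 stuck for 23s! [imap-login:1021336]
# The bracketed part is "<command>:<pid>"; the command may itself contain "/"
# (e.g. the per-CPU "migration/11" kernel thread).
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


@dataclass
class Lockup:
    cpu: int
    seconds: int
    comm: str
    pid: int


def parse_lockups(log_text: str) -> list[Lockup]:
    """Return every soft-lockup report found in a block of syslog text."""
    return [
        Lockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in LOCKUP_RE.finditer(log_text)
    ]


def worst_by_command(lockups: list[Lockup]) -> dict[str, int]:
    """Longest stall (in seconds) per command, folding per-CPU kernel threads
    such as "migration/11" and "migration/22" into "migration"."""
    worst: dict[str, int] = defaultdict(int)
    for lk in lockups:
        name = lk.comm.split("/")[0]
        worst[name] = max(worst[name], lk.seconds)
    return dict(worst)


# The six kernel lines from S1's syslog quoted above.
SAMPLE = """\
kernel:BUG: soft lockup - CPU#44 stuck for 23s! [imap-login:1021336]
kernel:BUG: soft lockup - CPU#11 stuck for 22s! [migration/11:159]
kernel:BUG: soft lockup - CPU#1 stuck for 144s! [mysqld:807721]
kernel:BUG: soft lockup - CPU#17 stuck for 140s! [mysqld:4000]
kernel:BUG: soft lockup - CPU#22 stuck for 133s! [migration/22:214]
kernel:BUG: soft lockup - CPU#19 stuck for 151s! [mysqld:4374]
"""

if __name__ == "__main__":
    lockups = parse_lockups(SAMPLE)
    # Print the stalls from longest to shortest.
    for lk in sorted(lockups, key=lambda l: l.seconds, reverse=True):
        print(f"CPU#{lk.cpu:<3} {lk.seconds:>4}s  {lk.comm} (pid {lk.pid})")
    print(worst_by_command(lockups))
```

Run against these six lines, it shows mysqld with the longest stall (151s on CPU#19). A summary like this only describes the symptoms; on its own it cannot tell us whether the storage or the CPUs failed first.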

 

When this happens we also lose connectivity to our server storage. The problem is that we're unsure whether the loss of storage connectivity is causing the CPU errors or the CPU errors are causing the storage connectivity issues.

 

The plan for today is for me to spin up several test systems on the new hardware and work as hard as I can to identify and isolate the cause of these issues. If we need new network cards, we'll get them. If we need to change operating systems, we'll do it.

 

At the end of the day, I want to apologize deeply for any and all trouble this issue is causing you. We're no happier about it than you are, and we will have it resolved as quickly as we possibly can.

 

I'm going to do my best to keep this thread up to date so you know what is going on. I have another update to share and will post it in a moment to keep it separate from this one.

The S1 server had another brief outage; however, this time the OS marked the file system as read-only, so we need to perform a file system integrity check.
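For the technically curious: one common way a Linux file system ends up read-only is an ext4 volume mounted with `errors=remount-ro`, which the kernel remounts read-only when it detects errors. Whether that is exactly what happened on S1 is an assumption here. Either way, a read-only mount shows `ro` in its option list in `/proc/mounts`. The minimal Python sketch below scans that format; the function name `read_only_mounts` and the sample lines are hypothetical and are not S1's real configuration.

```python
from pathlib import Path


def read_only_mounts(mounts_text: str) -> list[str]:
    """Return mount points whose option list includes "ro".

    Expects the /proc/mounts layout: device, mount point, fs type,
    comma-separated options, dump, pass. Mount points containing spaces are
    escaped by the kernel (e.g. "\\040") and are returned unchanged.
    """
    ro = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:  # exact option match, so "rw" or "nosuid" never match
            ro.append(mount_point)
    return ro


# Illustrative input only; these are not S1's actual mounts.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /home ext4 ro,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""

if __name__ == "__main__":
    print(read_only_mounts(SAMPLE))
    mounts = Path("/proc/mounts")
    if mounts.exists():  # only present on Linux
        print(read_only_mounts(mounts.read_text()))
```

Splitting the options on commas and matching whole entries avoids false positives from substring checks. A read-only flag only reports the symptom; the integrity check (fsck) is what actually examines and repairs the file system.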

 

Thanks to the speed of the new hardware this should not take long, but given the amount of data it could take up to an hour or two. Ideally it will be done in 10 to 15 minutes.

 

I'll keep this thread updated. We really need to do this as soon as possible.


The server is online; however, you may have trouble accessing some files through the browser. If you do, open a ticket and we'll get it fixed for you quickly.

 

We're still going to perform the fsck, but I'm going to try to send an email to everybody beforehand so you know what's going on even if you aren't watching this thread.
