I have to say, this is the first time in over 5 years I've felt like I could have a heart attack while working to resolve an issue. We do still have another server offline with the exact same issue, but it's the exact same hardware build, so we're swapping processors on it as well. If all goes well, it will come back online and all services will be 100% restored.
We're then going to investigate why this happened and see what we have to do to correct it - be it installing larger/different coolers on the processors, more fans in the chassis, discussing cooling with the facility, etc. At this point I don't have a 'Reason For Outage' beyond hardware failure, but once we have more details as to what exactly caused this I will be sure to post them.
If this system meltdown thing was a pattern with MDD like it is at other places, then and only then would I worry. It's been a number of years since I moved over here, and this is the first time this has ever happened... can't say the same about other places.
Hopefully someone is trained in CPR.
and so it goes...