Is it possible to revert to a known/previous/good configuration for the time being and maintain the current state of s1 offline as a testing platform to figure out the issue? Despite echo's issues, I feel like it was more reliable in recent weeks than s1 has been since the migration. I know we're not paying for anywhere near 100% uptime and I completely understand that outages happen (I'm a Linode customer as well, so it's been a rough few weeks), but it seems that, at least in the short term, this migration has caused more problems than it has solved. I fully support what you're doing to improve reliability down the road, but I feel like it would be better for both sides of this issue to get out of emergency/troubleshooting/damage control mode, get things running, and have time to properly diagnose outside of a production environment, particularly when this appears to be a hardware/configuration problem and not a DDoS or upstream connectivity issue.
Please believe me when I say I truly respect what you're doing and all the hard work you're putting into this, but I feel like we're beta testers in this endeavor and as a result it is hurting me and my clients. I know how hard it is to maintain servers - that's why I have several clients hosted here, because I trust you and I don't want to be spending the late hours doing it myself. I feel like when things are rushed it is more difficult to solve problems. I have no issue at all with copies of all my client sites being duplicated in a testing environment or whatever it takes to help get this new configuration up and running reliably. I'm not as comfortable with it happening on the server the A and MX records are pointed to, if that can be avoided at this point.