Re: [109all] NOC update #2

Sean,

I think this is obvious, but just to check my understanding:
this implies that all of the attacks on Meetecho for
unreliability of their software during those incidents were
misguided or misdirected.  Correct?

thanks,
   john




--On Thursday, November 19, 2020 01:56 +0000 Sean Croghan
<sean@xxxxxxxxxxxxxxxx> wrote:

> As previously reported, we tracked down the cause of the
> interruption of the iabopen session to an unexpected Azure
> network interface removal event on network interfaces
> provisioned with SR-IOV.  To prevent this from happening
> again, we intended to remove SR-IOV networking entirely.
> Unfortunately, it now transpires that this change was not
> applied to 2 of the 16 VMs, including the application VM for
> the Plenary. So, to add to the list of reasons to want 2020 to
> be over, towards the end of the Plenary the same network
> interface removal event occurred and triggered an outage long
> enough to affect everyone.
> 
> I can confirm that the SR-IOV provisioning has now been
> removed from all VMs, which we believe eliminates the risk of
> the same thing happening again.  We continue to work with
> Azure Direct Support to determine the underlying cause of the
> removal events.
> 
> Please let me know if you have any questions.
> 
> Sean
> 
> 
> 
> On Nov 17, 2020, at 4:56 PM, Sean Croghan  wrote:
> 
> 
> 
> I have an update for those of you affected by the outage in
> yesterday's IABOPEN session. We have isolated this to an
> interruption of the virtual machine's network interface. We
> currently have no explanation for this outage and have engaged
> the Azure hardware and network teams to determine the cause
> of this event.
> 
> I will provide an update when we have received more
> information.
> 
> 
> For those interested in details:
> 
> At 07:56:36 UTC the network interface (eth0) went link down
> and was removed from the VM.
> At 08:00:28 UTC a new interface was added to the VM.
> At 08:00:29 UTC the new interface (eth1) went link up.
> 
> Yes, the VM added a new interface. The servers were
> provisioned with SR-IOV, and we suspect that a migration event
> moved the VM to different hardware, causing the NIC driver to
> be reloaded. We have found some evidence that supports our
> theory that a migration or unscheduled maintenance event
> occurred and are working to verify whether that happened
> during this event. We have removed SR-IOV from the network
> interfaces on all servers.
> 
> I hope you are having a good and productive week.
> 
> 
> The IETF NOC Team
> 
> --
> 109all mailing list
> 109all@xxxxxxxx
> https://www.ietf.org/mailman/listinfo/109all
> 
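For readers who want to quantify the interruption: the timeline in the November 17 update is enough to compute how long the VM was without a working network interface. The following is a minimal Python sketch, assuming all three timestamps fall on the same UTC day; the EVENTS table and the helper names are illustrative only and are not NOC tooling. It measures interface downtime alone; how long the application took to recover after eth1 came up is not stated in the report.

```python
from datetime import datetime

# Timestamps copied from the NOC's November 17 update (UTC).
# Assumption: all three events happened on the same day, so only
# the time-of-day difference matters.
EVENTS = {
    "eth0 link down / removed": "07:56:36",
    "new interface added": "08:00:28",
    "eth1 link up": "08:00:29",
}


def parse(hms: str) -> datetime:
    """Parse an HH:MM:SS string (the date defaults to 1900-01-01)."""
    return datetime.strptime(hms, "%H:%M:%S")


def gap_seconds(start: str, end: str) -> int:
    """Whole seconds between two same-day HH:MM:SS timestamps."""
    return int((parse(end) - parse(start)).total_seconds())


if __name__ == "__main__":
    down = EVENTS["eth0 link down / removed"]
    up = EVENTS["eth1 link up"]
    total = gap_seconds(down, up)
    minutes, seconds = divmod(total, 60)
    print(f"Interface outage: {total} s ({minutes} min {seconds} s)")
```

Run as a script, it prints `Interface outage: 233 s (3 min 53 s)`, measured from eth0 going link down to eth1 coming link up.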