Mark Rutland <mark.rutland@xxxxxxx> writes:

> On Thu, May 05, 2022 at 03:52:24PM +0200, Vitaly Kuznetsov wrote:
>> "Guilherme G. Piccoli" <gpiccoli@xxxxxxxxxx> writes:
>>
>> > On 05/05/2022 09:53, Mark Rutland wrote:
>> >> [...]
>> >> Looking at those, the cleanup work is all arch-specific. What exactly would we
>> >> need to do on arm64, and why does it need to happen at that point specifically?
>> >> On arm64 we don't expect as much paravirtualization as on x86, so it's not
>> >> clear to me whether we need anything at all.
>> >>
>> >>> Anyway, the idea here was to gather a feedback on how "receptive" arm64
>> >>> community would be to allow such customization, appreciated your feedback =)
>> >>
>> >> ... and are you trying to do this for Hyper-V or just using that as an example?
>> >>
>> >> I think we're not going to be very receptive without a more concrete example of
>> >> what you want.
>> >>
>> >> What exactly do *you* need, and *why*? Is that for Hyper-V or another hypervisor?
>> >>
>> >> Thanks
>> >> Mark.
>> >
>> > Hi Mark, my plan would be doing that for Hyper-V - kind of the same
>> > code, almost. For example, in hv_crash_handler() there is a stimer
>> > clean-up and the vmbus unload - my understanding is that this same code
>> > would need to run in arm64. Michael Kelley is CCed, he was discussing
>> > with me in the panic notifiers thread and may elaborate more on the needs.
>> >
>> > But also (not related with my specific plan), I've seen KVM quiesce code
>> > on x86 as well [see kvm_crash_shutdown() on arch/x86] , I'm not sure if
>> > this is necessary for arm64 or if this already executing in some
>> > abstracted form, I didn't dig deep - probably Vitaly is aware of that,
>> > hence I've CCed him here.
>>
>> Speaking about the difference between reboot notifiers call chain and
>> machine_ops.crash_shutdown for KVM/x86, the main difference is that
>> reboot notifier is called on some CPU while the VM is fully functional,
>> this way we may e.g. still use IPIs (see kvm_pv_reboot_notify() doing
>> on_each_cpu()). When we're in a crash situation,
>> machine_ops.crash_shutdown is called on the CPU which crashed. We can't
>> count on IPIs still being functional so we do the very basic minimum so
>> *this* CPU can boot kdump kernel. There's no guarantee other CPUs can
>> still boot but normally we do kdump with 'nprocs=1'.
>
> Sure; IIUC the IPI problem doesn't apply to arm64, though, since that doesn't
> use a PV mechanism (and practically speaking will either be GICv2 or GICv3).
>

This isn't really about PV: when the kernel is crashing, you have no
idea what's going on on other CPUs, they may be crashing too, locked in
a tight loop, ... so sending an IPI there to do some work and expecting
it to report back is dangerous.

>> For Hyper-V, the situation is similar: hv_crash_handler() initiates
>> VMbus unload on the crashing CPU only, there's no mechanism to do
>> 'global' unload so other CPUs will likely not be able to connect Vmbus
>> devices in kdump kernel but this should not be necessary.
>
> Given kdump is best-effort (and we can't rely on secondary CPUs even making it
> into the kdump kernel), I also don't think that should be necessary.

Yes, exactly.

>
>> There's a crash_kexec_post_notifiers mechanism which can be used instead
>> but it's disabled by default so using machine_ops.crash_shutdown is
>> better.
>
> Another option is to defer this to the kdump kernel. On arm64 at least, we know
> if we're in a kdump kernel early on, and can reset some state based upon that.
>
> Looking at x86's hyperv_cleanup(), everything relevant to arm64 can be deferred
> to just before the kdump kernel detects and initializes anything relating to
> hyperv. So AFAICT we could have hyperv_init() check is_kdump_kernel() prior to
> the first hypercall, and do the cleanup/reset there.

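For illustration only, the ordering suggested above could be sketched
roughly as below. This is a userspace mock, not kernel code: every
*_stub/*_sketch function and both flags are made up for the example,
and what the reset actually has to do on arm64 is an open question in
this thread. Only the idea (check for kdump first, reset, then make the
first hypercall) comes from the quoted text.

```c
#include <stdbool.h>

/* Hypothetical state for the mock; none of this exists in the kernel. */
static bool booted_as_kdump;                 /* what is_kdump_kernel() would report */
static bool stale_state_from_crashed_kernel; /* leftovers the crashed kernel never cleaned up */

/* Stand-in for the kernel's is_kdump_kernel(). */
static bool is_kdump_kernel_stub(void)
{
	return booted_as_kdump;
}

/* Stand-in for whatever deferred cleanup/reset turns out to be needed. */
static void reset_stale_state_stub(void)
{
	stale_state_from_crashed_kernel = false;
}

/* Stand-in for the first hypercall; assumed to fail if old state is still around. */
static int first_hypercall_stub(void)
{
	return stale_state_from_crashed_kernel ? -1 : 0;
}

/* The ordering under discussion: reset before touching the hypervisor at all. */
static int hyperv_init_sketch(void)
{
	if (is_kdump_kernel_stub())
		reset_stale_state_stub();

	return first_hypercall_stub();
}
```

On a normal boot the reset is skipped entirely; on a kdump boot it runs
once, before anything else talks to the hypervisor.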

In theory yes, it is possible to try sending CHANNELMSG_UNLOAD on kdump
kernel boot and not upon crash, I don't remember if this approach was
tried in the past.

>
> Maybe we need more data for the vmbus bits? ... if so it seems that could blow
> up anyway when the first kernel was tearing down.

Not sure I understood what you mean... From what I remember, there were
issues with CHANNELMSG_UNLOAD handling on the Hyper-V host side in the
past (it was taking *minutes* for the host to reply) but this is
orthogonal to the fact that we need to do this cleanup so kdump kernel
is able to connect to Vmbus devices again.

-- 
Vitaly