On Mon, 2025-02-10 at 15:57 +0000, Marc Zyngier wrote:
> On Thu, 06 Feb 2025 20:08:10 +0000,
> Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
> >
> > Hi!
> >
> > KVM on ARM has this function, and it seems to be only used in a couple of places, mostly for
> > initialization.
> >
> > We recently noticed a CI failure roughly like that:
>
> Did you only recently notice because you only recently started
> testing with lockdep? As far as I remember this has been there
> forever.

Hi,

I also think that this is something old; I guess our CI started to test
aarch64 kernels with debug flags enabled or something like that.

>
> > [ 328.171264] BUG: MAX_LOCK_DEPTH too low!
> > [ 328.175227] turning off the locking correctness validator.
> > [ 328.180726] Please attach the output of /proc/lock_stat to the bug report
> > [ 328.187531] depth: 48 max: 48!
> > [ 328.190678] 48 locks held by qemu-kvm/11664:
> > [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
> > [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> > [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> > [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> > [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> > [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> > [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> >
> >
> > ..
> > ..
> > ..
> > ..
> >
> >
> > As far as I see currently MAX_LOCK_DEPTH is 48 and the number of
> > vCPUs can easily be hundreds.
>
> 512 exactly. Both of which are pretty arbitrary limits.
>
> > Do you think that it's possible? or know if there were any efforts
> > to get rid of lock_all_vcpus to avoid this problem? If not possible,
> > maybe we can exclude the lock_all_vcpus from the lockdep validator?
>
> I'd be very wary of excluding any form of locking from being checked
> by lockdep, and I'd rather we bump MAX_LOCK_DEPTH up if KVM is enabled
> on arm64. It's not like anyone is going to run that in production
> anyway. task_struct may not be happy about that though.
>
> The alternative is a full stop_machine(), and I don't think that will
> fly either.
>
> > AFAIK, on x86 most of the similar cases where lock_all_vcpus could
> > be used are handled by assuming and enforcing that userspace will
> > call these functions prior to first vCPU is created and/or run, thus
> > the need for such locking doesn't exist.
>
> This assertion doesn't hold on arm64, as this ordering requirement
> doesn't exist. We already have a bunch of established VMMs doing
> things in random orders (QEMU being the #1 offender), and the sad
> reality of the Linux ABI means this needs to be supported forever.

Understood.

Best regards,
	Maxim Levitsky

>
> Thanks,
>
> 	M.
>