On 30/03/17 17:40, Suzuki K Poulose wrote:
On 30/03/17 16:41, Marc Zyngier wrote:
On 30/03/17 16:29, Mark Rutland wrote:
On Thu, Mar 30, 2017 at 03:31:12PM +0100, Mark Rutland wrote:
Hi,
I'm seeing the splat below when running KVM on an arm64 host with
CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_LOCKDEP enabled.
I saw this on v4.11-rc1, and I can reproduce the problem on the current
kvmarm master branch (563e2f5daa66fbc1).
I've hacked noinlines into arch/arm/kvm/mmu.c in an attempt to get a
better backtrace; without this, the report says the call is at
arch/arm/kvm/mmu.c:299, which is somewhat confusing.
Looking again, that initial kernel was not a vanilla v4.11-rc1, and I am
*not* able to reproduce this issue with a vanilla v4.11-rc1.
I believe I had applied an earlier fix for the locking issue Suzuki
recently addressed, which was why my line numbers were off.
I *can* trigger this issue with the current kvmarm master, and the log I
posted is valid.
Sorry for the bogus info; I will be more careful next time.
No worries, thanks Mark.
So here's my (very) superficial analysis of the issue:
- Memory pressure, we try to swap out something
- try_to_unmap_one takes a spinlock (via page_vma_mapped_walk)
- MMU notifier kick in with the spinlock held
- we take kvm->mmu_lock
- in unmap_stage2_range, we do a cond_resched_lock(kvm->mmu_lock)
- we still hold the page_vma_mapped_walk spinlock, might_sleep screams
Correct.
I can see multiple ways of doing this:
1) We track that we're coming via an MMU notifier, and don't call
cond_resched_lock() in that case
2) We get rid of cond_resched_lock()
3) we have a different code path for the MMU notifier that doesn't
involve cond_resched_lock().
Only (1) vaguely appeals to me, and I positively hate (3). We could
revert to (2), but it is likely to be helpful when tearing down large
ranges.
Another possibility is that the might_sleep() warning is just spurious,
and I think that Suzuki has a theory...
So the cond_resched_lock() thinks that it could drop the lock and do a sched.
So it issues __might_sleep(PREEMPT_LOCK_OFFSET) to make sure the preempt_count
is as if there is only one {i.e, this} spin_lock preventing the preemption, and
it could drop the lock after verifying should_resched(PREEMPT_LOCK_OFFSET) (and bit more
other checks) turns true. But since we are holding two spin_locks in this rare path,
the preempt_offset is PREEMPT_LOCK_OFFSET + 1 (as we do preempt_count_inc for each
lock).
Now, ___might_sleep() checks if the preempt_count exactly matches the passed count, using
preempt_count_equals(), and goes ahead to complain if it doesn't, even if the preempt_count
is greater than the passed count, in which case the should_resched() should deny the schedule
operation and we skip it. So, I think, ___might_sleep should really check
if (preempt_count >= preempt_offset) to scream about it, of course making sure that the callers
s/to scream/not to scream/
honoring the same case.
To summarise, since we do have two locks held, we won't be able to reschedule in this case
but the might_sleep screams even if we are safe.
Suzuki
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm