On Sat, 18 Sept 2021 at 11:37, Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Sat, 18 Sept 2021 at 10:07, Liu Shixin <liushixin2@xxxxxxxxxx> wrote:
> >
> > On 2021/9/16 16:49, Marco Elver wrote:
> > > On Thu, 16 Sept 2021 at 03:20, Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > >> Hi Marco,
> > >>
> > >> We found that kfence_test fails on ARM64 with this patch, with or
> > >> without CONFIG_DETECT_HUNG_TASK.
> > >>
> > >> Any thoughts?
> > > Please share the log and instructions to reproduce if possible. Also, if
> > > possible, please share the bisection log that led you to this patch.
> > >
> > > I currently do not see how this patch would cause that; it only
> > > increases the timeout duration.
> > >
> > > I know that under QEMU TCG mode, there are occasionally timeouts in
> > > the test simply due to QEMU being extremely slow or other weirdness.
> > >
> > Hi Marco,
> >
> > Here are some results from the current tests:
> > 1. Using qemu-kvm on an arm64 machine, all test cases pass.
> > 2. Using qemu-system-aarch64 on an x86_64 machine, some test cases fail randomly.
> > 3. Using qemu-system-aarch64 on x86_64, but with the check of
> >    kfence_allocation_key in kfence_alloc() removed, all test cases pass.
> >
> > I added some printing to the kernel and got very strange results.
> > I added a new variable, kfence_allocation_key_gate, to track the
> > state of kfence_allocation_key. As shown in the following code, in theory,
> > if kfence_allocation_key_gate is zero, then kfence_allocation_key must be
> > enabled, so the value of the variable error in kfence_alloc() should always
> > be zero. Indeed, all the passing test cases fit this. But as shown in the
> > following failure log, although kfence_allocation_key has been enabled, the
> > check still fails here.
> >
> > So I think static_key might be problematic in my QEMU environment.
> > The timeout change is not itself the problem, but it made this problem
> > observable. I tried changing the wait_event to a loop: I set the timeout
> > to HZ and re-enabled/disabled the key in each iteration, and then the
> > failing test cases disappeared.
>
> Nice analysis, thanks! What I gather is that static_keys/jump_labels
> are somehow broken in QEMU.
>
> This does remind me that I found a bug in QEMU that might be relevant:
> https://bugs.launchpad.net/qemu/+bug/1920934
> Looks like it was never fixed. :-/
>
> The failures I encountered caused the kernel to crash, but I never saw
> the kfence test fail due to that (never managed to get that far).
> Though the bug I saw was in x86 TCG mode, and I never tried arm64. If

[ ... that is, I didn't try running QEMU-ASan in arm64 TCG mode ... of
  course I use QEMU arm64 to test. ;-) ]

> you can, try to build QEMU with ASan and see if you also get the
> same use-after-free bug.
>
> Unless we observe the problem on a real machine, I think for now we
> can conclude with fairly high confidence that QEMU TCG still has
> issues and cannot be fully trusted here (see bug above).
>
> Thanks,
> -- Marco