Re: Deadlock due to EPT_VIOLATION

On Fri, 18 Aug 2023, Sean Christopherson wrote:
> On Thu, Aug 17, 2023, Eric Wheeler wrote:
> > On Thu, 17 Aug 2023, Sean Christopherson wrote:
> > > > > kprobe:handle_ept_violation
> > > > > {
> > > > > 	printf("vcpu = %lx pid = %u MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> > > > > 	       arg0, ((struct kvm_vcpu *)arg0)->pid->numbers[0].nr,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_seq,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_in_progress,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_start,
> > > > > 	       ((struct kvm_vcpu *)arg0)->kvm->mmu_invalidate_range_end);
> > > > > }
> > > > > 
> > > > > If you don't have BTF info, we can still use a bpf program, but to get at the
> > > > > fields of interest, I think we'd have to resort to pointer arithmetic with struct
> > > > > offsets grabbed from your build.
> > > > 
> > > > We have BTF, so hurray for not needing struct offsets!
> > 
> > Well, I was partly right: not all hosts have BTF.

First things first, we got a trace from a machine _with_ BTF!

These are the only items showing in-prog=1 (the first column is the count from uniq -c):

      1 ept[0] vcpu=ffff9c436d26c680 seq=32524620 inprog=1 start=7f32477d7000 end=7f32477d8000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32524aee inprog=1 start=7f3252209000 end=7f325220a000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32527895 inprog=1 start=7f329504d000 end=7f329504e000
      1 ept[0] vcpu=ffff9c436d26c680 seq=325279eb inprog=1 start=7f3296f00000 end=7f3296f01000
      1 ept[0] vcpu=ffff9c436d26c680 seq=325279f5 inprog=1 start=7f3296fae000 end=7f3296faf000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32527b4d inprog=1 start=7f329937e000 end=7f329937f000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32525ef6 inprog=1 start=7f3272503000 end=7f3272504000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32526517 inprog=1 start=7f327a568000 end=7f327a569000
      1 ept[0] vcpu=ffff9c436d26c680 seq=325268e8 inprog=1 start=7f327e4a4000 end=7f327e4a5000
      1 ept[0] vcpu=ffff9c436d26c680 seq=32527543 inprog=1 start=7f328f8ca000 end=7f328f8cb000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c861ab6 inprog=1 start=7fb4c67de000 end=7fb4c67df000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862600 inprog=1 start=7fb48c132000 end=7fb48c133000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862a8b inprog=1 start=7fb4f06b8000 end=7fb4f06b9000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862b9f inprog=1 start=7fb4f1861000 end=7fb4f1862000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c862d33 inprog=1 start=7fb4e72f5000 end=7fb4e72f6000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c86415c inprog=1 start=7fb49fb5a000 end=7fb49fb5b000
      1 ept[0] vcpu=ffff9c43d5618000 seq=1c864162 inprog=1 start=7fb49fb59000 end=7fb49fb5a000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862ba1 inprog=1 start=7fb4f0e24000 end=7fb4f0e25000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862bab inprog=1 start=7fb4f0e26000 end=7fb4f0e27000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862bb1 inprog=1 start=7fb4f0e27000 end=7fb4f0e28000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862cbd inprog=1 start=7fb4efffd000 end=7fb4efffe000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862cc4 inprog=1 start=7fb4f0692000 end=7fb4f0693000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862d32 inprog=1 start=7fb4dd282000 end=7fb4dd283000
      1 ept[0] vcpu=ffff9c533e1dc680 seq=1c862d36 inprog=1 start=7fb4e8e97000 end=7fb4e8e98000
      1 ept[0] vcpu=ffff9c436d26c680 seq=3252adeb inprog=1 start=7f326209b000 end=7f326209c000

The entire dump is 22,687 lines; if you want to see it, it is here (expires in 1 week):

	https://privatebin.net/?9a3bff6b6fd2566f#BHjrt4NGpoXL12NWiUDpThifi9E46LNXCy7eWzGXgqYx

> > 
> > What is involved in doing this with struct offsets for Linux v6.1.x?
> 
> Unless you are up for a challenge, I'd drop the PID entirely, getting that will
> be ugly.
> 
> For the KVM info, you need the offset of "kvm" within struct kvm_vcpu (more than
> likely it's '0'), and then the offset of each of the mmu_invalidate_* fields within
> struct kvm.  These need to come from the exact kernel you're running, though unless
> a field is added/removed to/from struct kvm between kernel versions, the offsets
> should be stable.
> 
> A cheesy/easy way to get the offsets is to feed offsetof() into __aligned and
> then compile.  So long as the offset doesn't happen to be a power-of-2, the
> compiler will yell.  E.g. with this
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 92c50dc159e8..04ec37f7374a 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -543,7 +543,13 @@ struct kvm_hva_range {
>   */
>  static void kvm_null_fn(void)
>  {
> +       int v __aligned(offsetof(struct kvm_vcpu, kvm));
> +       int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
> +       int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
> +       int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
> +       int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));
>  
> +       v = w = x = y = z = 0;
>  }
>  #define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)
> 
> I get yelled at with (trimmed):
> 
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:546:34: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:547:20: error: requested alignment ‘36960’ is not a positive power of 2
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:549:20: error: requested alignment ‘36968’ is not a positive power of 2
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:551:20: error: requested alignment ‘36976’ is not a positive power of 2
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:553:20: error: requested alignment ‘36984’ is not a positive power of 2

Neat trick.
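
For anyone reusing the trick: going by the wording of the error above
("is not a positive power of 2"), the compiler only complains, and so only
reveals the number, when the offset is zero or not a power of two. Below is a
minimal userspace sketch of that condition. The helper names are hypothetical
and exist only for illustration; they are not from the kernel or from this
thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when n is a positive power of two, i.e. a value the compiler
 * would accept as an alignment without complaint. */
static bool is_positive_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: true when feeding this offset to __aligned()
 * is expected to draw the "requested alignment" error that prints it.
 * Assumption: the compiler rejects exactly the values that are not
 * positive powers of two, as the error text in the thread suggests. */
static bool trick_reveals_offset(size_t offset)
{
	return !is_positive_power_of_2(offset);
}
```

Under that assumption, all of the offsets reported in this thread (0, 36960
through 36984, and 39552 through 39576) are revealed, while a field that
happened to sit at, say, 4096 would compile silently and would have to be
found another way.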

So here are my numbers:

# make modules  KDIR=virt 2>&1 | grep -A1 alignment |grep -v ^-
arch/x86/kvm/../../../virt/kvm/kvm_main.c:568:40: error: requested alignment ‘0’ is not a positive power of 2 [-Werror=attributes]
  568 |        int v __aligned(offsetof(struct kvm_vcpu, kvm));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:569:40: error: requested alignment ‘39552’ is not a positive power of 2
  569 |        int w __aligned(offsetof(struct kvm, mmu_invalidate_seq));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:570:40: error: requested alignment ‘39560’ is not a positive power of 2
  570 |        int x __aligned(offsetof(struct kvm, mmu_invalidate_in_progress));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:571:40: error: requested alignment ‘39568’ is not a positive power of 2
  571 |        int y __aligned(offsetof(struct kvm, mmu_invalidate_range_start));
arch/x86/kvm/../../../virt/kvm/kvm_main.c:572:40: error: requested alignment ‘39576’ is not a positive power of 2
  572 |        int z __aligned(offsetof(struct kvm, mmu_invalidate_range_end));

and the resulting script:
	kprobe:handle_ept_violation
	{
		$kvm = *((uint64 *)((uint64)arg0 + 0));

		printf("vcpu=%08lx seq=%08lx inprog=%lx start=%08lx end=%08lx\n",
		       arg0,
		       *((uint64 *)($kvm + 39552)),
		       *((uint64 *)($kvm + 39560)),
		       *((uint64 *)($kvm + 39568)),
		       *((uint64 *)($kvm + 39576)));
	}

... but the output shows all 0's except vcpu:

	# bpftrace ./handle_ept_violation.bt |grep ^vcpu | uniq -c
	     11 vcpu=ffff9d518541c680 seq=00000000 inprog=0 start=00000000 end=00000000
	     29 vcpu=ffff9d80cc120000 seq=00000000 inprog=0 start=00000000 end=00000000
	    331 vcpu=ffff9d5f1d1a2340 seq=00000000 inprog=0 start=00000000 end=00000000
	    858 vcpu=ffff9d80c7b98000 seq=00000000 inprog=0 start=00000000 end=00000000
	   2183 vcpu=ffff9d6033fb2340 seq=00000000 inprog=0 start=00000000 end=00000000

Did I do something wrong here?

-Eric

> 
> Then take those offsets and do math.  For me, this provides the same output as
> the above pretty version.  Just use common sense and verify you're getting sane
> data.
> 
> kprobe:handle_ept_violation
> {
> 	$kvm = *((uint64 *)((uint64)arg0 + 0));
> 
> 	printf("vcpu = %lx MMU seq = %lx, in-prog = %lx, start = %lx, end = %lx\n",
> 	       arg0,
>                *((uint64 *)($kvm + 36960)),
>                *((uint64 *)($kvm + 36968)),
>                *((uint64 *)($kvm + 36976)),
>                *((uint64 *)($kvm + 36984)));
> }
> 
> 
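
To make the "take those offsets and do math" step concrete, here is a small
userspace C sketch of the same two-step read that both bpftrace scripts
perform: load the kvm pointer stored at offset 0 of the vcpu, then load 64-bit
fields at byte offsets from that pointer. The structs mock_vcpu and mock_kvm
and the two helpers are hypothetical stand-ins; their layouts are NOT the real
kernel layouts, and this does not explain the all-zero output above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kvm; padding and layout are invented. */
struct mock_kvm {
	char pad[64];
	uint64_t mmu_invalidate_seq;
	uint64_t mmu_invalidate_in_progress;
	uint64_t mmu_invalidate_range_start;
	uint64_t mmu_invalidate_range_end;
};

/* Hypothetical stand-in for struct kvm_vcpu, with 'kvm' at offset 0
 * as in the offsets reported in the thread. */
struct mock_vcpu {
	struct mock_kvm *kvm;
	int vcpu_id;
};

/* Read a pointer stored 'offset' bytes past 'base'; this mirrors
 * $kvm = *((uint64 *)((uint64)arg0 + 0)) in the script. */
static const void *read_ptr_at(const void *base, size_t offset)
{
	const void *p;

	memcpy(&p, (const unsigned char *)base + offset, sizeof(p));
	return p;
}

/* Read a 64-bit value stored 'offset' bytes past 'base'; this mirrors
 * *((uint64 *)($kvm + N)) in the script. */
static uint64_t read_u64_at(const void *base, size_t offset)
{
	uint64_t val;

	memcpy(&val, (const unsigned char *)base + offset, sizeof(val));
	return val;
}
```

With a mock_vcpu whose kvm member points at a filled-in mock_kvm,
read_ptr_at(&vcpu, 0) returns that pointer, and read_u64_at(kvm,
offsetof(struct mock_kvm, mmu_invalidate_seq)) returns the stored sequence
value. In the real script the offsets come from the running kernel's build
rather than from offsetof() on a mock struct.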






--
Eric Wheeler

