Re: [PATCH 5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

On Thu, Sep 05, 2024 at 05:43:17PM +0800, Yan Zhao wrote:
> On Wed, Sep 04, 2024 at 05:41:06PM -0700, Sean Christopherson wrote:
> > On Wed, Sep 04, 2024, Yan Zhao wrote:
> > > On Wed, Sep 04, 2024 at 10:28:02AM +0800, Yan Zhao wrote:
> > > > On Tue, Sep 03, 2024 at 06:20:27PM +0200, Vitaly Kuznetsov wrote:
> > > > > Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> > > > > 
> > > > > > On Mon, Sep 02, 2024, Vitaly Kuznetsov wrote:
> > > > > >> FWIW, I use QEMU-9.0 from the same C10S (qemu-kvm-9.0.0-7.el10.x86_64)
> > > > > >> but I don't think it matters in this case. My CPU is "Intel(R) Xeon(R)
> > > > > >> Silver 4410Y".
> > > > > >
> > > > > > Has this been reproduced on any other hardware besides SPR?  I.e. did we stumble
> > > > > > on another hardware issue?
> > > > > 
> > > > > Very possible, as according to Yan Zhao this doesn't reproduce on at
> > > > > least "Coffee Lake-S". Let me try to grab some random hardware around
> > > > > and I'll be back with my observations.
> > > > 
> > > > Update some new findings from my side:
> > > > 
> > > > BAR 0 of bochs VGA (fb_map) is used for frame buffer, covering phys range
> > > > from 0xfd000000 to 0xfe000000.
> > > > 
> > > > On "Sapphire Rapids XCC":
> > > > 
> > > > 1. If KVM forces this fb_map range to be WC+IPAT, installer/gdm can launch
> > > >    correctly. 
> > > >    i.e.
> > > >    if (gfn >= 0xfd000 && gfn < 0xfe000) {
> > > >    	return (MTRR_TYPE_WRCOMB << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
> > > >    }
> > > >    return MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT;
> > > > 
> > > > 2. If KVM forces this fb_map range to be UC+IPAT, installer fails to show / gdm
> > > >    restarts endlessly. (though on Coffee Lake-S, installer/gdm can launch
> > > >    correctly in this case).
> > > > 
> > > > 3. On starting GDM, ttm_kmap_iter_linear_io_init() in guest is called to set
> > > >    this fb_map range as WC, with
> > > >    iosys_map_set_vaddr_iomem(&iter_io->dmap, ioremap_wc(mem->bus.offset, mem->size));
> > > > 
> > > >    However, during bochs_pci_probe()-->bochs_load()-->bochs_hw_init(), pfns for
> > > >    this fb_map has been reserved as uc- by ioremap().
> > > >    Then, the ioremap_wc() during starting GDM will only map guest PAT with UC-.
> > > > 
> > > >    So, with KVM setting WB (no IPAT) to this fb_map range, the effective
> > > >    memory type is UC- and installer/gdm restarts endlessly.
> > > > 
> > > > 4. If KVM sets WB (no IPAT) to this fb_map range, and changes guest bochs driver
> > > >    to call ioremap_wc() instead in bochs_hw_init(), gdm can launch correctly.
> > > >    (didn't verify the installer's case as I can't update the driver in that case).
> > > > 
> > > >    The reason is that the ioremap_wc() called during starting GDM will no longer
> > > >    meet conflict and can map guest PAT as WC.
> > 
> > Huh.  The upside of this is that it sounds like there's nothing broken with WC
> > or self-snoop.
> Considering a different perspective, the fb_map range is used as frame buffer
> (vram), with the guest writing to this range and the host reading from it.
> If the issue were related to self-snooping, we would expect the VNC window to
> display distorted data. However, the observed behavior is that the GDM window
> shows up correctly for a sec and restarts over and over.
> 
> So, do you think we can simply fix this issue by calling ioremap_wc() for the
> frame buffer/vram range in bochs driver, as is commonly done in other gpu
> drivers?
> 
> --- a/drivers/gpu/drm/tiny/bochs.c
> +++ b/drivers/gpu/drm/tiny/bochs.c
> @@ -261,7 +261,9 @@ static int bochs_hw_init(struct drm_device *dev)
>         if (pci_request_region(pdev, 0, "bochs-drm") != 0)
>                 DRM_WARN("Cannot request framebuffer, boot fb still active?\n");
> 
> -       bochs->fb_map = ioremap(addr, size);
> +       bochs->fb_map = ioremap_wc(addr, size);
>         if (bochs->fb_map == NULL) {
>                 DRM_ERROR("Cannot map framebuffer\n");
>                 return -ENOMEM;
> 
> 
> > 
> > > > WIP to find out why effective UC in the fb_map range makes gdm restart
> > > > endlessly.
> > > Not sure whether it's simply because UC is too slow.
> > > 
> > > T=Test execution time of a selftest in which guest writes to a GPA for
> > >   0x1000000UL times
> > > 
> > >               | Sapphire Rapids XCC  | Coffee Lake-S
> > > --------------|----------------------|-----------------
> > > KVM UC+IPAT   |    T=0m4.530s        |  T=0m0.622s
> > 
> > Woah.  Have you tried testing MOVDIR64 and/or WT?  E.g. to see if the problem is
> > with UC specifically, or if it occurs with any accesses that immediately write
> > through to main memory.
> > 
> > > --------------|----------------------|-----------------
> > > KVM WC+IPAT   |    T=0m0.149s        |  T=0m0.176s
> > > --------------|----------------------|-----------------
> > > KVM WB+IPAT   |    T=0m0.148s        |  T=0m0.148s
> > > ------------------------------------------------------
> 
> I re-ran all the tests and collected averaged data (10 runs each), shown
> below (the previous data was from a single run):
> 
> 
> T=Test execution time of a selftest in which guest writes to a GPA for
>   0x1000000UL times with WRITE_ONCE
> 
> KVM memtype  | Sapphire Rapids XCC | Coffee Lake-S
> -------------|---------------------|----------------
>  WB+IPAT     |     T=0.1511s       |    T=0.1661s
> -------------|---------------------|----------------
>  WC+IPAT     |     T=0.1411s       |    T=0.1656s
> -------------|---------------------|----------------
>  WT+IPAT     |     T=3.7527s       |    T=0.6156s
> -------------|---------------------|----------------
>  WP+IPAT     |     T=4.4663s       |    T=0.6203s
> -------------|---------------------|----------------
>  UC+IPAT     |     T=3.4632s       |    T=0.5868s
> 
> 
> T=Test execution time of a selftest in which guest writes to a GPA for
>   0x1000000UL times with movdir64b.
> 
> (Coffee Lake-S does not support movdir64b.)
> 
> KVM memtype  | Sapphire Rapids XCC | Coffee Lake-S
> -------------|---------------------|----------------
>  WB+IPAT     |     T=2.6142s       |       /     
> -------------|---------------------|----------------
>  WC+IPAT     |     T=2.8919s       |       /     
> -------------|---------------------|----------------
>  WT+IPAT     |     T=3.0966s       |       /      
> -------------|---------------------|----------------
>  WP+IPAT     |     T=2.4933s       |       /     
> -------------|---------------------|----------------
>  UC+IPAT     |     T=3.4606s       |       /     
>
I think I have now root-caused this issue.

Status before this update:
With either Ubuntu or CentOS, on "Sapphire Rapids XCC":
- gdm fails to launch gnome-shell with Wayland enabled when the
  effective memory type is UC/UC-.
- gdm launches gnome-shell correctly with Wayland enabled when the
  effective memory type is WB or WC.
- gdm launches gnome-shell correctly with Wayland disabled, with
  any effective memory type.


Update:
1. I tried KVM memtype = WT + IPAT for this framebuffer range;
   gdm fails to launch gnome-shell when Wayland is enabled.

   Since the only difference between WT and WB is that writes are slow with
   WT, the failure should not be a self-snoop issue.


2. The current bochs driver calls ioremap() to map the framebuffer range.
   On x86, ioremap() maps the VA with PAT=UC- and invokes memtype_reserve()
   to reserve the memory type as UC- for the physical range.
   This reservation causes the later ioremap_wc() call in
   ttm_kmap_iter_linear_io_init() to fail to map the VA with PAT=WC for the
   same framebuffer range; the mapping ends up as UC- instead.
   Consequently, drm_gem_vram_bo_driver_move() becomes significantly slower
   on platforms where UC memory access is slow.

   When host KVM honors guest PAT memory types, the effective memory type
   for this framebuffer range is
   - WC when ioremap_wc() is used in the driver probing phase
   - UC- when ioremap() is used.

   I measured the data below for drm_gem_vram_bo_driver_move(), which
   does a memset of size 0x3e8000 (4096000 bytes) on this framebuffer range.

     ---------------------------------------------------------------
    |                               |      in bochs_hw_init()       |
    |                               |    ioremap()   | ioremap_wc() |
    |-------------------------------|----------------|--------------|
    |     cycles of                 |    2227.4M     |   17.8M      |
    | drm_gem_vram_bo_driver_move() |                |              |
    |-------------------------------|----------------|--------------|
    |     time of                   |    1.24s       |   0.01s      |
    | drm_gem_vram_bo_driver_move() |                |              |
     ---------------------------------------------------------------

    drm_gem_vram_bo_driver_move()
      ttm_bo_move_memcpy()
        ttm_kmap_iter_linear_io_init()
          iosys_map_set_vaddr_iomem(&iter_io->dmap,
                                    ioremap_wc(mem->bus.offset, mem->size));
        ttm_move_memcpy()
          memset_io() or
          drm_memcpy_from_wc()


   If I comment out the memset_io() and drm_memcpy_from_wc() calls in
   ttm_move_memcpy(), drm_gem_vram_bo_driver_move() becomes very fast and gdm
   is able to launch gnome-shell and log in successfully, though sometimes the
   screen is a little blurred.
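
As a rough cross-check of the table in item 2, the sketch below derives a few
ratios from the measured figures. It is plain userspace C, not part of the
selftest or the driver; the helper names (cycle_ratio, cycles_per_byte,
implied_ghz, print_summary) are made up for illustration, and the only inputs
are the numbers quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the table in item 2. */
#define CYCLES_IOREMAP     2227.4e6   /* bochs_hw_init() uses ioremap()    */
#define CYCLES_IOREMAP_WC    17.8e6   /* bochs_hw_init() uses ioremap_wc() */
#define SECONDS_IOREMAP        1.24   /* time of the move with ioremap()   */
#define MOVE_SIZE_BYTES  0x3e8000UL   /* memset size quoted in item 2      */

/* How many times more cycles the slow case takes than the fast one. */
double cycle_ratio(double slow_cycles, double fast_cycles)
{
	assert(fast_cycles > 0.0);
	return slow_cycles / fast_cycles;
}

/* Average cycles spent per byte of the framebuffer range touched. */
double cycles_per_byte(double cycles, unsigned long bytes)
{
	assert(bytes > 0);
	return cycles / (double)bytes;
}

/* Clock rate (GHz) implied by a cycle count and a wall-clock time. */
double implied_ghz(double cycles, double seconds)
{
	assert(seconds > 0.0);
	return cycles / seconds / 1e9;
}

/* Hypothetical helper: print the derived numbers in one place. */
void print_summary(void)
{
	printf("UC-/WC cycle ratio : %.1f\n",
	       cycle_ratio(CYCLES_IOREMAP, CYCLES_IOREMAP_WC));
	printf("cycles/byte (UC-)  : %.1f\n",
	       cycles_per_byte(CYCLES_IOREMAP, MOVE_SIZE_BYTES));
	printf("cycles/byte (WC)   : %.1f\n",
	       cycles_per_byte(CYCLES_IOREMAP_WC, MOVE_SIZE_BYTES));
	printf("implied clock (GHz): %.2f\n",
	       implied_ghz(CYCLES_IOREMAP, SECONDS_IOREMAP));
}
```

Calling print_summary() from a main() prints the derived values. The 0.01s
entry in the table is too coarse to derive a meaningful clock rate from, so
only the ioremap() column is used for that.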

3. I sent a fix at [1] to make the guest bochs driver map the framebuffer
   with PAT=WC for kernel access.
   [1] https://lore.kernel.org/all/20240909051529.26776-1-yan.y.zhao@xxxxxxxxx/
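
The conflict described in item 2 and the effect of the fix in item 3 can be
illustrated with a toy userspace model. This is only a sketch under stated
assumptions: toy_reserve() and toy_effective_memtype() are hypothetical names,
the real memtype_reserve() logic in the kernel is more involved, and the
effective-type helper covers only the two cases discussed in this thread
(IPAT set, or EPT memtype WB without IPAT). The second mapping is assumed to
start at the beginning of the BAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum memtype { MT_UC, MT_UC_MINUS, MT_WC, MT_WT, MT_WP, MT_WB };

struct reservation {
	uint64_t start, end;	/* physical range [start, end) */
	enum memtype type;
};

#define MAX_RESERVATIONS 8

struct memtype_table {
	struct reservation r[MAX_RESERVATIONS];
	size_t n;
};

/*
 * Toy stand-in for the reservation done by ioremap()/ioremap_wc():
 * if the range overlaps an existing reservation, the existing type
 * wins; otherwise the requested type is granted. Returns the type
 * the new mapping actually gets.
 */
enum memtype toy_reserve(struct memtype_table *t, uint64_t start,
			 uint64_t end, enum memtype req)
{
	enum memtype granted = req;

	assert(start < end);
	assert(t->n < MAX_RESERVATIONS);

	for (size_t i = 0; i < t->n; i++) {
		if (start < t->r[i].end && t->r[i].start < end) {
			granted = t->r[i].type;	/* earlier reservation wins */
			break;
		}
	}

	t->r[t->n++] = (struct reservation){ start, end, granted };
	return granted;
}

/*
 * Effective memory type for the two cases discussed in this thread.
 * Other EPT/PAT combinations are deliberately not modeled.
 */
enum memtype toy_effective_memtype(enum memtype ept_mt, bool ipat,
				   enum memtype guest_pat)
{
	if (ipat)
		return ept_mt;		/* guest PAT is ignored */
	assert(ept_mt == MT_WB);	/* only WB without IPAT is modeled */
	return guest_pat;		/* guest PAT is honored */
}
```

With an empty table, reserving 0xfd000000-0xfe000000 as MT_UC_MINUS (the
ioremap() in bochs_hw_init()) and then requesting MT_WC for a range inside it
returns MT_UC_MINUS. If the first reservation is MT_WC instead, as with the
fix, the second request returns MT_WC.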



