From: Thomas Zimmermann <tzimmermann@xxxxxxx>
Sent: Tuesday, March 18, 2025 1:26 AM
>
> On 18.03.25 03:05, Michael Kelley wrote:
> > I've been trying to get mmap() working with the hyperv_fb.c fbdev driver,
> > which is for Linux guests running on Microsoft's Hyper-V hypervisor. The
> > hyperv_fb driver uses fbdev deferred I/O for performance reasons. But it
> > looks to me like fbdev deferred I/O is fundamentally broken when the
> > underlying framebuffer memory is allocated from kernel memory
> > (alloc_pages() or dma_alloc_coherent()).
> >
> > The hyperv_fb.c driver may allocate the framebuffer memory in several ways,
> > depending on the size of the framebuffer specified by the Hyper-V host and
> > the VM "Generation". For a Generation 2 VM, the framebuffer memory is
> > allocated by the Hyper-V host and is assigned to guest MMIO space. The
> > hyperv_fb driver does a vmalloc() allocation for deferred I/O to work
> > against. This combination handles mmap() of /dev/fb<n> correctly, and the
> > performance benefits of deferred I/O are substantial.
> >
> > But for a Generation 1 VM, the hyperv_fb driver allocates the framebuffer
> > memory in contiguous guest physical memory using alloc_pages() or
> > dma_alloc_coherent(), and informs the Hyper-V host of its location. In this
> > case, mmap() with deferred I/O does not work. The mmap() succeeds, and user
> > space updates to the mmap'ed memory are correctly reflected to the
> > framebuffer. But when the user space program does munmap() or terminates,
> > the Linux kernel free lists become scrambled and the kernel eventually
> > panics. The problem is that when munmap() is done, the PTEs in the VMA are
> > cleaned up, and the corresponding struct page refcounts are decremented. If
> > the refcount goes to zero (which it typically will), the page is
> > immediately freed. In this way, some or all of the framebuffer memory gets
> > erroneously freed.
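(To make the failure mode above concrete: fb_deferred_io_mmap() in the current
kernel looks roughly like the following -- I'm quoting from memory, so details
may be slightly off:

```c
/* drivers/video/fbdev/core/fb_defio.c, approximately */
static int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
	vma->vm_ops = &fb_deferred_io_vm_ops;
	vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
	if (!(info->flags & FBINFO_VIRTFB))
		vm_flags_set(vma, VM_IO);
	vma->vm_private_data = info;
	return 0;
}
```

Nothing here sets VM_PFNMAP, and the fault handler hands the mm core a
refcounted struct page. So at munmap() time the core treats the framebuffer
pages as ordinary anonymous-style pages, drops the references, and frees them
out from under the driver.)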
> > From what I see, the VMA should be marked VM_PFNMAP when allocated kernel
> > memory is being used as the framebuffer with deferred I/O, but that's not
> > happening. The handling of deferred I/O page faults would also need
> > updating to make this work.
>
> I cannot help much with Hyper-V, but there's a get_page callback in
> struct fb_deferred_io. [1] It'll allow you to provide a custom page on
> each page fault. We use it in DRM to mmap SHMEM-backed pages. [2] Maybe
> this helps with hyperv_fb as well.
>

Thanks for your input. See also my reply to Helge. Unfortunately, using a
custom get_page() callback doesn't help. In the problematic case, the standard
deferred I/O get_page() function works correctly for getting the struct page.

My current thinking is that the problem is in fb_deferred_io_mmap(), where the
vma needs to have the VM_PFNMAP flag set when the framebuffer memory is a
direct kernel allocation rather than obtained through vmalloc(). There may be
some implications for the mkwrite function as well, but I'll need to sort that
out once I start coding.

For the DRM code using SHMEM-backed pages, do you know where the shared memory
comes from? Is that ultimately a kernel vmalloc() allocation?

Michael
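P.S. Roughly what I have in mind, as an untested sketch. FBINFO_KMEMFB is a
made-up flag standing in for "framebuffer is a direct kernel allocation"; a
real patch would need some equivalent test:

```c
static int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
	vma->vm_ops = &fb_deferred_io_vm_ops;
	vm_flags_set(vma, VM_DONTEXPAND | VM_DONTDUMP);
	if (info->flags & FBINFO_KMEMFB)	/* hypothetical flag */
		/* Raw PFN mapping: the mm core won't touch struct page
		 * refcounts at munmap() time, so the framebuffer memory
		 * stays allocated to the driver. */
		vm_flags_set(vma, VM_PFNMAP);
	vma->vm_private_data = info;
	return 0;
}

static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
{
	struct fb_info *info = vmf->vma->vm_private_data;
	unsigned long offset = vmf->pgoff << PAGE_SHIFT;

	if (offset >= info->fix.smem_len)
		return VM_FAULT_SIGBUS;

	if (vmf->vma->vm_flags & VM_PFNMAP) {
		/* Insert the PFN directly; no struct page reference is
		 * taken, which is what makes munmap() safe here. */
		unsigned long pfn =
			(info->fix.smem_start + offset) >> PAGE_SHIFT;

		return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
	}

	/* ... existing refcounted struct-page path for vmalloc'ed
	 * framebuffers remains unchanged ... */
	return VM_FAULT_SIGBUS;
}
```

The mkwrite side is the part I still need to work out: deferred I/O tracks
dirty pages via page_mkwrite, but a PFN-mapped vma would go through the
pfn_mkwrite callback instead, which has no struct page to put on the pagelist.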