Adding Ankit in case he has opinions.

On Tue, Aug 27, 2024 at 5:42 PM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
>
> On Tue, Aug 27, 2024 at 3:57 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> >
> > On Tue, Aug 27, 2024 at 03:36:07PM -0700, Jiaqi Yan wrote:
> > > Hi Peter,
> >
> > Hi, Jiaqi,
> >
> > > I am curious if there is any work needed for unmap_mapping_range? If a
> > > driver hugely remap_pfn_range()ed at 1G granularity, can the driver
> > > unmap at PAGE_SIZE granularity? For example, when handling a PFN is
> >
> > Yes it can, but it'll invoke the split_huge_pud() which default routes to
> > removal of the whole pud right now (currently only covers either DAX
> > mappings or huge pfnmaps; it won't for anonymous if it comes, for example).
> >
> > In that case it'll rely on the driver providing proper fault() /
> > huge_fault() to refault things back with smaller sizes later when accessed
> > again.
>
> I see, so the driver needs to drive the recovery process, and code
> needs to be in the driver.
>
> But it seems to me the recovery process will be more or less the same
> to different drivers? In that case does it make sense that
> memory_failure do the common things for all drivers?
>
> Instead of removing the whole pud, can driver or memory_failure do
> something similar to non-struct-page-version of split_huge_page? So
> driver doesn't need to re-fault good pages back?
>
>
> >
> > > poisoned in the 1G mapping, it would be great if the mapping can be
> > > splitted to 2M mappings + 4k mappings, so only the single poisoned PFN
> > > is lost. (Pretty much like the past proposal* to use HGM** to improve
> > > hugetlb's memory failure handling).
> >
> > Note that we're only talking about MMIO mappings here, in which case the
> > PFN doesn't even have a struct page, so the whole poison idea shouldn't
> > apply, afaiu.
>
> Yes, there won't be any struct page. Ankit proposed this patchset* for
> handling poisoning. I wonder if someday the vfio-nvgrace-gpu-pci
> driver adopts your change via new remap_pfn_range (install PMD/PUD
> instead of PTE), and memory_failure_pfn still
> unmap_mapping_range(pfn_space->mapping, pfn << PAGE_SHIFT, PAGE_SIZE,
> 0), can it somehow just work and no re-fault needed?
>
> * https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@xxxxxxxxxx/#t
>
> >
> >
> > Thanks,
> >
> > --
> > Peter Xu
> >
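
For reference, the "2M mappings + 4k mappings" split asked about above can be worked out with a small userspace sketch. This is arithmetic only, not kernel code: `struct split_plan` and `plan_split()` are hypothetical names made up for illustration, and it assumes 4k base pages, 2M PMDs and 1G PUDs (the x86-64 sizes).

```c
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Hypothetical result type: what is left of a 1G mapping after one bad 4k page. */
struct split_plan {
	uint64_t pmd_start; /* offset of the 2M block that contains the bad page */
	uint64_t nr_2m;     /* 2M mappings that could stay intact */
	uint64_t nr_4k;     /* 4k mappings that could stay intact */
	uint64_t lost;      /* bytes that stay unmapped */
};

/*
 * Hypothetical helper: bad_off is the byte offset of the bad PFN inside
 * the 1G mapping. Only the 2M block holding the bad page is broken down
 * to 4k granularity; every other 2M block stays as it is.
 */
static struct split_plan plan_split(uint64_t bad_off)
{
	struct split_plan p;

	p.pmd_start = bad_off & ~(SZ_2M - 1); /* round down to a 2M boundary */
	p.nr_2m = SZ_1G / SZ_2M - 1;          /* all 2M blocks except the affected one */
	p.nr_4k = SZ_2M / SZ_4K - 1;          /* all 4k pages in that block except the bad one */
	p.lost = SZ_4K;                       /* only the bad page itself */
	return p;
}
```

For a bad page at offset 0x12345000, this leaves 511 2M mappings and 511 4k mappings in place and loses a single 4k page. With the current behaviour described above (removal of the whole pud), the full 1G stays unmapped until the driver's fault() / huge_fault() brings the good parts back.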