On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
> >>> It looks like I'm totally misunderstanding what you are adding here
> >>> then. Why do we need any special treatment at all for memory that
> >>> has normal struct pages and is part of the direct kernel map?
> >> The pages are like normal memory for purposes of mapping them in CPU
> >> page tables and for coherent access from the CPU.
> > That's the user page tables. What about the kernel direct map?
> > If there is a normal kernel struct page backing there really should
> > be no need for the pgmap.
>
> I'm not sure. The physical address ranges are in the UEFI system address
> map as special-purpose memory. Does Linux create the struct pages and
> kernel direct map for that without a pgmap call? I didn't see that last
> time I went digging through that code.

Some googling finds a patch from Dan that claims to hand EFI
special-purpose memory to the device dax driver. But when I try to
follow the version that got merged, it looks like it is treated simply
as an MMIO region to be claimed by drivers, which would not get a
struct page.

Dan, did I misunderstand how E820_TYPE_SOFT_RESERVED works?

> >> From an application
> >> perspective, we want file-backed and anonymous mappings to be able to
> >> use DEVICE_PUBLIC pages with coherent CPU access. The goal is to
> >> optimize performance for GPU heavy workloads while minimizing the need
> >> to migrate data back-and-forth between system memory and device memory.
> > I don't really understand that part. File-backed pages are always
> > allocated by the file system using the pagecache helpers, that is
> > using the page allocator. Anonymous memory also always comes from
> > the page allocator.
>
> I'm coming at this from my experience with DEVICE_PRIVATE. Both
> anonymous and file-backed pages should be migrateable to DEVICE_PRIVATE
> memory by the migrate_vma_* helpers for more efficient access by our
> GPU. (*) It's part of the basic premise of HMM as I understand it. I
> would expect the same thing to work for DEVICE_PUBLIC memory.

Ok, so you want to migrate to and from them, not use DEVICE_PUBLIC for
the actual page cache pages. That makes a lot more sense.

> I see DEVICE_PUBLIC as an improved version of DEVICE_PRIVATE that allows
> the CPU to map the device memory coherently to minimize the need for
> migrations when CPU and GPU access the same memory concurrently or
> alternatingly. But we're not going as far as putting that memory
> entirely under the management of the Linux memory manager and VM
> subsystem. Our (and HPE's) system architects decided that this memory is
> not suitable to be used like regular NUMA system memory by the Linux
> memory manager.

So yes, it is a memory-mapped I/O region, which unlike the PCIe BARs
that people typically deal with is fully cache coherent. I think this
does make more sense as a description.

But to go back to what started this discussion: if these are
memory-mapped I/O, pfn_valid should generally not return true for them.
And as you already pointed out in reply to Alex, we need to tighten the
selection criteria one way or another.
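
For illustration, a minimal sketch of the registration step being argued
about above: handing the coherent device memory range to the core MM via
a dev_pagemap so that struct pages exist for it. MEMORY_DEVICE_PUBLIC is
the page type proposed in this series (not an existing upstream type),
and the function names, ops and range handling here are simplified
placeholders rather than the actual amdgpu code:

	/*
	 * Sketch only: register a cache-coherent device memory range with
	 * the core MM so that it gets ZONE_DEVICE struct pages.
	 */
	#include <linux/memremap.h>
	#include <linux/ioport.h>

	static void gpu_page_free(struct page *page)
	{
		/* return the backing device page to the driver's allocator */
	}

	static const struct dev_pagemap_ops gpu_pgmap_ops = {
		.page_free = gpu_page_free,
	};

	static struct dev_pagemap gpu_pgmap;

	static int gpu_register_coherent_memory(struct device *dev,
						resource_size_t start,
						resource_size_t size)
	{
		void *addr;

		gpu_pgmap.type = MEMORY_DEVICE_PUBLIC;	/* proposed type */
		gpu_pgmap.range.start = start;
		gpu_pgmap.range.end = start + size - 1;
		gpu_pgmap.nr_range = 1;
		gpu_pgmap.ops = &gpu_pgmap_ops;
		gpu_pgmap.owner = dev;	/* matched by migrate_vma pgmap_owner */

		addr = devm_memremap_pages(dev, &gpu_pgmap);
		return PTR_ERR_OR_ZERO(addr);
	}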
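
And a rough sketch of the migrate_vma_* flow Felix refers to, assuming a
driver migrating a small range of anonymous pages into its device
memory. gpu_alloc_device_page() and the copy step are hypothetical
placeholders, and the pfn flags follow the migrate_vma API as of this
writing:

	/*
	 * Sketch: migrate up to 16 pages in [start, end) of a VMA into
	 * device memory using the migrate_vma_* helpers.
	 */
	#include <linux/migrate.h>
	#include <linux/pagemap.h>
	#include <linux/mm.h>

	static int gpu_migrate_to_device(struct vm_area_struct *vma,
					 unsigned long start, unsigned long end,
					 void *pgmap_owner)
	{
		unsigned long npages = (end - start) >> PAGE_SHIFT;
		unsigned long src[16], dst[16];	/* real code sizes these dynamically */
		struct migrate_vma args = {
			.vma		= vma,
			.start		= start,
			.end		= end,
			.src		= src,
			.dst		= dst,
			.pgmap_owner	= pgmap_owner,
			.flags		= MIGRATE_VMA_SELECT_SYSTEM,
		};
		unsigned long i;
		int ret;

		if (npages > ARRAY_SIZE(src))
			return -EINVAL;

		ret = migrate_vma_setup(&args);
		if (ret)
			return ret;

		for (i = 0; i < npages; i++) {
			struct page *dpage;

			if (!(args.src[i] & MIGRATE_PFN_MIGRATE))
				continue;
			dpage = gpu_alloc_device_page();	/* hypothetical helper */
			/* ... copy the source page contents to dpage ... */
			lock_page(dpage);
			args.dst[i] = migrate_pfn(page_to_pfn(dpage)) |
				      MIGRATE_PFN_LOCKED;
		}

		migrate_vma_pages(&args);
		migrate_vma_finalize(&args);
		return 0;
	}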
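
Finally, the kind of check implied by the last paragraph, again only as
a sketch: pfn_valid() merely says a struct page exists for the pfn, so
code that must exclude device memory can additionally test the
ZONE_DEVICE state of the page. The helper name is made up for
illustration:

	#include <linux/mm.h>
	#include <linux/memremap.h>

	static bool pfn_is_plain_system_ram(unsigned long pfn)
	{
		struct page *page;

		if (!pfn_valid(pfn))
			return false;

		page = pfn_to_page(pfn);

		/*
		 * ZONE_DEVICE covers DEVICE_PRIVATE, DEVICE_GENERIC, FS_DAX
		 * and the proposed DEVICE_PUBLIC pages; none of them are
		 * managed by the page allocator.
		 */
		if (is_zone_device_page(page))
			return false;

		return true;
	}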