Re: [PATCH v1 03/14] mm: add iomem vma selection for memory migration

On 2021-09-01 6:03 p.m., Dave Chinner wrote:
On Wed, Sep 01, 2021 at 11:40:43AM -0400, Felix Kuehling wrote:
Am 2021-09-01 um 4:29 a.m. schrieb Christoph Hellwig:
On Mon, Aug 30, 2021 at 01:04:43PM -0400, Felix Kuehling wrote:
driver code is not really involved in updating the CPU mappings. Maybe
it's something we need to do in the migration helpers.
It looks like I'm totally misunderstanding what you are adding here
then.  Why do we need any special treatment at all for memory that
has normal struct pages and is part of the direct kernel map?
The pages are like normal memory for purposes of mapping them in CPU
page tables and for coherent access from the CPU.
That's the user page tables.  What about the kernel direct map?
If there is a normal kernel struct page backing there really should
be no need for the pgmap.
I'm not sure. The physical address ranges are in the UEFI system address
map as special-purpose memory. Does Linux create the struct pages and
kernel direct map for that without a pgmap call? I didn't see that last
time I went digging through that code.


From an application
perspective, we want file-backed and anonymous mappings to be able to
use DEVICE_PUBLIC pages with coherent CPU access. The goal is to
optimize performance for GPU heavy workloads while minimizing the need
to migrate data back-and-forth between system memory and device memory.
I don't really understand that part.  file backed pages are always
allocated by the file system using the pagecache helpers, that is
using the page allocator.  Anonymous memory also always comes from
the page allocator.
I'm coming at this from my experience with DEVICE_PRIVATE. Both
anonymous and file-backed pages should be migrateable to DEVICE_PRIVATE
memory by the migrate_vma_* helpers for more efficient access by our
GPU. (*) It's part of the basic premise of HMM as I understand it. I
would expect the same thing to work for DEVICE_PUBLIC memory.

(*) I believe migrating file-backed pages to DEVICE_PRIVATE doesn't
currently work, but that's something I'm hoping to fix at some point.
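The migrate_vma_* flow mentioned above has three phases. First the source pages are collected and isolated. Then the driver allocates destination pages and copies the data. Finally the new pages are committed in place of the old ones.

What follows is a toy userspace model of that sequence, a sketch only. All names in it are hypothetical: toy_migrate_setup, toy_driver_alloc_and_copy, toy_migrate_finalize, and the "pinned" flag, which stands in for an extra page reference that blocks migration. The model simplifies the kernel in two ways. It collapses migrate_vma_pages() and migrate_vma_finalize() into one commit step. It also omits the migration entries that block CPU access while the copy is in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model of the setup -> allocate/copy -> commit sequence
 * used with the kernel's migrate_vma_* helpers. All names here are
 * hypothetical; this is NOT the kernel API. */

#define TOY_NPAGES 4
#define TOY_PAGE_BYTES 16

enum toy_where { TOY_SYSTEM_RAM, TOY_DEVICE_MEM };

struct toy_page {
	enum toy_where where;
	bool pinned;                  /* stands in for an extra reference */
	char data[TOY_PAGE_BYTES];
};

struct toy_migrate {
	struct toy_page **ptes;       /* "page table": what the CPU maps */
	struct toy_page *src[TOY_NPAGES];
	struct toy_page *dst[TOY_NPAGES];
	bool selected[TOY_NPAGES];    /* collected in the setup phase */
	size_t cpages;                /* number of pages collected */
};

/* Phase 1: collect system-RAM pages that can be isolated. */
static void toy_migrate_setup(struct toy_migrate *m, struct toy_page **ptes)
{
	m->ptes = ptes;
	m->cpages = 0;
	for (size_t i = 0; i < TOY_NPAGES; i++) {
		m->src[i] = ptes[i];
		m->dst[i] = NULL;
		m->selected[i] = ptes[i] && ptes[i]->where == TOY_SYSTEM_RAM &&
				 !ptes[i]->pinned;
		if (m->selected[i])
			m->cpages++;
	}
}

/* Phase 2 (driver): allocate device pages from a pool and copy data. */
static void toy_driver_alloc_and_copy(struct toy_migrate *m,
				      struct toy_page *pool, size_t pool_len)
{
	size_t next = 0;

	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (!m->selected[i] || next == pool_len)
			continue;     /* not collected, or out of device memory */
		struct toy_page *d = &pool[next++];
		d->where = TOY_DEVICE_MEM;
		d->pinned = false;
		memcpy(d->data, m->src[i]->data, TOY_PAGE_BYTES);
		m->dst[i] = d;
	}
}

/* Phase 3: switch the mapping for entries that got a destination;
 * everything else keeps its original page. Returns pages moved. */
static size_t toy_migrate_finalize(struct toy_migrate *m)
{
	size_t moved = 0;

	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (m->selected[i] && m->dst[i]) {
			m->ptes[i] = m->dst[i];
			moved++;
		}
	}
	return moved;
}
```

In this model a pinned page is skipped at setup and stays mapped to system RAM. The other pages end up mapped to device pages holding the same data. A driver that runs out of device memory simply leaves the remaining entries without a destination, so the commit step keeps their original pages.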
FWIW, I'd love to see the architecture documents that define how
filesystems are supposed to interact with this device private
memory. This whole "hand filesystem controlled memory to other
devices" is a minefield that is trivial to get wrong and very
difficult to fix - just look at the historical mess that RDMA
to/from file backed and/or DAX pages has been.

So, really, from my perspective as a filesystem engineer, I want to
see an actual specification for how this new memory type is going to
interact with filesystem and the page cache so everyone has some
idea of how this is going to work and can point out how it doesn't
work before code that simply doesn't work is pushed out into
production systems and then merged....

OK. To be clear, that's not part of this patch series. And I have no authority to push anything in this part of the kernel, so you have nothing to fear. ;)

FWIW, we already have the ability to map file-backed system memory pages into device page tables with HMM and interval notifiers. But we cannot currently migrate them to ZONE_DEVICE pages. Beyond that, my understanding of how filesystems and page cache work is rather superficial at this point. I'll keep your name in mind for when I am ready to discuss this in more detail.
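Mirroring file-backed pages into device page tables with HMM relies on a sequence-count retry. In the kernel the driver does four things in order:

- Sample the interval notifier's sequence with mmu_interval_read_begin().
- Snapshot the range with hmm_range_fault().
- Take its own lock.
- Program the device page table only if mmu_interval_read_retry() reports that no invalidation has run in the meantime.

The single-threaded userspace model below illustrates that idea under simplifying assumptions. All names in it are hypothetical. A plain counter replaces the kernel's sequence, which also tracks invalidations that are still in progress. The driver lock that serializes the retry check against the invalidate callback is omitted. The race_hook field exists only to inject an invalidation between the snapshot and the check.

```c
#include <assert.h>
#include <stddef.h>

/* Toy single-threaded model of the read_begin / fault / read_retry
 * pattern used with interval notifiers and hmm_range_fault().
 * All names are hypothetical; this is NOT the kernel API. */

#define TOY_NPAGES 4

struct toy_notifier {
	unsigned long seq;            /* bumped on every invalidation */
};

struct toy_mirror {
	struct toy_notifier notifier;
	long cpu_pfns[TOY_NPAGES];    /* CPU page table (owned by the "mm") */
	long dev_pfns[TOY_NPAGES];    /* device page table (owned by driver) */
	/* test hook: simulates an invalidation racing with the snapshot */
	void (*race_hook)(struct toy_mirror *mir, int attempt);
};

/* Invalidate callback: the CPU mapping changed, so bump the sequence and
 * tear down the device's stale copy of that entry. */
static void toy_invalidate(struct toy_mirror *mir, size_t idx, long new_pfn)
{
	mir->notifier.seq++;
	mir->cpu_pfns[idx] = new_pfn;
	mir->dev_pfns[idx] = -1;
}

/* Snapshot the CPU page table and program the device page table,
 * retrying whenever an invalidation raced with the snapshot.
 * Returns the number of attempts it took. */
static int toy_mirror_range(struct toy_mirror *mir)
{
	long snap[TOY_NPAGES];
	int attempt = 0;

	for (;;) {
		unsigned long seq = mir->notifier.seq;   /* "read_begin" */
		for (size_t i = 0; i < TOY_NPAGES; i++)  /* "range fault" */
			snap[i] = mir->cpu_pfns[i];
		if (mir->race_hook)
			mir->race_hook(mir, attempt);
		attempt++;
		if (mir->notifier.seq != seq)            /* "read_retry" */
			continue;                        /* snapshot is stale */
		for (size_t i = 0; i < TOY_NPAGES; i++)
			mir->dev_pfns[i] = snap[i];
		return attempt;
	}
}

/* Example hook: an invalidation of entry 1 lands during the first try. */
static void toy_race_first_attempt(struct toy_mirror *mir, int attempt)
{
	if (attempt == 0)
		toy_invalidate(mir, 1, 99);
}
```

With toy_race_first_attempt installed, the first snapshot is discarded because the sequence moved. The second attempt programs the device with the updated entry. The stale value is never published to the device page table.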

Cheers,
  Felix



Cheers,

Dave.


