On Fri, May 03, 2024 at 02:43:19PM +0000, Zeng, Oak wrote:

> > > 2.
> > > Then call hmm_range_fault a second time,
> > > setting the hmm_range start/end only to cover valid pfns.
> > > With all valid pfns, set the REQ_FAULT flag.
> >
> > Why would you do this? The first call already did the faults you
> > needed and returned all the easy pfns that don't require faulting.
>
> But we have a use case where we want to fault in pages other than
> the page which contains the GPU fault address, e.g., a user
> malloc'ed or mmap'ed 8MiB buffer that the CPU has not touched before
> the GPU accesses it. Let's say the GPU access caused a GPU page
> fault at the 2MiB offset. The first hmm_range_fault would only fault
> in the page at the 2MiB offset, because in the first call we only
> set REQ_FAULT on the pfn at the 2MiB offset.

Honestly, that doesn't make a lot of sense to me, but if you really
want that you should add some new flag and have hmm_range_fault do
this kind of speculative faulting. I think you will end up
significantly over-faulting.

It also doesn't make sense to do faulting in hmm prefetch if you are
going to do migration to force the fault anyhow.

> > > Basically use hmm_range_fault to figure out the valid address
> > > range in the first round; then really fault (e.g., trigger CPU
> > > faults to allocate system pages) in the second call to
> > > hmm_range_fault.
> >
> > You don't fault on prefetch. Prefetch is about mirroring already
> > populated pages; it should not be causing new faults.
>
> Maybe we are using different wording here. We have this scenario,
> which we call prefetch, or whatever you want to call it:
>
> On a GPU page fault at an address A, we want to map an address range
> (e.g., 2MiB, or whatever size depending on settings) around address
> A into the GPU page table. The range around A could have no backing
> pages when the GPU page fault happens. We want to populate the 2MiB
> range. We call it prefetch because most of the pages in this range
> have not been accessed by the GPU yet, but we expect the GPU to
> access them soon.

This isn't prefetch, that is prefaulting.

> You mentioned "already populated pages". Who populated those pages,
> then? Was it a CPU access that populated them? If the CPU accesses
> those pages first, it is true the pages can already be populated.

Yes, I would think that is a pretty common case.

> But it is also a valid use case for the GPU to access an address
> before the CPU does, so there are no "already populated pages" on
> the GPU page fault. Please let us know what the picture in your head
> is. We seem to picture it completely differently.

And sure, this could happen too, but I feel like it is an application
issue to not be prefaulting the buffers it knows the GPU is going to
touch.

Again, our experiments have shown that taking the fault path is so
slow that sane applications must explicitly prefault and prefetch as
much as possible to avoid the faults in the first place.

I'm not sure I fully agree there is a real need to aggressively
optimize the faulting path like you are describing when it shouldn't
really be used in a performant application :\

> 2) decide a migration window per the migration granularity setting
> (e.g., 2MiB), inside the CPU VMA. If the CPU VMA is smaller than the
> migration granularity, the migration window is the whole CPU VMA
> range; otherwise, only part of the VMA range is migrated.

Seems rather arbitrary to me. You are quite likely to capture some
memory that is CPU memory and cause thrashing.
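To be concrete, the window selection you are describing would be
roughly the below. This is only a sketch: the helper name, the
power-of-two window size "win", and the pfns array sizing (at least
win >> PAGE_SHIFT entries) are made up, and the usual
mmu_interval_read_begin() / mmu_interval_read_retry() loop around it
is elided.

#include <linux/hmm.h>
#include <linux/mm.h>

/*
 * Sketch: pick a window of win bytes around the faulting address,
 * clamp it to the containing VMA, and ask hmm_range_fault() to fault
 * the whole window in one pass by putting HMM_PFN_REQ_FAULT in
 * default_flags (add HMM_PFN_REQ_WRITE if the device access is a
 * write).  Caller holds mmap_read_lock(mm), got notifier_seq from
 * mmu_interval_read_begin(), and handles the -EBUSY retry.
 */
static int prefault_window(struct mm_struct *mm,
                           struct mmu_interval_notifier *notifier,
                           unsigned long notifier_seq,
                           unsigned long fault_addr, unsigned long win,
                           unsigned long *pfns)
{
        struct vm_area_struct *vma = find_vma(mm, fault_addr);
        struct hmm_range range = {
                .notifier = notifier,
                .notifier_seq = notifier_seq,
                .hmm_pfns = pfns,
                .default_flags = HMM_PFN_REQ_FAULT,
        };

        if (!vma || fault_addr < vma->vm_start)
                return -EFAULT;

        /* align the window around the fault, then clamp to the VMA */
        range.start = max(ALIGN_DOWN(fault_addr, win), vma->vm_start);
        range.end = min(ALIGN(fault_addr + 1, win), vma->vm_end);

        return hmm_range_fault(&range);
}

Note that a single hmm_range_fault() call with HMM_PFN_REQ_FAULT in
default_flags already faults every page in [start, end), so the
two-pass scheme above isn't needed for that part.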
As I said before, in common cases the heap will be large single VMAs,
so this kind of scheme is just going to fault a whole bunch of
unrelated malloc objects over to the GPU. Not sure how it is really a
good idea.

Adaptive locality of memory is still an unsolved problem in Linux,
sadly.

> > However, the migration stuff should really not be in the driver
> > either. That should be core DRM logic to manage. It is so
> > convoluted and full of policy that all the drivers should be
> > working in the same way.
>
> Completely agreed. Moving the migration infrastructure to DRM is
> part of our plan. We want to first prove the concept with the xekmd
> driver, then move the helpers and infrastructure to DRM. A driver
> should be as easy as implementing a few callback functions for
> device-specific page table programming and device migration, and
> calling some DRM common functions during a GPU page fault.

You'd be better off starting out this way so people can look at and
understand the core code on its own merits.

Jason