On Tue, Aug 24, 2021 at 11:25 AM Gerald Schaefer
<gerald.schaefer@xxxxxxxxxxxxx> wrote:
>
> On Tue, 24 Aug 2021 07:53:22 -0700
> Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> > On Tue, Aug 24, 2021 at 7:10 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 8/23/21 9:21 PM, Dan Williams wrote:
> > > > On Mon, Aug 23, 2021 at 12:47 PM Gerald Schaefer
> > > > <gerald.schaefer@xxxxxxxxxxxxx> wrote:
> > > >>
> > > >> On Mon, 23 Aug 2021 16:05:46 +0200
> > > >> Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx> wrote:
> > > >>
> > > >>> On Fri, 20 Aug 2021 07:43:40 +0200
> > > >>> Christoph Hellwig <hch@xxxxxx> wrote:
> > > >>>
> > > >>>> Hi all,
> > > >>>>
> > > >>>> looking at the recent ZONE_DEVICE related changes we still have a
> > > >>>> horrible maze of different code paths. I already suggested to
> > > >>>> depend on ARCH_HAS_PTE_SPECIAL for ZONE_DEVICE there, which all modern
> > > >>
> > > >> Oh, we do have PTE_SPECIAL, actually that took away the last free bit
> > > >> in the pte. So, if there is a chance that ZONE_DEVICE would depend
> > > >> on PTE_SPECIAL instead of PTE_DEVMAP, we might be back in the game
> > > >> and get rid of that CONFIG_FS_DAX_LIMITED.
> > > >
> > > > So PTE_DEVMAP is primarily there to coordinate the
> > > > get_user_pages_fast() path, and even there it's usage can be
> > > > eliminated in favor of PTE_SPECIAL. I started that effort [1], but
> > > > need to rebase on new notify_failure infrastructure coming from Ruan
> > > > [2]. So I think you are not in the critical path until I can get the
> > > > PTE_DEVMAP requirement out of your way.
> > > >
> > >
> > > Isn't the implicit case that PTE_SPECIAL means that you
> > > aren't supposed to get a struct page back? The gup path bails out on
> > > pte_special() case. And in the fact in this thread that you quote:
> > >
> > > > [1]: https://lore.kernel.org/r/161604050866.1463742.7759521510383551055.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > >
> > > (...) we were speaking about[1.1] using that same special bit to block
> > > longterm gup for fs-dax (while allowing it device-dax which does support it).
> > >
> > > [1.1] https://lore.kernel.org/nvdimm/a8c41028-c7f5-9b93-4721-b8ddcf2427da@xxxxxxxxxx/
> > >
> > > Or maybe that's what you mean for this particular case of FS_DAX_LIMITED. Most _special*()
> > > cases in mm match _devmap*() as far I've experimented in the past with PMD/PUD and dax
> > > (prior to [1.1]).
> > >
> > > I am just wondering would you differentiate the case where you have metadata for the
> > > !FS_DAX_LIMITED case in {gup,gup_fast} path in light of removing PTE_DEVMAP. I would have
> > > thought of checking that a pgmap exists for the pfn (without grabbing a ref to it).
> >
> > So I should clarify, I'm not proposing removing PTE_DEVMAP, I'm
> > proposing relaxing its need for architectures that can not afford the
> > PTE bit. Those architectures would miss out on get_user_pages_fast()
> > for devmap pages. Then, once PTE_SPECIAL kicks get_user_pages() to the
> > slow path, get_dev_pagemap() is used to detect devmap pages.
>
> Thanks, I was also a bit confused, but I think I got it now. Does that mean
> that you also plan to relax the pte_devmap(pte) check in follow_page_pte(),
> before calling get_dev_pagemap() in the slow path? So that it could also be
> called for pte_special(), maybe with additional vma_is_dax() check. And then
> rely on get_dev_pagemap() finding the pages for those "very special" PTEs that
> actually would have struct pages (at least for s390 DCSS with DAX)?

Yes, that's along the lines of what I'm thinking. I.e don't expect
pte_devmap() to be there in the slow path, and use the vma to check
for DAX.
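
For illustration only, the flow discussed above can be modelled in plain userspace C. This is a rough sketch under stated assumptions, not kernel code and not a proposed patch: every `model_*` name and the `gup_result` enum are made up here, and the structs only stand in for what pte_special(), pte_devmap(), vma_is_dax() and get_dev_pagemap() would report. It assumes an architecture that has no PTE bit to spare for PTE_DEVMAP, so DAX mappings carry only the special bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; none of these are real kernel types. */
struct model_pte {
	bool special;		/* what pte_special() would report */
	bool devmap;		/* what pte_devmap() would report */
	unsigned long pfn;
};

struct model_vma {
	bool is_dax;		/* what vma_is_dax() would report */
};

struct model_pgmap {
	unsigned long start_pfn;
	unsigned long nr_pfns;
};

enum gup_result {
	GUP_NO_PAGE,		/* no struct page to hand back */
	GUP_NORMAL_PAGE,
	GUP_DEVMAP_PAGE,
	GUP_RETRY_SLOW,		/* fast path gives up, caller takes the slow path */
};

/* Models get_dev_pagemap(): return the pgmap covering pfn, or NULL. */
static const struct model_pgmap *
model_get_dev_pagemap(const struct model_pgmap *maps, size_t n, unsigned long pfn)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= maps[i].start_pfn &&
		    pfn - maps[i].start_pfn < maps[i].nr_pfns)
			return &maps[i];
	return NULL;
}

/* Fast path: no vma at hand, so a special PTE cannot be resolved here. */
static enum gup_result
model_gup_fast(const struct model_pte *pte,
	       const struct model_pgmap *maps, size_t n)
{
	assert(pte);
	if (pte->special)
		return GUP_RETRY_SLOW;
	if (pte->devmap)
		return model_get_dev_pagemap(maps, n, pte->pfn) ?
			GUP_DEVMAP_PAGE : GUP_RETRY_SLOW;
	return GUP_NORMAL_PAGE;
}

/*
 * Slow path with the relaxed check: do not insist on the devmap bit,
 * also try the pgmap lookup for special PTEs in a DAX vma.
 */
static enum gup_result
model_gup_slow(const struct model_pte *pte, const struct model_vma *vma,
	       const struct model_pgmap *maps, size_t n)
{
	assert(pte && vma);
	if (pte->devmap || (pte->special && vma->is_dax))
		return model_get_dev_pagemap(maps, n, pte->pfn) ?
			GUP_DEVMAP_PAGE : GUP_NO_PAGE;
	if (pte->special)
		return GUP_NO_PAGE;
	return GUP_NORMAL_PAGE;
}
```

In this model a special PTE inside a DAX vma always fails model_gup_fast() and is only resolved to a devmap page by model_gup_slow(), which is the trade-off mentioned above: such architectures give up the fast path for devmap pages in exchange for not needing the extra PTE bit.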