On Wed, May 10, 2023 at 09:35:44PM -0700, Hugh Dickins wrote: > On Wed, 10 May 2023, Matthew Wilcox wrote: > > On Tue, May 09, 2023 at 09:39:13PM -0700, Hugh Dickins wrote: > > > Two: pte_offset_map() will need to do an rcu_read_lock(), with the > > > corresponding rcu_read_unlock() in pte_unmap(). But most architectures > > > never supported CONFIG_HIGHPTE, so some don't always call pte_unmap() > > > after pte_offset_map(), or have used userspace pte_offset_map() where > > > pte_offset_kernel() is more correct. No problem in the current tree, > > > but a problem once an rcu_read_unlock() will be needed to keep balance. > > > > Hi Hugh, > > > > I shall have to spend some time looking at these patches, but at LSFMM > > just a few hours ago, I proposed and nobody objected to removing > > CONFIG_HIGHPTE. I don't intend to take action on that consensus > > immediately, so I can certainly wait until your patches are applied, but > > if this information simplifies what you're doing, feel free to act on it. > > Thanks a lot, Matthew: very considerate, as usual. > > Yes, I did see your "Whither Highmem?" (wither highmem!) proposal on the I'm glad somebody noticed the pun ;-) > list, and it did make me think, better get these patches and preview out > soon, before you get to vanish pte_unmap() altogether. HIGHMEM or not, > HIGHPTE or not, I think pte_offset_map() and pte_unmap() still have an > important role to play. > > I don't really understand why you're going down a remove-CONFIG_HIGHPTE > route: I thought you were motivated by the awkardness of kmap on large > folios; but I don't see how removing HIGHPTE helps with that at all > (unless you have a "large page tables" effort in mind, but I doubt it). Quite right, my primary concern is filesystem metadata; primarily directories as I don't think anybody has ever supported symlinks or superblocks larger than 4kB. I was thinking that removing CONFIG_HIGHPTE might simplify the page fault handling path a little, but now I've looked at it some more, and I'm not sure there's any simplification to be had. It should probably use kmap_local instead of kmap_atomic(), though. > But I've no investment in CONFIG_HIGHPTE if people think now is the > time to remove it: I disagree, but wouldn't miss it myself - so long > as you leave pte_offset_map() and pte_unmap() (under whatever names). > > I don't think removing CONFIG_HIGHPTE will simplify what I'm doing. > For a moment it looked like it would: the PAE case is nasty (and our > data centres have not been on PAE for a long time, so it wasn't a > problem I had to face before); and knowing pmd_high must be 0 for a > page table looked like it would help, but now I'm not so sure of that > (hmm, I'm changing my mind again as I write). > > Peter's pmdp_get_lockless() does rely for complete correctness on > interrupts being disabled, and I suspect that I may be forced in the > PAE case to do so briefly; but detest that notion. For now I'm just > deferring it, hoping for a better idea before third series finalized. > > I mention this (and Cc Peter) in passing: don't want this arch thread > to go down into that rabbit hole: we can start a fresh thread on it if > you wish, but right now my priority is commit messages for the second > series, rather than solving (or even detailing) the PAE problem. I infer that what you need is a pte_access_start() and a pte_access_end() which look like they can be plausibly rcu_read_lock() and rcu_read_unlock(), but might need to be local_irq_save() and local_irq_restore() in some configurations? We also talked about moving x86 to always RCU-free page tables in order to make accessing /proc/$pid/smaps lockless. I believe Michel is going to take a swing at this project.