Re: [PATCH] perf: map pages in advance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 29, 2024 at 03:09:49PM +0100, David Hildenbrand wrote:
> > > > I just tested it and write() works fine, it uses uaccess afaict as part of the
> > > > lib/iov_iter.c code:
> > > >
> > > > generic_perform_write()
> > > > -> copy_folio_from_iter_atomic()
> > > > -> copy_page_from_iter_atomic()
> > > > -> __copy_from_iter()
> > > > -> copy_from_user_iter()
> > > > -> raw_copy_from_user()
> > > > -> copy_user_generic()
> > > > -> [uaccess asm]
> > > >
> > >
> > > Ah yes. O_DIRECT is the problematic bit I suspect, which will use GUP.
> > >
> > > Ptrace access and friends should also no longer work on these pages, likely
> > > that's tolerable.
> >
> > Yeah Peter can interject if not, but I'd be _very_ surprised if anybody expects
> > to be able to ptrace perf counter mappings in another process (it'd be kind of
> > totally insane to do that anyway since it's a ring buffer that needs special
> > handing anyway).
>
> I think so as well. Disallowing GUP has some side effects, like not getting
> these pages included in a coredump etc ... at least I hope nobody uses
> O_DIRECT on them.

We set VM_DONTDUMP anyway (set by remap_pfn_range() also) so this part won't be
a problem, and I can't see anybody using O_DIRECT on them, sensibly.

>
> >
> > >
> > > > > >
> > > > > > If we can't do pfnmap, and we definitely can't do mixedmap (because it's
> > > > > > basically entirely equivalent in every way to just faulting in the pages as
> > > > > > before and requires the same hacks) then I will have to go back to the drawing
> > > > > > board or somehow change the faulting code.
> > > > > >
> > > > > > This really sucks.
> > > > > >
> > > > > > I'm not quite sure I even understand why we don't allow GUP used _just for
> > > > > > pinning_ on VM_PFNMAP when it is -in effect- already pinned on assumption
> > > > > > whatever mapped it will maintain the lifetime.
> > > > > >
> > > > > > What a mess...
> > > > >
> > > > > Because VM_PFNMAP is dangerous as hell. To get a feeling for that (and also why I
> > > > > raised my refcounting comment earlier) just read recent:
> > > > >
> > > > > commit 79a61cc3fc0466ad2b7b89618a6157785f0293b3
> > > > > Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > > > > Date:   Wed Sep 11 17:11:23 2024 -0700
> > > > >
> > > > >       mm: avoid leaving partial pfn mappings around in error case
> > > > >       As Jann points out, PFN mappings are special, because unlike normal
> > > > >       memory mappings, there is no lifetime information associated with the
> > > > >       mapping - it is just a raw mapping of PFNs with no reference counting of
> > > > >       a 'struct page'.
> > > > >
> > > >
> > > > I'm _very_ aware of this, having worked extensively on things around this kind
> > > > of issue recently (was a little bit involved with the above one too :), and
> > > > explicitly zap on error in this patch to ensure we leave no broken code paths.
> > > >
> > > > I agree it's horrible, but we need to have a way of mapping this memory without
> > > > having to 'trick' the faulting mechanism to behave correctly.
> > >
> > > What's completely "surprising" to me is, if there is no page_mkwrite, but
> > > the VMA is writable, then just map the PTE writable ...
> >
> > I've had a number of surprises on this journey :)
> >
> > To make sure I understand what you mean do you mean the whole:
> >
> > do_wp_page()
> > -> wp_page_shared()
> > -> if (page_mkwrite) { ... } else { wp_page_reuse(); }
> > -> wp_page_reuse()
> > -> maybe_mkwrite() [hey VM_WRITE is set, so yes make writable!]
> >
> > Path?
>
> Yes. I can see how it might be relevant (mprotect(PROT_READ) +
> mprotect(READ|WRITE)), but it's a bit counter-intuitive ... to default to
> "writable".

Yeah, no this whole thing very much resembles a labyrinth with a minotaur at the
end, whose name is VM_PFNMAP :>)

>
> [...]
>
> > >
> >
> > So overall - given all the above and the fact that the alternatives are _even
> > worse_ (having to spuriously folio lock if that'll even work, totally
> > unnecessary and semantically incorrect folio-fication or a complete rework of
> > mapping) - while you might be sicked by this use of the VM_PFNMAP - are you ok
> > with the patch, all things considered? :)
>
> It hurts my eyes, and feels like a step into the wrong direction. But I
> don't really see how to do it differently right now.

Yeah I agree totally.

>
> Can we add a comment/TODO in there that using remap_pfn_range this to map a
> kernel allocation really is suboptimal and we should switch to something
> better once we figured the details out?

Sure, let me do a v2 then to make Peter's life easier, can do the nommu fixup in
there too.

>
>
> BTW, I think the whole reason vm_insert_pages() touches the mapcount is
> because we didn't really have a way to identify during unmap that we must
> not adjust it either. A page->type will be able to sort that out (while
> still allowing to refcount them for now).

Yeah. I think not being able to differentiate between different cases is the
problem here...

>
> Actually, the whole thing is on my todo list, because messing with the RMAP
> with vm_insert_pages() doesn't make any sense (whereby the refcount might!).
> See the TODO I added in __folio_rmap_sanity_checks().

How long is your TODO list I wonder? :)) I imagine it requires a huge page to
map at this point..

>
> --
> Cheers,
>
> David / dhildenb
>

Cool will respin and send out shortly with an appropriately 'forgive us father
for we have sinned' comment.

Thanks for taking the time to discuss this! Much appreciated.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux