Re: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 09, 2025 at 04:32:13PM +0100, Andreas Hindborg wrote:
> "Lorenzo Stoakes" <lorenzo.stoakes@xxxxxxxxxx> writes:
>
> > On Thu, Jan 09, 2025 at 10:50:13AM +0100, Andreas Hindborg wrote:
> >> "Lorenzo Stoakes" <lorenzo.stoakes@xxxxxxxxxx> writes:
> >>
> >> > On Thu, Jan 09, 2025 at 09:02:11AM +0100, Andreas Hindborg wrote:
> >> >> "Alice Ryhl" <aliceryhl@xxxxxxxxxx> writes:
> >> >>
> >> >> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@xxxxxxxxxx> wrote:
> >> >> >>
> >> >> >>
> >> >> >> > +
> >> >> >> > +    /// Zap pages in the given page range.
> >> >> >> > +    ///
> >> >> >> > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> >> >> >> > +    /// tables intact,
> >> >> >>
> >> >> >> I don't fully understand this docstring. Is it correct that the function
> >> >> >> will unmap the address range given by `start` and `size`, _and_ free the
> >> >> >> pages used to hold the mappings at the leaf level of the page table?
> >> >> >
> >> >> > If the vma owns a refcount on those pages, then the refcounts are dropped.
> >> >>
> >> >> Maybe drop the "at the leaf level leaving all other page tables intact".
> >> >> It confuses me, since when would this not be the case?
> >> >
> >> > I don't understand your objection. The whole nature of a zap is to traverse
> >> > leaf level page table mappings, clearing the entries, leaving the other
> >> > page table entries intact.
> >>
> >> As someone not deeply familiar with this function and it's use, I became
> >> uncertain of my understanding when I read this sentence. As I asked
> >> above: When would you not clear mappings at the leaf level and leave all
> >> other mappings alone?
> >
> > Because these are page tables and page tables can span multiple PTE
> > tables. Correctly removing at the time of clearing would be expensive and
> > require very careful handling.
>
> What is the distinction between clearing a PTE and removing it?
>
> I asked above if the leaf page holding the PTEs would be dropped if all
> the PTEs it holds are cleared. Alice "If the vma owns a refcount on those pages,
> then the refcounts are dropped.". But from your message I am guessing
> maybe not?
>

No they won't be, though Qi is implementing a series which changes
this :) but for the purposes of this function, assume not.

> >
> >>
> >> Imagine you have a collection structure backed by a tree and the
> >> `remove_item` function has the sentence "remove item at the leaf level
> >> but leave all other items in the collection alone". That would be over
> >> specifying. It is enough information in the user facing documentation
> >> that the item is removed. You don't need to state that a remove
> >> operation on an item does not remove other items. Does this example
> >> transfer to this function, or am I missing something?
> >
> > No, because we're dealing with page tables and you are explicitly requesting a
> > page table operation. Knowing what is touched is meaningful.
>
> When would a page table operation to remove (clear?) the PTEs
> corresponding to an address range touch PTEs corresponding to addresses
> outside of the range?

Well we clear PTE entries (yes PTE is _terribly named_) in the specified
range, which might span entire PTE tables maybe not.

It also might span higher level tables if you are zapping huge pages, but
in that instance the PMD (or even PUD) would be the leaf table.

You don't touch PTE _entries_ corresponding to addresses outside of the
range.

>
> >
> >>
> >> > That is, precisely what is written here. In fact I think this
> >> > characterisation is derived from discussions had with us in mm, and it is
> >> > one with which I am happy.
> >> >
> >> > Why is it problematic to accurately describe what this does?
> >>
> >> Again, it might be that I don't properly understand what the function
> >> actually does, but if it is just removing the entries described by the
> >> range - write that. Don't add irrelevant details or specify what the
> >> function does not do. It slows down the user when reading documentation.
> >
> > It is highly pertinent as mentioned above.
> >
> > I mean we can expand the comment to explicitly add some detail around this
> > since obviously this is confusing (hey - a lot of mm is confusing - this is
> > an ongonig problem and why I have gone to lengths to try to improve
> > documentation and wrote a book about it :)
>
> That would be nice :)

Yes indeed!

>
> >
> >>
> >> >
> >> > For a series at v11 where there is broad agreement with maintainers within
> >> > the subsystem which it wraps, perhaps the priority should be to try to have
> >> > the series merged unless there is significant technical objection from the
> >> > rust side?
> >> >
> >> >>
> >> >> How about this:
> >> >>
> >> >> This clears the virtual memory map for the range given by `start` and
> >> >> `size`, dropping refcounts to memory held by the mappings in this range. That
> >> >> is, anonymous memory is completely freed, file-backed memory has its
> >> >> reference count on page cache folio's dropped, any dirty data will still
> >> >> be written back to disk as usual.
> >> >
> >> > Sorry I object to this, 'clears the virtual memory map' is really
> >> > vague. What is already there is better.
> >>
> >> Would you like the proposed paragraph if we replaced "virtual memory
> >> map" with "page table mappings", or do you object to the entirety of the
> >> new suggestion?
> >
> > I object to the suggestion in general. The description is fine as it is.
>
> Ok. I'm raising a flag because I had more questions after reading the
> docstring than before.

Sure and so I think this is valuable information, and indicates it's
probably worthwhile adding a little extra information on mentioning page
tables.

>
> >
> >>
> >> >
> >> >>
> >> >> >
> >> >> >> > and freeing any memory referenced by the VMA in this range. That is,
> >> >> >> > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> >> >> >> > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> >> >> >> > +    #[inline]
> >> >> >> > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Andreas Hindborg
> >> >>
> >> >>
> >> >
> >> > Let's please get this series merged. I think Alice has demonstrated
> >> > remarkable patience already, and modulo significant technical pushback on
> >> > the rust side (on which I defer entirely to the expertise of rust people),
> >> > I want to see this go in.
> >>
> >> I am sensing that you don't feel my comments are relevant at the current
> >> stage of this series (v11). Alice asked for reviews of the series. These are my
> >> comments. Feel free do whatever you want with them.
> >
> > I think you're getting the wrong end of the stick - you are making comments
> > on something relevant to mm, as an mm maintainer I'm giving you my point of
> > view.
>
> I appreciate that.

Thanks

>
> >
> > Your comments elsewhere seem highly useful, and review is always
> > appreciated, if you read what I said above - I defer entirely to the rust
> > community on things of which you are expert - so there is clearly no
> > disrespect intended.
>
> I did not read any disrespect in your message. I understand if you think
> I am late at the party at v11. Normally I would not pick up review of a
> series that late.

Ah OK, good :) I just wanted to make sure things were clear, text is a poor
medium and things can get misinterpreted :) I very much appreciate your
review!

>
> >
> > I'd also ask you to respect that I have gone to great lengths to review
> > this series from mm side, motivated by a strong desire to help the rust
> > commnuity.
>
> I absolutely appreciate that!

Cool :) I am excited about rust's potential in the kernel, as I know you
are and you know I suspect _probably_ Alice is somewhat :P so I think we're
all on the same page.

>
> >
> > So where I am coming from is nothing negative, quite the opposite, I simply
> > feel _on this issue_ it is not worth holding up the series for.
> >
> > This is no way intended to do down, disrespect or seem ungrateful for your
> > review or efforts. Apologies if it seemed that way, was not the intent.
> >
> > And to reiterate what I said above - I want to see this series merge :) so
> > there is no ill will anywhere.
>
> We can always merge this as is and then discuss the finer points of
> documentation later - I am fine with that. But obviously I cannot put my
> review tag on it, when I don't understand the semantics of the functions
> from reading the documentation strings. Perhaps we have someone who is
> more well versed in mm that can.

Ack of course, I would never ask you to tag anything you're not comfortable
with.

I think probably we can agree that adding extra detail to the comment
should suffice to address your concerns right?

>
> >
> >>
> >>
> >> Best regards,
> >> Andreas Hindborg
> >>
> >
> > Perhaps the correct approach here, as alluded above, is for Alice to add an
> > extra commentary pointing out the role of page tables here?
>
> That would be nice. Perhaps a bit of module level documentation is also
> a good addition.

Ack

>
> >
> > To complicate matters further (of course) there are recent series which
> > actually _do_ unused clean up page tables, though not (I believe... I have
> > to check...) on zap. But of course we in mm JUST LOVE to complicate
> > everything... ;)
>
> We should make sure to document that :)

Yeah, I have added documentation around VMA locking and page tables which
relates to this, I can expand as needed depending on this series, when I
finally get round to properly looking at it...

See https://origin.kernel.org/doc/html/latest/mm/process_addrs.html for the
doc, which is being updated constantly also.
>
>
> Best regards,
> Andreas Hindborg
>
>

Cheers!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux