Re: prospective 13/12 s390 pgtable_list patch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 23 Jun 2023, Alexander Gordeev wrote:
> On Thu, Jun 22, 2023 at 10:49:43PM -0700, Hugh Dickins wrote:
> > Hi Gerald,
> > 
> > It's that moment you've been dreading: I'm hoping that you can, please,
> > take a look at the patch below, and try building and running with it,
> > on top of the v2 series of 12 I sent out on Tuesday.
> > 
> > If this seems okay to you, I'll send it properly as 13/12 of that series,
> > to the full Cc list; but of course you may find I've missed typos and worse
> > - please don't waste your time on it if it's rubbish, but so far as I can
> > tell, it is complete and ready now.
> > 
> > Thanks!
> > Hugh
> 
> Hi Hugh,
> 
> Gerald is off until Monday and I think is not able to answer right now.

Thanks a lot for stepping in, Alexander.

> 
> We had discussions with regard to how to better approach your series and
> did not come to a conclusion unfortunatelly.
> 
> Gerald had several concerns - one of them is global mm_pgtable_list_lock,
> wich is luckily avoided with this follow-up patch. But there were others,
> which I am not able to articulate in detail.
> 
> While you are doing an outstanding job in trying to adjust our fragmented
> page tables reuse scheme, one of the ideas emerged was to actually give it
> up: partially or may be even fully. That is - not to reuse page tables
> returned via pte_free_defer() or not to reuse them at all. To assess this
> possible new approaches some statistics is needed and am working on a
> prototype that would allow collecting it.
> 
> Note, our existing code is extremly complicated already and we had hard
> time fixing (at least one) ugly race related to that. With the changes
> you suggest that complexity (to me personally) multiplies. But that well
> could be the other way around and I am just not smart enough to grasp it.
> At least the claim that page_table_free() no longer needs the two-step
> release indicates that.

Yes, I had to cool myself down to a low temperature, and think about it
for several hours before arriving at that conclusion.  It would be nice
if I could point to one fact which explains it convincingly (something
along the lines of "look, we already took it off the list in that case");
but didn't manage to put it into words, and ended up deciding that we
each have to do our own thinking to convince ourselves on that issue.

> 
> I am sorry that it is probably not the status you would like to hear,

Not a problem for me at all: you're absolutely right to question whether
the existing, and the added, complexity is worth it - all of us who have
looked into that code have wondered the same.

Initially I refused to even try to get into it; but in the end was quite
proud that I had, given time, managed to work with it.  But no problem
if that work is quickly discarded in favour of simplicity.

> but I still wonder what is your opinion on that do-not-reuse-fragments

I don't want to sway you one way or the other on that: I just want to
work with whatever you decide.  I know Matthew Wilcox would prefer if
powerpc and s390 went about things in the same way (but they do have
different constraints, so I don't necessarily agree); but I did not
feel able to persuade powerpc to adopt s390's more complex approach,
nor to persuade s390 to downgrade to powerpc's simpler approach.

> approach? Would it simplify pte_free_defer() or had no effect?

It would certainly simplify it a lot.  Whether it would just be a
matter of deleting my attempt to list_add() from __tlb_remove_table(),
so restoring the per-mm lock, and forgetting the SLAB_TYPESAFE_BY_RCU;
or whether it would go further, and most of the PP-AA bit tracking fall
away, I cannot predict - depends rather on who does the job, and how
far they choose to take it.

Two notes I want to throw into the mix:

One, FWIW my guess is that for most mms, the s390 PP-AA tracking adds very
little value; but without it, perhaps there is some mm, some workload, which
degrades to the point of using only half of the pgtable memory it allocates.

Two, maybe you'll see this as a further complication, to avoid getting into,
rather than a simplification: but it occurred to me while working here that
s390 has no need to keep the pgtable_trans_huge_deposit/withdraw() code in
mm/pgtable.c separate from the mm/pgalloc.c code.  They can use just the
one list, and deposit/withdraw simply keep a count of how many immediately
usable free pgtables must always be kept in reserve on that list.  (But
this would not be true at all for powerpc.)

> 
> Anyway, that is just another option and I will try your patch.

Thank you, please do, if you have the time: my series does need some
kind of s390 solution, depending on whatever s390 has in its tree at
the time: for now I should at least be showing the 13/12 (preferably
known to build and appearing to work!), even if all it does is help
people to say "no, that's too much".

Hugh



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux