On Thu, 29 Jun 2023 23:00:07 -0700 (PDT) Hugh Dickins <hughd@xxxxxxxxxx> wrote: > On Thu, 29 Jun 2023, Gerald Schaefer wrote: > > On Thu, 29 Jun 2023 12:22:24 -0300 > > Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > > On Wed, Jun 28, 2023 at 10:08:08PM -0700, Hugh Dickins wrote: > > > > On Wed, 28 Jun 2023, Gerald Schaefer wrote: > > > > > > > > > > As discussed in the other thread, we would rather go with less complexity, > > > > > possibly switching to an approach w/o the list and fragment re-use in the > > > > > future. For now, as a first step in that direction, we can try with not > > > > > adding fragments back only for pte_free_defer(). Here is an adjusted > > > > > version of your patch, copying most of your pte_free_defer() logic and > > > > > also description, tested with LTP and all three of your patch series applied: > > > > > > > > Thanks, Gerald: I don't mind abandoning my 13/12 SLAB_TYPESAFE_BY_RCU > > > > patch (posted with fewer Cc's to the s390 list last week), and switching > > > > to your simpler who-cares-if-we-sometimes-don't-make-maximal-use-of-page > > > > patch. > > > > > > > > But I didn't get deep enough into it today to confirm it - and disappointed > > > > that you've found it necessary to play with pt_frag_refcount in addition to > > > > _refcount and HH bits. No real problem with that, but my instinct says it > > > > should be simpler. > > > > Yes, I also found it a bit awkward, but it seemed "good and simple enough", > > to have something to go forward with, while my instinct was in line with yours. > > > > > > > > Is there any reason it should be any different at all from what PPC is > > > doing? > > > > > > I still think the right thing to do here is make the PPC code common > > > (with Hugh's proposed RCU modification) and just use it in both > > > arches.... > > > > With the current approach, we would not add back fragments _only_ for > > the new pte_free_defer() path, while keeping our cleverness for the other > > paths. Not having a good overview of the negative impact wrt potential > > memory waste, I would rather take small steps, if possible. > > > > If we later switch to never adding back fragments, of course we should > > try to be in line with PPC implementation. > > I find myself half-agreeing with everyone. > > I agree with Gerald that s390 should keep close to what it is already > doing (except for adding pte_free_defer()): that changing its strategy > and implementation to be much more like powerpc, is a job for some other > occasion (and would depend on gathering data about how well each does). > > But I agree with Jason that the powerpc solution we ended up with cut > out a lot of unnecessary complication: it shifts the RCU delay from > when pte_free_defer() is called, to when the shared page comes to be > freed; which may be a lot later, and might not be welcome in a common > path, but is quite okay for the uncommon pte_free_defer(). Ok, I guess I must admit that I completely ignored the latest progress in the powerpc thread, and therefore was not up-to-date. Still had the older approach in mind, where you also checked for pt_frag_refcount to avoid double call_rcu(). The new approach sounds very reasonable, and I also like your latest s390 patch from a first glance. Need to get more up-to-date with PageActive and maybe also powerpc approach, and give this some proper review tomorrow. > > And I agree with Alexander that pte_free_lower() and pte_free_upper() > are better names than pte_free_now0() and pte_free_now1(): I was going > to make that change, except all those functions disappear if we follow > Jason's advice and switch the call_rcu() to when freeing the page. > > (Lower and upper seem unambiguous to me: Gerald, does your confusion > come just from the way they are shown the wrong way round in the PP AA > diagram? I corrected that in my patch, but you reverted it in yours.) Ah yes, that could well be, and unfortunately I did not notice that you fixed that in the comment. I only saw that you "fixed" the bit numbering from 01234567 to 76543210, which I think is wrong on big-endian s390, and therefore I simply removed that complete hunk. But thanks a lot for pointing to that! We will certainly want to fix that comment in a later patch, to reduce some or maybe all of the (at least my) upper/lower confusion.