On Sat, Mar 13, 2021 at 11:23:40AM -0800, Linus Torvalds wrote: > On Fri, Mar 12, 2021 at 9:08 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > > Subject: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers > > This patch can not possibly be correct, and is dangerous and very very wrong. > > > out: > > - unlock_page(head); > > + if (PageLocked(head)) > > + unlock_page(head); > > You absolutely CANNOT do things like this. It is fundamentally insane. > > You have two situations: > > (a) you know you locked the page. > > In this case an unconditional "unlock_page()" is the only correct > thing to do. > > (b) you don't know whether you maybe unlocked it again. > > In this case SOMEBODY ELSE might have locked the page, and you > doing "if it's locked, then unlock it" is pure and utter drivel, and > fundamentally and seriously wrong. You're unlocking SOMEBODY ELSES > lock, after having already unlocked your own once. > > Now, it is possible that this kind of pattern could be ok if you > absolutely 100% know - and it is obvious from the code - that you are > the only thread that can possibly access the page. But if that is the > case, then the page should never have been locked in the first place, > because that would be an insane and pointless operation, since the > whole - and only - point of locking is to enforce some kind of > exclusivity - and if exclusivity is explicit in the code-path, then > locking the page is pointless. > > And yes, in this case I can see a remote theoretical model: "all good > cases keep the page locked, and the only trhing that unlocks the page > also guarantees nothing else can possiblly see it". > > But no. I don't actually believe this is fuaranteed to the case here, > and even if it were, I don't think this is a code sequence we can or > should accept. > > End result: there is no possible situation that I can think of where the above > > if (PageLocked(head)) > unlock_page(head); > > kind of sequence can EVER possibly be correct, and it shows a complete > disregard of everything that locking is all about. > > Now, the memory error handling may be so special that this effectively > "works" in practice, but there is no way I can apply such a > fundamentally broken patch. OK, I'll update the patch with the second approach you mentioned below. > > I see two options: > > - make the result of identify_page_state() *tell* you whether the > page was already unlocked (and then make the unlock conditional on > *that*) > > This is valid, because now you removed that whole "somebody else > might have transferred the lock" issue: you know you already unlocked > the page based on the return value, so you don't unlock it again. > That's perfectly valid. > > - probably better: make identify_page_state() itself always unlock > the page, and turn the two different code sequences that currently do > > res = identify_page_state(pfn, p, page_flags); > out: > unlock_page(head); > return res; > > into just doing > > return identify_page_state(pfn, p, page_flags); > out: > unlock_page(head); > return -EBUSY; > > instead, and move that now pointless "res" variable into the only > if-statement that actually uses it (for other things entirely! It > should be a "enum mf_result" there!) > > And yes, that second (and imho better) rule means that now > {page_action()" itself needs to be the thing that unlocks the page, > and that in turn might be done a few different ways: > > (a) either add a separate "MF_UNLOCKED" state bit to the possible > return codes and make that me_huge_page() that unlocks the page set > that bit (NOTE! It still needs to also return either MF_FAILED or > MF_RECOVERED, so this "MF_UNLOCKED" bit really should be a separate > bitmask entirely from the other MF_xyz bits) > > (b) make the rule be that *all* the ->action() functions need to just > unlock the page. > > ( suspect (b) is the better model to aim for, because honestly, static > code rules for locking are basically almost always superior to dynamic > "based on this flag" locking rules. You can in theory actually have > static tooling check that the locking nests correctly with the > unlocking that way (and it's just conceptually simpler to have a hard > rule about "this function always unlocks the page"). I'll try (b). Thank you for the detailed advices. - Naoya Horiguchi > > But that existing patch is *so* fundamentally wrong that I cannot > possibly apply it, and I feel bad about the fact that it has made it > all the way to me with sign-offs and reviewed-by's and everything. > > Linus >