On Fri, 20 Oct 2023 at 07:52, Andy Shevchenko <andriy.shevchenko@xxxxxxxxx> wrote: > > # first bad commit: [e64db1c50eb5d3be2187b56d32ec39e56b739845] quota: factor out dquot_write_dquot() Ok, so commit 024128477809 ("quota: factor out dquot_write_dquot()") pre-rebase. Which honestly seems entirely innocuous, and the only change seems to be a slight massaging of the return value checking, in that it did a "if (err)" ine one place before, now it does "if (err < 0)". And the whole "now it always warns about errors", which used to happen only in dqput() before. Neither seems to be very relevant, which just reinforces that yes, this looks like a timing thing. > On top of the above I have tried the following: > 1) dropping inline, replacing it to __always_inline -- no help; > 2) commenting out the error message -- helps! > > --- a/fs/quota/dquot.c > +++ b/fs/quota/dquot.c > @@ -632,8 +632,10 @@ static inline int dquot_write_dquot(struct dquot *dquot) > { > int ret = dquot->dq_sb->dq_op->write_dquot(dquot); > if (ret < 0) { > +#if 0 > quota_error(dquot->dq_sb, "Can't write quota structure " > "(error %d). Quota may get out of sync!", ret); > +#endif > /* Clear dirty bit anyway to avoid infinite loop. */ > clear_dquot_dirty(dquot); > } The only thing quota_error() does is the varags handling and a printk, so yeah, all that #if 0" would do even if the error triggers (and it presumably doesn't) is to change code generation around that point, and change timing. But what *is* interesting is that that commit that triggers it is before all the other list-handling changes, so the fact that this was triggered by that merge and that one commit, *all* that really happened to trigger your boot failure is literally this: git log 1500e7e0726e^..024128477809 (that "1500e7e0726e^" is the pre-merge state). So it's not that the problem was introduced by one of the other list-handling changes and then 024128477809 just happened to change the timing. No, it's literally that one commit that moves code around, and that one quota_error() printout that makes the problem show for you. So it really looks like the bug is pre-existing. Or actually a compiler problem that is introduced by the added call that changes code generation, but honestly, that is a very unlikely thing. That said - while unlikely, mind just sending me the *failing* copy of the fs/quota/dquot.o object file, and I'll take a look at the code around that call. I've looked at enough code generation issues that it's worth trying.. Linus