[cc stable to see if they have any ideas about fixing this]

On Sat, 2024-04-06 at 12:16 -0400, John David Anglin wrote:
> On 2024-04-06 11:06 a.m., James Bottomley wrote:
> > On Sat, 2024-04-06 at 10:30 -0400, John David Anglin wrote:
> > > On 2024-04-05 3:36 p.m., Bart Van Assche wrote:
> > > > On 4/4/24 13:07, John David Anglin wrote:
> > > > > On 2024-04-04 12:32 p.m., Bart Van Assche wrote:
> > > > > > Can you please help with verifying whether this kernel
> > > > > > warning is only triggered by the 6.1 stable kernel series
> > > > > > or whether it is also triggered by a vanilla kernel, e.g.
> > > > > > kernel v6.8? That will tell us whether we need to review
> > > > > > the upstream changes or the backports on the v6.1 branch.
> > > > > Stable kernel v6.8.3 is okay.
> > > > Would it be possible to bisect this issue on the linux-6.1.y
> > > > branch? That probably will be faster than reviewing all
> > > > backports of SCSI patches on that branch.
> > > The warning triggers with v6.1.81. It doesn't trigger with
> > > v6.1.80.
> > It's this patch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=cf33e6ca12d814e1be2263cb76960d0019d7fb94
> >
> > The specific problem being that the update to scsi_execute doesn't
> > set the sense_len that the WARN_ON is checking.
> >
> > This isn't a problem in mainline because we've converted all uses
> > of scsi_execute. Stable needs to either complete the conversion or
> > back out the initial patch.
> This change depends on the above change:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9
>
> Thus, more than just the initial patch needs to be backed out.

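For illustration, the failure mode described above (a compatibility
macro that passes the sense buffer but not its length to a callee that
checks both) can be modelled in a few lines of standalone C. This is
only a sketch under stated assumptions: struct exec_args, execute_cmd,
execute_buggy, execute_fixed, warn_count and the run_* helpers are
hypothetical stand-ins, not the kernel's definitions, and the check is
modelled on the description in this thread rather than copied from the
tree.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for SCSI_SENSE_BUFFERSIZE (assumed to be 96 for this model). */
#define SENSE_BUFFERSIZE 96

/* Hypothetical argument struct: a sense buffer plus its length. */
struct exec_args {
	unsigned char *sense;
	unsigned int sense_len;
};

/* Counts how often the modelled warning fires. */
static int warn_count;

/* Hypothetical callee: rejects a sense buffer whose length is not set. */
static int execute_cmd(const struct exec_args *args)
{
	if (args->sense && args->sense_len != SENSE_BUFFERSIZE) {
		warn_count++;		/* stands in for the WARN_ON */
		return -EINVAL;
	}
	return 0;
}

/* Buggy wrapper: sense_len is omitted, so it is zero-initialized. */
#define execute_buggy(_sense) \
	execute_cmd(&(struct exec_args){ .sense = (_sense) })

/* One-line fix: also pass the length of the sense buffer. */
#define execute_fixed(_sense) \
	execute_cmd(&(struct exec_args){ .sense = (_sense), \
					 .sense_len = SENSE_BUFFERSIZE })

/* Small helpers so each variant can be exercised in isolation. */
static int run_buggy(void)
{
	unsigned char sense[SENSE_BUFFERSIZE] = { 0 };

	return execute_buggy(sense);
}

static int run_fixed(void)
{
	unsigned char sense[SENSE_BUFFERSIZE] = { 0 };

	return execute_fixed(sense);
}

/* Callers that pass no sense buffer never reach the check. */
static int run_buggy_no_sense(void)
{
	return execute_buggy(NULL);
}
```

In this model every caller that supplies a sense buffer through the
buggy wrapper trips the check, while callers that pass NULL do not,
which is consistent with the warning not being limited to a single
call site.
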
OK, so the reason the bad patch got pulled in is that it's a precursor
of this Fixes-tagged backport:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=b73dd5f9997279715cd450ee8ca599aaff2eabb9

Which is presumably the other patch you had to back out to fix the
issue.

The problem is that Mike's series updating and then removing
scsi_execute() went into the tree as one series, so no one noticed
that the first patch had this bug, because the buggy routine got
removed at the end of the series. This also means there's nothing to
fix and backport in upstream. The bug is also more widespread than
just domain validation, because every use of scsi_execute in the
current stable tree will trip this.

I'm not sure what the best fix is. I can certainly come up with a
one-line fix for stable that adds the missing length in the #define,
but it can't come from upstream, as stated above. Alternatively, we
could back the two patches out and then do a stable-specific fix for
the UAS problem (I don't think we can leave the UAS patch backed out,
because the problem was pretty serious).

What does stable want to do?

James