Hi Ted! On Thu 08-12-22 00:55:55, Theodore Ts'o wrote: > One thing which is completely unclear to me is how this relates to the > claimed regression. I understand that Jeremi and Thilo have asserted > that the hang goes away if a backport commit 51ae846cff5 ("ext4: fix > warning in ext4_iomap_begin as race between bmap and write") is not in > their 5.15 product tree. IMHO the bisection was flawed and somehow the test of a revert (which guys claimed to have done) must have been lucky and didn't trip the race. This is not that hard to imagine because firstly, the commit 51ae846cff5 got included in the same stable kernel release as ext4 xattr changes (65f8b80053 ("ext4: fix race when reusing xattr blocks") and related mbcache changes) which likely made the race more likely. Secondly, the mbcache state corruption is not that easy to hit because you need an interaction of slab reclaim on mbcache entry with ext4 xattr code adding reference to xattr block and just hitting the reference limit. > However, the stack traces point to a problem in the extended attribute > code, which has nothing to do with ext4_bmap(), and commit 51ae846cff5 > only changes the ext4's bmap function --- which these days gets used > for the FIBMAP ioctl and very little else. > > Furthermore, the fix which Jan provided, and which apparently fixes > the user's problem, (a) doesn't touch the ext4_bmap function, and (b) > has a fixes tag for the patch: > > Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries") > > ... which is a commit which dates back to 2016, and the v4.6 kernel. ?!? Yes. AFAICT the bitfield race in mbcache was introduced in this commit but somehow ext4 was using mbcache in a way that wasn't tripping the race. After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race became much more likely and users started to notice... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR