Eric Whitney <enwlinux@xxxxxxxxx> writes: > * Ritesh Harjani <ritesh.list@xxxxxxxxx>: >> Eric Whitney <enwlinux@xxxxxxxxx> writes: >> >> > A significant number of xfstests can cause ext4 to log one or more >> > warning messages when they are run on a test file system where the >> > inline_data feature has been enabled. An example: >> > >> > "EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode >> > #16385: comm fsstress: No space for directory leaf checksum. Please >> > run e2fsck -D." >> > >> > The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068, >> > 070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585, >> > 589, 626, 631, and 650. >> >> So, I guess since these were only ext4 warnings hence maybe these were >> getting ignored? Because the tests were never failing? >> Should we do something for such cases? Maybe adding this warning >> detection in xfstests to fail the test case when these warnings are not >> intended? e.g. such warnings should make the test fail by saying >> something detected in dmesg. Except when these are expected for I/O error >> injection tests, etc... >> > > Hi, Ritesh: > > Thanks for taking a look at this patch. > > Right, the tests never failed. I was aware of the warning messages because > I routinely check the captured system log output from my upstream regression > runs. The messages weren't so much ignored as being set aside for the time > being. They have been appearing for some years, and I'd mentioned them in > past concalls. Since the warning messages simply suggest a recovery action > that's appropriate in some cases - running "e2fsck -D" - there wasn't much > interest in pursuing them, given there was no evidence of actual file system > damage or misbehavior. After becoming much more familiar with the inline_data > code myself recently I got suspicious and took a closer look. > > I don't know that I've got a strong opinion about this, but I think that adding > the EXT4-fs warning and error message prefixes to the set of strings searched > for by _check_dmesg, say, to force a test failure might be more trouble than > it's worth (at least, in comparison with periodically grepping through the > logs). Adding ext4-specific filters to individual xfstests as needed, > including maintaining them over time and extending the coverage to new tests as > they appear, sounds like a lot of ongoing work for what might be a modest ok, sure. But let me keep an eye out for this... Let me watch out for any such bugs in my internal tests run to see whether adding such a check can help us catch any hidden problems. I was thinking this need not be done in one shot but can be done incrementally/individually for many tests. Hence it should be relatively easy if we do that on the need basis maybe. I am not sure though of the returns/benefits from this work at this point in time, until I have reviewed the list of failures. > return. IIRC, we haven't had a significant number of bugs associated with > EXT4-fs messages without test failures in the last several years, at least. ok. Let me also take a look at it. Thanks! > >> > >> > In this situation, the warning message indicates a bug in the code that >> > performs the RENAME_WHITEOUT operation on a directory entry that has >> > been stored inline. It doesn't detect that the directory is stored >> > inline, and incorrectly attempts to compute a dirent block checksum on >> > the whiteout inode when creating it. This attempt fails as a result >> > of the integrity checking in get_dirent_tail (usually due to a failure >> > to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message >> > is then emitted. >> > >> > Fix this by simply collecting the inlined data state at the time the >> > search for the source directory entry is performed. Existing code >> > handles the rest, and this is sufficient to eliminate all spurious >> > warning messages produced by the tests above. Go one step further >> > and do the same in the code that resets the source directory entry in >> > the event of failure. The inlined state should be present in the >> > "old" struct, but given the possibility of a race there's no harm >> > in taking a conservative approach and getting that information again >> > since the directory entry is being reread anyway. >> >> Thanks for the detailed explaination. This makes sense to me. >> >> > >> > Fixes: b7ff91fd030d ("ext4: find old entry again if failed to rename whiteout") >> >> So for your changes in ext4_resetent(), your above fixes tags make sense. >> But what about the changes in ext4_rename() function. That was always >> passing NULL as the last argument since the begining no? >> Thinking from the backport perspective if and when required ;) >> > > I'm guessing the intersection of the set of inline data and whiteout (overlayfs) > users is sufficiently small that this patch won't need backporting anytime > soon. :-) > > The reason I picked that tag is that it's a fix for a fix to the patch that > originally added whiteout support to ext4. I wanted to convey that those > fixes should be applied in addition to this patch to get fully functional code. Sure. Thanks for the explaination. -ritesh