On Thu, Mar 10, 2022 at 05:55:25PM +1100, Dave Chinner wrote: > On Tue, Mar 08, 2022 at 07:53:51PM -0800, Darrick J. Wong wrote: > > On Tue, Mar 01, 2022 at 01:39:36PM -0700, Allison Henderson wrote: > > > > > > > > > On 2/28/22 7:29 PM, Darrick J. Wong wrote: > > > > On Mon, Feb 28, 2022 at 12:51:32PM -0700, Allison Henderson wrote: > > > > > Hi all, > > > > > > > > > > This set is a subset of a larger series parent pointers. Delayed attributes allow > > > > > attribute operations (set and remove) to be logged and committed in the same > > > > > way that other delayed operations do. This allows more complex operations (like > > > > > parent pointers) to be broken up into multiple smaller transactions. To do > > > > > this, the existing attr operations must be modified to operate as a delayed > > > > > operation. This means that they cannot roll, commit, or finish transactions. > > > > > Instead, they return -EAGAIN to allow the calling function to handle the > > > > > transaction. In this series, we focus on only the delayed attribute portion. > > > > > We will introduce parent pointers in a later set. > > > > > > > > > > The set as a whole is a bit much to digest at once, so I usually send out the > > > > > smaller sub series to reduce reviewer burn out. But the entire extended series > > > > > is visible through the included github links. > > > > > > > > > > Updates since v27: > > > > > xfs: don't commit the first deferred transaction without intents > > > > > Comment update > > > > > > > > I applied this to 5.16-rc6, and turned on larp mode. generic/476 > > > > tripped over something, and this is what kasan had to say: > > > > > > > > [ 835.381655] run fstests generic/476 at 2022-02-28 18:22:04 > > > > [ 838.008485] XFS (sdb): Mounting V5 Filesystem > > > > [ 838.035529] XFS (sdb): Ending clean mount > > > > [ 838.040528] XFS (sdb): Quotacheck needed: Please wait. > > > > [ 838.050866] XFS (sdb): Quotacheck: Done. > > > > [ 838.092369] XFS (sdb): EXPERIMENTAL logged extended attributes feature added. Use at your own risk! > > > > [ 838.092938] general protection fault, probably for non-canonical address 0xe012f573e6000046: 0000 [#1] PREEMPT SMP KASAN > > > > [ 838.099085] KASAN: maybe wild-memory-access in range [0x0097cb9f30000230-0x0097cb9f30000237] > > > > [ 838.101148] CPU: 2 PID: 4403 Comm: fsstress Not tainted 5.17.0-rc5-djwx #rc5 63f7e400b85b2245f2d4d3033e82ec8bc95c49fd > > > > [ 838.103757] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014 > > > > [ 838.105811] RIP: 0010:xlog_cil_commit+0x2f9/0x2800 [xfs] > > > > > > > > > > > > FWIW, gdb says this address is: > > > > > > > > 0xffffffffa06e0739 is in xlog_cil_commit (fs/xfs/xfs_log_cil.c:237). > > > > 232 > > > > 233 /* > > > > 234 * if we have no shadow buffer, or it is too small, we need to > > > > 235 * reallocate it. > > > > 236 */ > > > > 237 if (!lip->li_lv_shadow || > > > > 238 buf_size > lip->li_lv_shadow->lv_size) { > > > > 239 /* > > > > 240 * We free and allocate here as a realloc would copy > > > > 241 * unnecessary data. We don't use kvzalloc() for the > > > > > > > > I don't know what this is about, but my guess is that we freed something > > > > we weren't supposed to...? > > > > > > > > (An overnight fstests run with v27 and larp=0 ran fine, though...) > > > > > > > > --D > > > > > > Hmm, ok, I will dig into this then. I dont see anything between v27 and v28 > > > that would have cause this though, so I'm thinking what ever it is must by > > > intermittent. I'll stick it in a loop and see if I can get a recreate > > > today. Thanks! > > > > I think I've figured out two of the problems here -- > > > > The biggest problem is that xfs_attri_init isn't fully initializing the > > xattr log item structure, which is why the CIL would crash on my system > > when it tried to resize what it thought was the lv_shadow buffer > > attached to the log item. I changed it to kmem_cache_zalloc and the > > problems went away; you might want to check if your kernel has some > > debugging kconfig feature enabled that auto-zeroes everything. > > > > The other KASAN report has to do with the log iovec code -- it assumes > > that any buffer passed in has a size that is congruent with 4(?) bytes. > > This isn't necessarily true for the xattr name (and in principle also > > the value) buffer that we get from the VFS; if either is (say) 37 bytes > > long, you'll get 37 bytes, and KASAN will expect you to stick to that. > > I think with the way the slab works this isn't a real memory corruption > > vector, but I wouldn't put it past slob or someone to actually pack > > things in tightly. > > This seems ... familiar. ISTR I fixed this issue and send out > patches to address it some time ago. > > Ah, yes, I did - for v24 of the LARP patchset. > > https://lore.kernel.org/linux-xfs/20210901073039.844617-1-david@xxxxxxxxxxxxx/ > > Patch 3 fixed the log iovec rounding constraint: > > https://lore.kernel.org/linux-xfs/20210901073039.844617-1-david@xxxxxxxxxxxxx/ > > and patch 4 fixed the iovec sizing problems for attribute intents: > > https://lore.kernel.org/linux-xfs/20210901073039.844617-1-david@xxxxxxxxxxxxx/ And in starting a forward port of the intent whiteout series that contains these fixes, I realised that these atches are also dependent on the xlog-write-rework series I just reposted a day ago. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx