On Tue, Jun 15, 2021 at 04:46:56PM +1000, Dave Chinner wrote:
> Hi folks,
>
> This is the first fix for the problems Brian has reported from
> generic/019. This has fixed the hang, but the other log recovery
> problem he reported is still present (seen once with these patches
> in place).
>
> I've tested these out to a couple of hundred cycles of
> continual looping generic/019 before the systems fall over with a
> perag reference count underrun at unmount after a shutdown. I'm
> pretty sure the hang is fixed, as it would manifest within 10-20
> cycles without this patch.
>
> The first patch is the iclogbuf state tracing I used to capture the
> iclogbuf wrapping state. The second patch is the fix.

I found another bug while testing for-next.  If I run generic/100 more
than about ~30 times with a 1k block size:

FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 flax-mtr00 5.13.0-rc4-djwx #rc4 SMP PREEMPT Mon Jun 7 11:17:23 PDT 2021
MKFS_OPTIONS  -- -f -b size=1024, /dev/sdf
MOUNT_OPTIONS -- -o usrquota,grpquota,prjquota, /dev/sdf /opt

I see this in dmesg:

run fstests generic/100 at 2021-06-15 10:41:45
XFS (sda): ctx ticket reservation ran out. Need to up reservation
XFS (sda): ticket reservation summary:
XFS (sda):   unit res    = 47168 bytes
XFS (sda):   current res = -404 bytes
XFS (sda):   original count  = 1
XFS (sda):   remaining count = 1
XFS (sda): xfs_do_force_shutdown(0x2) called from line 2440 of file fs/xfs/xfs_log.c. Return address = xlog_write+0x608/0x640 [xfs]
XFS (sda): Log I/O Error Detected. Shutting down filesystem
XFS (sda): Please unmount the filesystem and rectify the problem(s)
XFS (sda): Unmounting Filesystem

Looking up that line in gdb produces:

0xffffffffa038a0a8 is in xlog_write (fs/xfs/xfs_log.c:2439).
2434		int			log_offset;
2435
2436		if (ticket->t_curr_res < 0) {
2437			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
2438			     "ctx ticket reservation ran out. Need to up reservation");
2439			xlog_print_tic_res(log->l_mp, ticket);
2440			xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
2441		}

I haven't applied these two patches yet, but looking back through
fstests reports I never saw this before the recent for-next push.  I'm
uncertain if it's the CIL work or the xattr refactoring that did this,
though AFAICT generic/100 itself does not generate any xattrs and I
don't have any LSMs enabled that would cause them to be created.

--D

>
> Cheers,
>
> Dave.
>
>