Re: [PATCH] xfs: don't reuse busy extents on extent trim

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 26 May 2022 23:56:11 +0300

> > I tested it on top of 5.10.109 + these 5 patches:
> > https://github.com/amir73il/linux/commits/xfs-5.10.y-1
> >
> > I can test it in isolation if you like. Let me know if there are
> > other forensics that you would like me to collect.
> >
>
> Hm. Still no luck if I move to .109 and pull in those few patches. I
> assume there's nothing else potentially interesting about the test env
> other than the sparse file scratch dev (i.e., default mkfs options,

Oh! right, this guest is debian/10 with xfsprogs 4.20, so the defaults
are reflink=0.

Actually, the section I am running is reflink_normapbt, but...

** mkfs failed with extra mkfs options added to "-f -m reflink=1,rmapbt=0, -i sparse=1," by test 076 **
** attempting to mkfs using only test 076 options: -m crc=1 -i sparse **
** mkfs failed with extra mkfs options added to "-f -m reflink=1,rmapbt=0, -i sparse=1," by test 076 **
** attempting to mkfs using only test 076 options: -d size=50m -m crc=1 -i sparse **

mkfs.xfs does not accept the sparse argument being given twice, so the
test falls back to the mkfs defaults (+ sparse).
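
To make the collision easier to see, here is a rough sketch (Python, not
part of fstests) that flags suboptions given more than once when the
section options and the test options are concatenated. The two option
strings are copied from the log above; the function name and the set of
flags it looks at (-m, -i, -d) are my own assumptions, and it does not
try to mimic how mkfs.xfs actually parses its arguments.

```python
import shlex
from collections import Counter


def repeated_suboptions(option_string):
    """Return sorted (flag, suboption) pairs that appear more than once.

    Hypothetical helper for illustration only; it looks at the -m, -i
    and -d flags and ignores everything else.
    """
    tokens = shlex.split(option_string)
    seen = Counter()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-m", "-i", "-d") and i + 1 < len(tokens):
            for sub in tokens[i + 1].split(","):
                name = sub.split("=", 1)[0]
                if name:  # skip the empty piece left by a trailing comma
                    seen[(tok, name)] += 1
            i += 2
        else:
            i += 1
    return sorted(key for key, count in seen.items() if count > 1)


# Option strings as they appear in the mkfs failure messages above.
section_opts = "-f -m reflink=1,rmapbt=0, -i sparse=1,"
test_opts = "-d size=50m -m crc=1 -i sparse"

print(repeated_suboptions(section_opts + " " + test_opts))
```

With the strings from this run, the only repeated pair it reports is
-i sparse, which matches what I see from mkfs.xfs here.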

I checked and xfsprogs 5.3 behaves the same. I did not check newer
xfsprogs, but that seems like a test bug(?)

IOW, unless xfsprogs was changed to be more tolerant of repeated
arguments, then maybe nobody is testing xfs/076 with reflink=0 (?)

> etc.)? If so and you can reliably reproduce, I suppose it couldn't hurt
> to try and grab a tracepoint dump of the test when it fails (feel free
> to send directly or upload somewhere as the list may punt it, and please
> also include the dmesg output that goes along with it) and I can see if
> that shows anything helpful.
>
> I think what we want to know initially is what error code we're
> producing (-ENOSPC?) and where it originates, and from there we can
> probably work out how the transaction might be dirty. I'm not sure a
> trace dump will express that conclusively. If you wanted to increase the
> odds of getting some useful information it might be helpful to stick a
> few trace_printk() calls in the various trans cancel error paths out of
> xfs_create() to determine whether it's the inode allocation attempt that
> fails or the subsequent attempt to create the directory entry..
>

Well, the full output is filled with ENOSPC (also in a good run), so it's
probably that, but I will try to get to that failing stack, no need for all the
noisy traces. Signing off for the day. Hope I will get to it tomorrow.

Thanks,
Amir.

P.S.: this is how 076.full ends, if it makes any difference:

touch: cannot touch '/media/scratch/offset.21889024/63': No space left on device
touch: cannot touch '/media/scratch/offset.21823488/63': No space left on device
touch: cannot touch '/media/scratch/offset.21757952/63': No space left on device
touch: cannot touch '/media/scratch/offset.21692416/63': No space left on device
touch: cannot touch '/media/scratch/offset.21626880/63': No space left on device
touch: cannot touch '/media/scratch/offset.21561344/63': No space left on device
touch: cannot touch '/media/scratch/offset.21495808/63': No space left on device
touch: cannot touch '/media/scratch/offset.21430272/63': No space left on device
stat: Input/output error
fpunch failed
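
Since the ENOSPC lines also show up in a good run, a small filter like
the sketch below (Python, my own helper, not something fstests ships)
can hide them and leave only the lines that differ. The sample input is
a shortened copy of the tail above; matching on the "No space left on
device" text is an assumption about how the messages are formatted.

```python
ENOSPC_TEXT = "No space left on device"


def non_enospc_lines(lines):
    """Return the non-empty lines that do not report ENOSPC."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.strip() and ENOSPC_TEXT not in line
    ]


# Shortened copy of the end of 076.full from this run.
tail = [
    "touch: cannot touch '/media/scratch/offset.21495808/63': No space left on device",
    "touch: cannot touch '/media/scratch/offset.21430272/63': No space left on device",
    "stat: Input/output error",
    "fpunch failed",
]

for line in non_enospc_lines(tail):
    print(line)
```

On this tail it leaves just the stat error and the fpunch failure.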