Re: Transaction log reservation overrun when fallocating realtime file

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Wed, 4 Dec 2019 08:31:36 -0800

On Wed, Dec 04, 2019 at 08:31:17AM +1100, Dave Chinner wrote:
> On Mon, Dec 02, 2019 at 06:45:26PM -0800, Darrick J. Wong wrote:
> > On Tue, Dec 03, 2019 at 08:51:13AM +1100, Dave Chinner wrote:
> > > On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote:
> > > > > Hello,
> > > > > 
> > > > > The following reproducer results in a transaction log overrun warning
> > > > > for me:
> > > > > 
> > > > >   mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> > > > >   mount -o rtdev=/dev/vdc /dev/vdb /mnt
> > > > >   fallocate -l 4G /mnt/foo
> > > > > 
> > > > > I've attached the full dmesg output. My guess at the problem is that the
> > > > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime
> > > > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on
> > > > > this filesystem, which I do see in some of the log items). However, I'm not
> > > > > familiar enough with the XFS transaction guts to confidently fix this. Can
> > > > > someone please help me out?
> > > > 
> > > > Hmm...
> > > > 
> > > > /*
> > > >  * In a write transaction we can allocate a maximum of 2
> > > >  * extents.  This gives:
> > > >  *    the inode getting the new extents: inode size
> > > >  *    the inode's bmap btree: max depth * block size
> > > >  *    the agfs of the ags from which the extents are allocated: 2 * sector
> > > >  *    the superblock free block counter: sector size
> > > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > > >  * And the bmap_finish transaction can free bmap blocks in a join:
> > > >  *    the agfs of the ags containing the blocks: 2 * sector size
> > > >  *    the agfls of the ags containing the blocks: 2 * sector size
> > > >  *    the super block free block counter: sector size
> > > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > > >  */
> > > > STATIC uint
> > > > xfs_calc_write_reservation(...);
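Summing the terms in that comment gives a rough feel for the size of the budget. This is a minimal sketch that assumes the two halves (allocation vs. bmap_finish) are alternatives, so the larger one wins. calc_write_res_sketch() and its parameter names are hypothetical. It leaves out the per-buffer log overhead and the dquot reservation that the real code also charges:

```c
#include <stdint.h>

/*
 * Hypothetical back-of-envelope version of the comment above: raw bytes
 * only, no per-buffer log overhead, no dquot reservation.
 */
static uint32_t
calc_write_res_sketch(uint32_t inodesize, uint32_t blocksize,
                      uint32_t sectsize, uint32_t bmap_maxdepth,
                      uint32_t alloc_maxdepth)
{
        /* 2 exts * 2 trees * (2 * max depth - 1) free space btree blocks */
        uint32_t allocbt = 2 * 2 * (2 * alloc_maxdepth - 1) * blocksize;

        /* Allocating: inode, bmbt, 2 AGFs, superblock counter, alloc btrees. */
        uint32_t alloc = inodesize + bmap_maxdepth * blocksize +
                         2 * sectsize + sectsize + allocbt;

        /* bmap_finish: 2 AGFs, 2 AGFLs, superblock counter, alloc btrees. */
        uint32_t finish = 2 * sectsize + 2 * sectsize + sectsize + allocbt;

        /* Assumed: only one half is needed at a time, so take the larger. */
        return alloc > finish ? alloc : finish;
}
```

With 512-byte inodes and sectors, 4k blocks and a depth of 5 for both trees, this comes to 169984 bytes. No term scales with the number of rt extents being allocated.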
> > > > 
> > > > So this means that the rt allocator can burn through at most ...
> > > > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize
> > > > ... worth of log reservation as part of setting bits in the rtbitmap and
> > > > fiddling with the rtsummary information.
> > > > 
> > > > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which
> > > > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log
> > > > reservation.
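The arithmetic behind that can be written as a minimal sketch. It assumes one rtbitmap bit per rt extent and counts only the bitmap bytes dirtied. It ignores rtsummary updates, buffer log overhead, and chunk rounding. rtbitmap_bytes_for_alloc() is a hypothetical helper, not an XFS function:

```c
#include <stdint.h>

/*
 * Hypothetical helper: rtbitmap bytes dirtied by marking len_bytes of
 * realtime space in use, one bit per rt extent of rtext_bytes each.
 * Assumes the allocation starts on a byte boundary in the bitmap.
 */
static uint64_t
rtbitmap_bytes_for_alloc(uint64_t len_bytes, uint64_t rtext_bytes)
{
        uint64_t nrtext = (len_bytes + rtext_bytes - 1) / rtext_bytes;

        return (nrtext + 7) / 8;        /* round up to whole bytes */
}
```

rtbitmap_bytes_for_alloc(4ULL << 30, 4096) returns 131072. Taking 109K as 109 * 1024 = 111616 bytes, that is already over the reservation before anything else in the transaction is counted.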
> > > 
> > > Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in
> > > a single transaction. That's 2^21 filesystem blocks, which at most
> > > is 2^21 rtexts.
> > > 
> > > Hence I think we probably should have a separate rt-write
> > > reservation that handles this case, and we use that for allocation
> > > on rt devices rather than the bt-based allocation reservation.
> > 
> > 2^21 rtexts is ... 2^18 bytes worth of rtbitmap blocks, which implies a
> > transaction reservation of around ... ~300K?  I guess I'll have to go
> > play with xfs_db to see how small of a datadev you can make before that
> > causes us to fail the minimum log size checks.
> 
> Keep in mind that rtextsz is often larger than a single filesystem
> block, so the bitmap size shrinks in inverse proportion to rtextsz.
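To put numbers on both points, here is a sketch of the worst-case rtbitmap bytes for a single maximal allocation as a function of rtextsz in filesystem blocks. It uses the thread's round figure of 2^21 blocks (MAXEXTLEN itself is one block less) and the same one-bit-per-rt-extent simplification. maxext_rtbitmap_bytes() and SKETCH_MAXEXT_BLOCKS are hypothetical names:

```c
#include <stdint.h>

/* Thread's round figure for MAXEXTLEN, in filesystem blocks. */
#define SKETCH_MAXEXT_BLOCKS    (1ULL << 21)

/*
 * Hypothetical helper: rtbitmap bytes for one maximal allocation when each
 * rt extent is rtextsz_blocks filesystem blocks (one bitmap bit per rtext).
 */
static uint64_t
maxext_rtbitmap_bytes(uint64_t rtextsz_blocks)
{
        uint64_t nrtext = (SKETCH_MAXEXT_BLOCKS + rtextsz_blocks - 1) /
                          rtextsz_blocks;

        return (nrtext + 7) / 8;        /* round up to whole bytes */
}
```

With rtextsz = 1 block this gives 262144 bytes (the 2^18 figure above). At 16 blocks it drops to 16384 bytes, and at 64 blocks it drops to 4096.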
> 
> > As you said on IRC, it probably won't affect /most/ setups... but I
> > don't want to run around increasing support calls either, even if most
> > distributors don't turn on rt support.
> 
> Sure, we can limit the size of the allocation based on the
> transaction reservation limits, but I suspect this will only affect
> filesystems with really, really small data devices that result in a
> <10MB default log size. I don't think there are that many of these
> around in production....
> 
> I'd prefer to fix the transaction size, and then if people start
> reporting that the log size is too small, we can then
> limit the extent size allocation and transaction reservation based
> on the (tiny) log size we read out of the superblock...
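The fallback described here caps the allocation so that its bitmap footprint fits what the log can reserve. It could look roughly like the sketch below. clamp_rtalloc_len() is hypothetical. budget_bytes stands for whatever reservation is left for rtbitmap logging after the other terms, and XFS does not expose a value under that name. The sketch also ignores rtsummary and buffer overhead:

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp an rt allocation request (in rt extents) to
 * both a MAXEXTLEN-style cap and an rtbitmap log budget in bytes.
 */
static uint64_t
clamp_rtalloc_len(uint64_t want_rtx, uint64_t maxext_rtx,
                  uint64_t budget_bytes)
{
        /*
         * A run of len bits that may start mid-byte touches at most
         * (len + 7 + 7) / 8 bytes, i.e. ceil((len + 7) / 8), so it fits
         * in budget_bytes whenever len <= 8 * budget_bytes - 7.
         */
        uint64_t fit = budget_bytes ? budget_bytes * 8 - 7 : 0;
        uint64_t len = want_rtx;

        if (len > maxext_rtx)
                len = maxext_rtx;
        if (len > fit)
                len = fit;
        return len;
}
```

The caller would then come back for the remainder in another transaction. Clamping in rt extents rather than bytes keeps the check independent of rtextsz, because each bitmap bit simply covers more space as rtextsz grows.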

Ok, I'll work on that.

> Alternatively, we could implement log growing :)

Heh.  Wandering logs?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx