On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote: > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote: > > Hello, > > > > The following reproducer results in a transaction log overrun warning > > for me: > > > > mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb > > mount -o rtdev=/dev/vdc /dev/vdb /mnt > > fallocate -l 4G /mnt/foo > > > > I've attached the full dmesg output. My guess at the problem is that the > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on > > this filesystem, which I do see in some of the log items). However, I'm not > > familiar enough with the XFS transaction guts to confidently fix this. Can > > someone please help me out? > > Hmm... > > /* > * In a write transaction we can allocate a maximum of 2 > * extents. This gives: > * the inode getting the new extents: inode size > * the inode's bmap btree: max depth * block size > * the agfs of the ags from which the extents are allocated: 2 * sector > * the superblock free block counter: sector size > * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size > * And the bmap_finish transaction can free bmap blocks in a join: > * the agfs of the ags containing the blocks: 2 * sector size > * the agfls of the ags containing the blocks: 2 * sector size > * the super block free block counter: sector size > * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size > */ > STATIC uint > xfs_calc_write_reservation(...); > > So this means that the rt allocator can burn through at most ... > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize > ... worth of log reservation as part of setting bits in the rtbitmap and > fiddling with the rtsummary information. > > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log > reservation. Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in a single transaction. That's 2^21 filesystem blocks, which at most is 2^21 rtexts. Hence I think we probably should have a separate rt-write reservation that handles this case, and we use that for allocation on rt devices rather than the bt-based allocation reservation. > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > xfs: cap realtime allocation length to something we can log > > Omar Sandoval reported that a 4G fallocate on the realtime device causes > filesystem shutdowns due to a log reservation overflow that happens when > we log the rtbitmap updates. > > The tr_write transaction reserves enough log reservation to handle a > full splits of both free space btrees, so cap the rt allocation at that > number of bits. > > "The following reproducer results in a transaction log overrun warning > for me: > > mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb > mount -o rtdev=/dev/vdc /dev/vdb /mnt > fallocate -l 4G /mnt/foo > > Reported-by: Omar Sandoval <osandov@xxxxxxxxxxx> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > --- > fs/xfs/xfs_bmap_util.c | 23 +++++++++++++++++++++++ > 1 file changed, 23 insertions(+) > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > index 49d7b530c8f7..15c4e2790de3 100644 > --- a/fs/xfs/xfs_bmap_util.c > +++ b/fs/xfs/xfs_bmap_util.c > @@ -69,6 +69,26 @@ xfs_zero_extent( > } > > #ifdef CONFIG_XFS_RT > +/* > + * tr_write allows for one full split in the bnobt and cntbt to record the > + * allocation, and that's how many bits of rtbitmap we can log to the > + * transaction. We leave one full block's worth of log space to handle the > + * rtsummary update, though that's probably overkill. > + */ > +static inline uint64_t > +xfs_bmap_rtalloc_max( > + struct xfs_mount *mp) > +{ > + uint64_t max_rtbitmap; > + > + max_rtbitmap = xfs_allocfree_log_count(mp, 1) - 1; > + max_rtbitmap *= XFS_FSB_TO_B(mp, 1); > + max_rtbitmap *= NBBY; > + max_rtbitmap *= mp->m_sb.sb_rextsize; I can see how this works, but it strikes me as a bit of a hack. We calculate the worst case reservations up front to avoid having to play games like this in the code. Hence I think the correct thing to do is fix the reservation to ensure we can do MAXEXTLEN allocations without overruns... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx