On Tue, Dec 03, 2019 at 08:51:13AM +1100, Dave Chinner wrote: > On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote: > > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote: > > > Hello, > > > > > > The following reproducer results in a transaction log overrun warning > > > for me: > > > > > > mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb > > > mount -o rtdev=/dev/vdc /dev/vdb /mnt > > > fallocate -l 4G /mnt/foo > > > > > > I've attached the full dmesg output. My guess at the problem is that the > > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime > > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on > > > this filesystem, which I do see in some of the log items). However, I'm not > > > familiar enough with the XFS transaction guts to confidently fix this. Can > > > someone please help me out? > > > > Hmm... > > > > /* > > * In a write transaction we can allocate a maximum of 2 > > * extents. This gives: > > * the inode getting the new extents: inode size > > * the inode's bmap btree: max depth * block size > > * the agfs of the ags from which the extents are allocated: 2 * sector > > * the superblock free block counter: sector size > > * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size > > * And the bmap_finish transaction can free bmap blocks in a join: > > * the agfs of the ags containing the blocks: 2 * sector size > > * the agfls of the ags containing the blocks: 2 * sector size > > * the super block free block counter: sector size > > * the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size > > */ > > STATIC uint > > xfs_calc_write_reservation(...); > > > > So this means that the rt allocator can burn through at most ... > > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize > > ... worth of log reservation as part of setting bits in the rtbitmap and > > fiddling with the rtsummary information. > > > > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which > > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log > > reservation. > > Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in > a single transaction. That's 2^21 filesystem blocks, which at most > is 2^21 rtexts. > > Hence I think we probably should have a separate rt-write > reservation that handles this case, and we use that for allocation > on rt devices rather than the bt-based allocation reservation. 2^21 rtexts is ... 2^18 bytes worth of rtbitmap block, which implies a transaction reservation of around ... ~300K? I guess I'll have to go play with xfs_db to see how small of a datadev you can make before that causes us to fail the minimum log size checks. As you said on IRC, it probably won't affect /most/ setups... but I don't want to run around increasing support calls either. Even if most distributors don't turn on rt support. > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > xfs: cap realtime allocation length to something we can log > > > > Omar Sandoval reported that a 4G fallocate on the realtime device causes > > filesystem shutdowns due to a log reservation overflow that happens when > > we log the rtbitmap updates. > > > > The tr_write transaction reserves enough log reservation to handle a > > full splits of both free space btrees, so cap the rt allocation at that > > number of bits. > > > > "The following reproducer results in a transaction log overrun warning > > for me: > > > > mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb > > mount -o rtdev=/dev/vdc /dev/vdb /mnt > > fallocate -l 4G /mnt/foo > > > > Reported-by: Omar Sandoval <osandov@xxxxxxxxxxx> > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --- > > fs/xfs/xfs_bmap_util.c | 23 +++++++++++++++++++++++ > > 1 file changed, 23 insertions(+) > > > > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c > > index 49d7b530c8f7..15c4e2790de3 100644 > > --- a/fs/xfs/xfs_bmap_util.c > > +++ b/fs/xfs/xfs_bmap_util.c > > @@ -69,6 +69,26 @@ xfs_zero_extent( > > } > > > > #ifdef CONFIG_XFS_RT > > +/* > > + * tr_write allows for one full split in the bnobt and cntbt to record the > > + * allocation, and that's how many bits of rtbitmap we can log to the > > + * transaction. We leave one full block's worth of log space to handle the > > + * rtsummary update, though that's probably overkill. > > + */ > > +static inline uint64_t > > +xfs_bmap_rtalloc_max( > > + struct xfs_mount *mp) > > +{ > > + uint64_t max_rtbitmap; > > + > > + max_rtbitmap = xfs_allocfree_log_count(mp, 1) - 1; > > + max_rtbitmap *= XFS_FSB_TO_B(mp, 1); > > + max_rtbitmap *= NBBY; > > + max_rtbitmap *= mp->m_sb.sb_rextsize; > > I can see how this works, but it strikes me as a bit of a hack. We > calculate the worst case reservations up front to avoid having to > play games like this in the code. Hence I think the correct thing to > do is fix the reservation to ensure we can do MAXEXTLEN allocations > without overruns... Yeah... --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx