On Wed, Dec 04, 2019 at 08:31:17AM +1100, Dave Chinner wrote:
> On Mon, Dec 02, 2019 at 06:45:26PM -0800, Darrick J. Wong wrote:
> > On Tue, Dec 03, 2019 at 08:51:13AM +1100, Dave Chinner wrote:
> > > On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote:
> > > > > Hello,
> > > > >
> > > > > The following reproducer results in a transaction log overrun warning
> > > > > for me:
> > > > >
> > > > >     mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> > > > >     mount -o rtdev=/dev/vdc /dev/vdb /mnt
> > > > >     fallocate -l 4G /mnt/foo
> > > > >
> > > > > I've attached the full dmesg output. My guess at the problem is that the
> > > > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime
> > > > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on
> > > > > this filesystem, which I do see in some of the log items). However, I'm not
> > > > > familiar enough with the XFS transaction guts to confidently fix this. Can
> > > > > someone please help me out?
> > > >
> > > > Hmm...
> > > >
> > > > /*
> > > >  * In a write transaction we can allocate a maximum of 2
> > > >  * extents.
> > > >  * This gives:
> > > >  *    the inode getting the new extents: inode size
> > > >  *    the inode's bmap btree: max depth * block size
> > > >  *    the agfs of the ags from which the extents are allocated: 2 * sector
> > > >  *    the superblock free block counter: sector size
> > > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > > >  * And the bmap_finish transaction can free bmap blocks in a join:
> > > >  *    the agfs of the ags containing the blocks: 2 * sector size
> > > >  *    the agfls of the ags containing the blocks: 2 * sector size
> > > >  *    the super block free block counter: sector size
> > > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > > >  */
> > > > STATIC uint
> > > > xfs_calc_write_reservation(...);
> > > >
> > > > So this means that the rt allocator can burn through at most ...
> > > > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize
> > > > ... worth of log reservation as part of setting bits in the rtbitmap and
> > > > fiddling with the rtsummary information.
> > > >
> > > > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which
> > > > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log
> > > > reservation.
> > >
> > > Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in
> > > a single transaction. That's 2^21 filesystem blocks, which at most
> > > is 2^21 rtexts.
> > >
> > > Hence I think we probably should have a separate rt-write
> > > reservation that handles this case, and we use that for allocation
> > > on rt devices rather than the bt-based allocation reservation.
> >
> > 2^21 rtexts is ... 2^18 bytes worth of rtbitmap block, which implies a
> > transaction reservation of around ... ~300K? I guess I'll have to go
> > play with xfs_db to see how small of a datadev you can make before that
> > causes us to fail the minimum log size checks.
>
> Keep in mind that rtextsz is often larger than a single filesystem
> block, so the bitmap size rapidly reduces as rtextsz goes up.
>
> > As you said on IRC, it probably won't affect /most/ setups... but I
> > don't want to run around increasing support calls either. Even if most
> > distributors don't turn on rt support.
>
> Sure, we can limit the size of the allocation based on the
> transaction reservation limits, but I suspect this will only affect
> filesystems with really, really small data devices that result in a
> <10MB default log size. I don't think there is that many of these
> around in production....
>
> I'd prefer to fix the transaction size, and then if people start
> reporting that the log size is too small, we can then
> limit the extent size allocation and transaction reservation based
> on the (tiny) log size we read out of the superblock...

Ok, I'll work on that.

> Alternatively, we could implement log growing :)

Heh. Wandering logs?

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
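
As a rough sanity check on the rtbitmap numbers quoted in this thread, here
is a minimal sketch of the arithmetic. It assumes one bitmap bit per rt
extent, which is what the figures above imply. The helper names
rtexts_for_blocks() and rtbitmap_bytes() are made up for illustration and
are not XFS functions, and the result is only the raw bitmap payload, not
a full transaction reservation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an XFS function): number of rt extents needed
 * to cover nblocks filesystem blocks when each rt extent spans
 * rtextsize_blocks filesystem blocks, rounding up.
 */
static uint64_t rtexts_for_blocks(uint64_t nblocks, uint64_t rtextsize_blocks)
{
	assert(rtextsize_blocks > 0);
	return (nblocks + rtextsize_blocks - 1) / rtextsize_blocks;
}

/*
 * Hypothetical helper (not an XFS function): bytes of rtbitmap that cover
 * nrtexts rt extents, assuming one bit per rt extent and rounding up to a
 * whole byte.
 */
static uint64_t rtbitmap_bytes(uint64_t nrtexts)
{
	return (nrtexts + 7) / 8;
}
```

With a one-block rtextsz, rtbitmap_bytes(rtexts_for_blocks(1ULL << 20, 1))
gives 131072 for the 4GB reproducer, and 1ULL << 21 blocks gives 262144
(2^18), matching the numbers above. With an illustrative rtextsz of 16
blocks, the same 2^21 blocks need only 16384 bytes of bitmap, which is the
"rapidly reduces" effect Dave describes.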