On Fri, Oct 25, 2024 at 03:19:19PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 25, 2024 at 05:43:41PM +1100, Dave Chinner wrote:
> > On Thu, Oct 24, 2024 at 10:00:38AM -0700, Darrick J. Wong wrote:
> > > On Thu, Oct 24, 2024 at 01:51:04PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > >
> > > > Due to the failure to correctly limit sparse inode chunk allocation
> > > > in runt AGs, we now have many production filesystems with sparse
> > > > inode chunks allocated across the end of the runt AG. xfs_repair
> > > > or a growfs is needed to fix this situation, neither of which are
> > > > particularly appealing.
> > > >
> > > > The on disk layout from the metadump shows AG 12 as a runt that is
> > > > 1031 blocks in length and the last inode chunk allocated on disk at
> > > > agino 8192.
> > >
> > > Does this problem also happen on non-runt AGs?
> >
> > No. The highest agbno an inode chunk can be allocated at in a full
> > size AG is aligned by rounding down from sb_agblocks. Hence
> > sb_agblocks can be unaligned and nothing will go wrong. The problem
> > is purely that the runt AG is shorter than sb_agblocks and so
> > this highest agbno allocation guard is set beyond the end of the
> > AG...
>
> Ah, right, and we don't want sparse inode chunks to cross EOAG because
> then you'd have a chunk whose clusters would cross into the next AG, at
> least in the linear LBA space. That's why (for sparse inode fses) it
> makes sense that we want to round last_agino down by the chunk for
> non-last AGs, and round it down by only the cluster for the last AG.
>
> Waitaminute, what if the last AG is less than a chunk but more than a
> cluster's worth of blocks short of sb_agblocks? Or what if sb_agblocks
> doesn't align with a chunk boundary?
> I think the new code:
>
> 	if (xfs_has_sparseinodes(mp) && agno == mp->m_sb.sb_agcount - 1)
> 		end_align = mp->m_sb.sb_spino_align;
> 	else
> 		end_align = M_IGEO(mp)->cluster_align;
> 	bno = round_down(eoag, end_align);
> 	*last = XFS_AGB_TO_AGINO(mp, bno) - 1;
>
> will allow a sparse chunk that (erroneously) crosses sb_agblocks, right?
> Let's say sb_spino_align == 4, sb_inoalignmt == 8, sb_agcount == 2,
> sb_agblocks == 100,007, and sb_dblocks == 200,014.
>
> For AG 0, eoag is 100007, end_align == cluster_align == 8, so bno is
> rounded down to 100000. *last is thus set to the inode at the end of
> block 99999.
>
> For AG 1, eoag is also 100007, but now end_align == 4. bno is rounded
> down to 100,004. *last is set to the inode at the end of block 100003,
> not 99999.
>
> But now let's say we growfs another 100007 blocks onto the filesystem.
> Now we have 3x AGs, each with 100007 blocks. But now *last for AG 1
> becomes 99999 even though we might've allocated an inode in block
> 100000 before the growfs. That will cause a corruption error too,
> right?

Yes, I overlooked that case. Good catch.

> IOWs, don't we want something more like this?
>
> 	/*
> 	 * The preferred inode cluster allocation size cannot ever cross
> 	 * sb_agblocks. cluster_align is one of the following:
> 	 *
> 	 * - For sparse inodes, this is an inode chunk.
> 	 * - For aligned non-sparse inodes, this is an inode cluster.
> 	 */
> 	bno = round_down(sb_agblocks, cluster_align);
> 	if (xfs_has_sparseinodes(mp) &&
> 	    agno == mp->m_sb.sb_agcount - 1) {
> 		/*
> 		 * For a filesystem with sparse inodes, an inode chunk
> 		 * still cannot cross sb_agblocks, but it can cross eoag
> 		 * if eoag < agblocks. Inode clusters cannot cross eoag.
> 		 */
> 		last_clus_bno = round_down(eoag, sb_spino_align);
> 		bno = min(bno, last_clus_bno);
> 	}
> 	*last = XFS_AGB_TO_AGINO(mp, bno) - 1;

Yes, something like that is needed.
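
For anyone following along, the arithmetic in the example above can be
checked with a small userspace model. This is only a sketch, not the
kernel code or the eventual patch: the function names and
round_down_u32() are made up for illustration, the alignments are passed
as plain integers instead of being read from the superblock, and the
result is left as a block number (the XFS_AGB_TO_AGINO() conversion and
the final "- 1" are omitted).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's round_down(). Uses a modulo so
 * it works for any non-zero alignment, not just powers of two.
 */
static uint32_t round_down_u32(uint32_t x, uint32_t align)
{
	return x - (x % align);
}

/*
 * Model of the first quoted snippet: the limit is derived from eoag
 * only, with the alignment chosen by whether this is the last AG.
 * Returns the first block number past the last usable inode block.
 */
static uint32_t end_bno_eoag_only(uint32_t eoag, uint32_t cluster_align,
				  uint32_t spino_align, bool sparse,
				  bool last_ag)
{
	uint32_t end_align = (sparse && last_ag) ? spino_align : cluster_align;

	return round_down_u32(eoag, end_align);
}

/*
 * Model of the second quoted snippet: always bound by sb_agblocks
 * rounded down to cluster_align, and additionally by eoag rounded down
 * to spino_align for the last AG of a sparse inode filesystem.
 */
static uint32_t end_bno_proposed(uint32_t agblocks, uint32_t eoag,
				 uint32_t cluster_align, uint32_t spino_align,
				 bool sparse, bool last_ag)
{
	uint32_t bno = round_down_u32(agblocks, cluster_align);

	if (sparse && last_ag) {
		uint32_t last_clus_bno = round_down_u32(eoag, spino_align);

		if (last_clus_bno < bno)
			bno = last_clus_bno;
	}
	return bno;
}
```

With the values from the example (agblocks == eoag == 100007,
cluster_align == 8, spino_align == 4), end_bno_eoag_only() returns
100004 for AG 1 while it is the last AG and 100000 once a growfs makes
it a non-last AG, which is the inconsistency described above.
end_bno_proposed() returns 100000 in both cases, so the limit for AG 1
does not move when the filesystem grows.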

> > > If the only free space that could be turned into a sparse cluster
> > > is unaligned space at the end of AG 0, would you still get the
> > > same corruption error?
> >
> > It will only happen if AG 0 is a runt AG, and then the same error
> > would occur. We don't currently allow single AG filesystems, nor
> > when they are set up do we create them as a runt - they are always
> > full size. So current single AG filesystems made by mkfs won't have
> > this problem.
>
> Hmm, do you have a quick means to simulate this last-AG unaligned
> icluster situation?

No, I haven't been able to reproduce it on demand - nothing I've
tried has specifically landed a sparse inode cluster in exactly the
right position to trigger this. I typically get ENOSPC when I think
it should trigger and it's not immediately obvious what I'm missing
in the way of pre-conditions to trigger it.

I've been able to test the fixes on a metadump that has the sparse
chunk already on disk (which came from one of the production systems
hitting this).

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx