On Tue, Jul 11, 2017 at 10:08:59AM +1000, Dave Chinner wrote: > On Mon, Jul 10, 2017 at 09:18:24AM -0700, Darrick J. Wong wrote: > > On Sun, Jul 09, 2017 at 10:08:18PM +0800, Eryu Guan wrote: > > > On Fri, Jul 07, 2017 at 11:49:37PM -0700, Darrick J. Wong wrote: > > > > On Fri, Jul 07, 2017 at 08:01:43PM +0800, Eryu Guan wrote: > > > > > Hi all, > > > > > > > > > > I recently hit a repeatable sb_fdblocks corruption as below: > > > > > > > > > > Phase 1 - find and verify superblock... > > > > > Phase 2 - using internal log > > > > > - zero log... > > > > > - scan filesystem freespace and inode maps... > > > > > sb_fdblocks 14538692, counted 14669764 > > > > > - found root inode chunk > > > > > Phase 3 - for each AG... > > > > > ... > > > > > > > > > > And the count diff is always 14669764 - 14538692 = 131072 (128k). The > > > > > XFS in question was formated with "-m rmapbt=1 -b 1k" option. > > > > > > > > > > After turning on XFS_WARN and adding some debug printks (I appended the > > > > > detailed logs at the end of mail), I found that this was caused by too > > > > > large 'indlen' returned by xfs_bmap_worst_indlen(), which can't fit in a > > > > > 17 bits value (STARTBLOCKVALBITS is defined as 17), so the assert in > > > > > nullstartblock() failed: ASSERT(k < (1 << STARTBLOCKVALBITS)); > > > > > > > > > > From the log, newlen = 151513, which needs 18 bits, so nullstartblock() > > > > > throws away the 18th bit, and the sb_fdblocks difference is always 2^17 > > > > > = 131072. > > > > > > > > br_startblock is encoded in memory (and in the on-disk bmbt records) as > > > > a 52-bit unsigned integer. We signal a delayed allocation record by > > > > setting the uppermost STARTBLOCKMASKBITS (35) bits to 1 and stash the > > > > 'indlen' reservation (i.e. the worst case estimate of the space we need > > > > to grow the bmbt/rmapbt to map the entire delayed allocation) in the > > > > lower 17 bits of br_startblock. In theory this is ok because we're > > > > still quite a ways from having enough storage to create an fs where > > > > the upper bits in the agno part of an xfs_fsblock_t are actually set. > > > > > > This confirms what I read from the code, thanks! But I'm still curious > > > about how these numbers are chosen, especially STARTBLOCKMASKBITS is > > > defined as (15 + 20), where are they from? > > > > <shrug> Dave? :) > > You mean these definitions? > > #define STARTBLOCKVALBITS 17 > #define STARTBLOCKMASKBITS (15 + 20) > > Today: a history lesson. :) Glad to study :) > > Remember that these are stored encoded in a xfs_bmbt_rec on disk and > a xfs_bmbt_rec_host in memory, so they need to fit in this > definition both on disk and in memory: > > /* > * Bmap btree record and extent descriptor. > * l0:63 is an extent flag (value 1 indicates non-normal). > * l0:9-62 are startoff. > * l0:0-8 and l1:21-63 are startblock. > * l1:0-20 are blockcount. > */ > #define BMBT_EXNTFLAG_BITLEN 1 > #define BMBT_STARTOFF_BITLEN 54 > #define BMBT_STARTBLOCK_BITLEN 52 > #define BMBT_BLOCKCOUNT_BITLEN 21 > > We're looking at BMBT_STARTBLOCK_BITLEN here so it's obvious that > > that 15 + 20 + 17 is 52 bits. And that the startblock encoding for > delayed allocation obivously fits inside the space in on disk and > host extent tree records correctly. > > But what about the old "small block" format that was originally > used on 32 bit MIPS systems? That only had 32 bits in the start > block encoding *in memory*, so it should be clear that: > > 15 + 17 = 32 bits This makes a lot sense. > > Indeed, look back at the older code: > > (http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=blob;f=fs/xfs/xfs_bmap_btree.h;h=a4555abb6622a5d2ac6c08ab0f585761d6ff4585;hb=HEAD) > > #define STARTBLOCKMASKBITS (15 + XFS_BIG_BLKNOS * 20) > #define DSTARTBLOCKMASKBITS (15 + 20) > > And you can see we had different definitions for in-memory and > on-disk start block masks, and that difference was the size of the > block addresses the compiled kernel could manage. > > IOWs, the "15 + 20" is the old definition that recognised the > difference in block size between 32 bit and 64 bit systems could > originally support in memory vs what the 64-bit on-disk format > supported. We now only support 64bit in memory, so the in-memory and > on-disk definitions are now the same.... Thanks a lot for the detailed explanation! Eryu -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html