On 8/22/19 1:01 AM, Dave Chinner wrote: > On Wed, Aug 21, 2019 at 05:14:51PM -0500, Eric Sandeen wrote: >> The prior effort to identify the actual root inode in a filesystem >> failed in the (rare) case where inodes were allocated with a lower >> number than the root. As a result, the wrong root inode number >> went into the dump, and restore would fail with: >> >> xfsrestore: tree.c:757: tree_begindir: Assertion `ino != persp->p_rootino || hardh == persp->p_rooth' failed. >> >> Fix this by iterating over a chunk's worth of inodes until we find >> a directory inode with generation 0, which should only be true >> for the real root inode. > > I'm not sure addresses the actual case that can cause this error. > > i.e. the root inode is always the first inode in the "root chunk" > that is allocated at mkfs time. THat location is where repair will > always calculate it to be, and it will be the inode that it uses > to rebuild the root dir from. > > The problem, as I understand it, is that we have a situation where > the per-ag btree root blocks have moved (due to split/merge) from > the their original location which is packed against the AG headers. > The root inode chunk is located at the lowest inode cluster aligned > block after these AG btree roots. > > Once the btree roots have moved, they may be space between the AG > headers and the root inode chunk to allocate a new inode chunk, > in which case we have at least 64 inodes allocated underneath the > root inode. And if we have 64k blocks and 256 byte inodes, we could > have 256 inodes per fs block between the AG headers and the root > indoe chunk. > > Hence scanning all the inodes in the first indoe chunk won't find > the root indoe if any of these situations has occurred, and we may > have to scan 1500 or more inodes in to find the root chunk (6 btree > roots - BNOBT,CNTBT,INOBT,FINOBT,RMAPBT,REFCBT - and 64k blocks). Hm. So, the one report I had of this had a root inode of 128, and bulkstat returned a first inode of 64, triggering the assert above. (but even then you're right, scanning 64 isn't enough) If it really can be that far off then maybe this is a bad way to go, although the long scan is going to be exceedingly rare I think. Most people will get the root inode on the first bulkstat call. > So I think the best thing to do here is try to calculate the root > inode number as per xfs_repair, and then bulkstat that. i.e. see > the calculation of "first_prealloc_ino" in repair/xfs_repair.c > (about line 450). Probably requires a XFS_IOC_FSGEOMETRY_V1 call to > get the necessary info to calculate it... I had started to go down that path and it seemed like a real mess. We need m_ag_maxlevels, m_rmap_maxlevels, sb_logstart, features, etc etc etc. I guess I can revisit it. I had thought about a common function to calculate this so we aren't coding it twice but I'm not sure we have the same sets of inputs for the various cases ... -Eric