On Mon, Mar 07, 2022 at 07:16:57PM +0530, Chandan Babu R wrote:
> On 07 Mar 2022 at 10:43, Dave Chinner wrote:
> > On Sat, Mar 05, 2022 at 06:15:37PM +0530, Chandan Babu R wrote:
> >> On 04 Mar 2022 at 13:39, Dave Chinner wrote:
> >> > On Tue, Mar 01, 2022 at 04:09:36PM +0530, Chandan Babu R wrote:
> >> >> @@ -102,7 +104,27 @@ xfs_bulkstat_one_int(
> >> >>  
> >> >>  	buf->bs_xflags = xfs_ip2xflags(ip);
> >> >>  	buf->bs_extsize_blks = ip->i_extsize;
> >> >> -	buf->bs_extents = xfs_ifork_nextents(&ip->i_df);
> >> >> +
> >> >> +	nextents = xfs_ifork_nextents(&ip->i_df);
> >> >> +	if (!(bc->breq->flags & XFS_IBULK_NREXT64)) {
> >> >> +		xfs_extnum_t max_nextents = XFS_MAX_EXTCNT_DATA_FORK_OLD;
> >> >> +
> >> >> +		if (unlikely(XFS_TEST_ERROR(false, mp,
> >> >> +				XFS_ERRTAG_REDUCE_MAX_IEXTENTS)))
> >> >> +			max_nextents = 10;
> >> >> +
> >> >> +		if (nextents > max_nextents) {
> >> >> +			xfs_iunlock(ip, XFS_ILOCK_SHARED);
> >> >> +			xfs_irele(ip);
> >> >> +			error = -EOVERFLOW;
> >> >> +			goto out;
> >> >> +		}
> >> >
> >> > This just seems wrong. This will cause a total abort of the bulkstat
> >> > pass, which will be completely unexpected by any application
> >> > that does not know about 64 bit extent counts. Most of them likely
> >> > don't even care about the extent count in the data being returned.
> >> >
> >> > Really, I think this should just set the extent count to the MAX
> >> > number and continue onwards, otherwise existing applications
> >> > will not be able to bulkstat a filesystem with large extent counts
> >> > in it at all.
> >> >
> >>
> >> Actually, I don't know much about how applications use bulkstat. I am
> >> dependent on guidance from other developers who are well versed on this
> >> topic. I will change the code to return the maximum extent count if the
> >> value overflows the older extent count limits.
> >
> > They tend to just run in a loop until either no more inodes are to
> > be found or an error occurs.
> > bulkstat loops don't expect errors to
> > be reported - it's hard to do something based on all inodes if you
> > get errors reading the inodes part way through. There's no way for
> > the application to tell where it should restart scanning - the
> > bulkstat iteration cookie is controlled by the kernel, and I don't
> > think we update it on error.
>
> xfs_bulkstat() has the following,
>
> 	kmem_free(bc.buf);
>
> 	/*
> 	 * We found some inodes, so clear the error status and return them.
> 	 * The lastino pointer will point directly at the inode that triggered
> 	 * any error that occurred, so on the next call the error will be
> 	 * triggered again and propagated to userspace as there will be no
> 	 * formatted inodes in the buffer.
> 	 */
> 	if (breq->ocount > 0)
> 		error = 0;
>
> 	return error;
>
> The above will help the userspace process to issue another bulkstat call which
> begins from the inode causing the error.

And then it returns with a cookie pointing at the overflowed inode, and
we try that one first on the next loop, triggering -EOVERFLOW with
breq->ocount == 0. Or maybe we have two inodes in a row that trigger
EOVERFLOW, so even if we skip the first and return to userspace, we trip
the second on the next call and boom...

> > e.g. see fstests src/bstat.c and src/bulkstat_unlink_test*.c - they
> > simply abort if bulkstat fails. Same goes for xfsdump common/util.c
> > and dump/content.c - they just error out and return and don't try to
> > continue further.
>
> I made the following changes to src/bstat.c,
>
> diff --git a/src/bstat.c b/src/bstat.c
> index 3f3dc2c6..0e72190e 100644
> --- a/src/bstat.c
> +++ b/src/bstat.c
> @@ -143,7 +143,19 @@ main(int argc, char **argv)
>  		bulkreq.ubuffer = t;
>  		bulkreq.ocount = &count;
>  
> -		while ((ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq)) == 0) {
> +		while (1) {
> +			ret = xfsctl(name, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq);
> +			if (ret == -1) {
> +				if (errno == EOVERFLOW) {
> +					printf("Skipping inode %llu.\n", last+1);
> +					++last;
> +					continue;
> +				}
> +
> +				perror("xfsctl");
> +				exit(1);
> +			}
> +
>  			total += count;
>
> Executing the script at
> https://gist.github.com/chandanr/f2d147fa20a681e1508e182b5b7cdb00 provides the
> following output,
>
> ...
>
> ino 128 mode 040755 nlink 3 uid 0 gid 0 rdev 0
> 	blksize 4096 size 37 blocks 0 xflags 0 extsize 0
> 	atime Thu Jan  1 00:00:00.000000000 1970
> 	mtime Mon Mar  7 13:06:30.051339892 2022
> 	ctime Mon Mar  7 13:06:30.051339892 2022
> 	extents 0 0 gen 0
> 	DMI: event mask 0x00000000 state 0x0000
>
> Skipping inode 131.
>
> ino 132 mode 040755 nlink 2 uid 0 gid 0 rdev 0
> 	blksize 4096 size 97 blocks 0 xflags 0 extsize 0
> 	atime Mon Mar  7 13:06:30.051339892 2022
> 	mtime Mon Mar  7 13:06:30.083339892 2022
> 	ctime Mon Mar  7 13:06:30.083339892 2022
> 	extents 0 0 gen 548703887
> 	DMI: event mask 0x00000000 state 0x0000
>
> ...
>
> The above illustrates that userspace programs can be modified to use lastip to
> skip inodes which cause the bulkstat ioctl to return with an error.

Yes, I know they can be modified to handle it - that is not the concern
here. The concern is that this new error can potentially break the
*unmodified* applications already out there. e.g. xfsdump may just stop
dumping a filesystem half way through because it doesn't handle
unexpected errors like this sanely.
But we can't tie a version of xfsdump to a specific kernel feature, so
we have to make sure that bulkstat from older builds of xfsdump will
still iterate through the entire filesystem without explicit EOVERFLOW
support...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx