Hi,

On Thu, Sep 17, 2020 at 9:34 AM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Sep 16, 2020, at 3:03 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> >
> > On Fri, Sep 04, 2020 at 03:34:26PM -0600, Andreas Dilger wrote:
> >> This is a patch that is part of the parallel e2fsck series that Shilong
> >> is working on, and does not work by itself, but was requested during
> >> discussion on the ext4 concall today.
> >
> > Andreas, thanks for sending this patch.  (Also available at [1].)
> >
> > [1] https://lore.kernel.org/linux-ext4/132401FE-6D25-41B3-99D1-50E7BC746237@xxxxxxxxx/
> >
> > I took a look at it, and there are a number of issues with it.  First
> > of all, there seems to be an assumption that (a) the number of threads
> > is less than the number of block groups, and (b) the number of threads
> > can evenly divide the number of block groups.  So for example, if the
> > number of block groups is prime, or if you are trying to use, say, 8
> > or 16 threads, and the number of block groups is odd, the code in
> > question will not do the right thing.
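As an aside, here is a minimal sketch of the kind of split that copes
with any group count, along the lines Ted describes.  The helper name
and signature are hypothetical, not taken from the patch:

    #include <ext2fs/ext2fs.h>

    /*
     * Hypothetical sketch (not from the patch): carve [0, ngroups)
     * into nthreads contiguous ranges whose sizes differ by at most
     * one, so nthreads does not have to divide ngroups evenly.
     * Returns the half-open range [*start, *end) for thread 'tid'.
     */
    static void thread_group_range(dgrp_t ngroups, unsigned int nthreads,
                                   unsigned int tid,
                                   dgrp_t *start, dgrp_t *end)
    {
        dgrp_t base = ngroups / nthreads;
        dgrp_t extra = ngroups % nthreads;  /* first 'extra' threads get +1 */

        *start = tid * base + (tid < extra ? tid : extra);
        *end = *start + base + (tid < extra ? 1 : 0);
    }

Spreading the ngroups % nthreads remainder over the first few threads
keeps every range within one group of the others, so prime or odd group
counts need no special casing.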
>
> Yes, the thread count is checked earlier in the parallel e2fsck patch
> series to be <= number of block groups.  However, I wasn't aware of any
> requirement for groups = N * threads.  It may be coincidental that we
> have never tested that case.
>
> In any case, the patch was never really intended to be used by itself,
> only for review and discussion of the general approach.
>
> > (a) meant that attempting to run the e2fsprogs regression test suite
> > caused most of the test cases to fail with e2fsck crashing due to
> > buffer overruns.  I fixed this by changing the number of threads to
> > be 16, or, if 16 was greater than the number of block groups, to be
> > the number of block groups, just for debugging purposes.  However,
> > there were still a few regression test failures.
> >
> > I also then tried to use a file system that we had been using for
> > testing fragmentation issues.  The file system was created on a 10GB
> > virtual disk by running these commands:
> >
> >     DEV=/dev/sdc
> >     mke2fs -t ext4 $DEV 10G
> >     mount $DEV /mnt
> >     pushd /mnt
> >     for t in $(seq 1 6144) ; do
> >         for i in $(seq 1 25) ; do
> >             fallocate tb$t-8mb-$i -l 8M
> >         done
> >         for i in $(seq 1 2) ; do
> >             fallocate tb$t-400mb-$i -l 400M
> >         done
> >     done
> >     popd
> >     umount /mnt

I tested the attached v2 patch (based on the master branch) with 32
threads locally, and it passed this test:

    [root@server e2fsprogs]# ./e2fsck/e2fsck -f /dev/sda4
    e2fsck 1.46-WIP (20-Mar-2020)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Multiple threads triggered to read bitmaps
    /dev/sda4: 77963/3145728 files (0.0% non-contiguous), 12559729/12563825 blocks

> > With the patch applied, all of the threads failed with error code 22
> > (EINVAL), except for one, which failed with a bad block group checksum
> > error.  I haven't had a chance to dig into it further, but I was hoping
> > that Shilong and/or Saranya might be able to take a closer look at it.
>
> There may very well be other issues with the patch that make it not
> useful as-is in isolation.  I'd have to let Shilong comment on that.
>
> > But the other thing that we might want to consider is to add
> > demand-loading of the block (or inode) bitmap.  We got a complaint
> > that "e2fsck -E journal_only" was super-slow, whereas running the
> > journal by mounting and unmounting the file system was much faster.
> > The reason, of course, was that the kernel was only reading those
> > bitmap blocks that needed to be modified by the orphaned inode
> > processing, whereas with e2fsprogs, we have to read in all of the
> > bitmap blocks whether this is necessary or not.
>
> Forking threads to do on-demand loading may have a high overhead, so
> it would be interesting to investigate a libext2fs IO engine that is
> using libaio.  That would allow O_DIRECT reading of filesystem metadata
> without double caching, as well as avoid blocking threads.  Alternately,
> there is already a "readahead" method exported that could be used to
> avoid changing the code too much, using posix_fadvise(WILLNEED), but I
> have no idea how that would perform.
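As a cheap experiment on the readahead side, assuming the exported
method Andreas refers to is io_channel_cache_readahead() (which the
unix_io manager maps to posix_fadvise(POSIX_FADV_WILLNEED) where
available), one could warm the cache for all block bitmaps ahead of
the existing synchronous read loop:

    #include <ext2fs/ext2fs.h>

    /*
     * Rough sketch, assuming the exported readahead method is
     * io_channel_cache_readahead() (unix_io implements it with
     * posix_fadvise(POSIX_FADV_WILLNEED) where available): prefetch
     * every block bitmap before the synchronous read loop runs.
     */
    static void prefetch_block_bitmaps(ext2_filsys fs)
    {
        dgrp_t i;

        for (i = 0; i < fs->group_desc_count; i++) {
            blk64_t blk = ext2fs_block_bitmap_loc(fs, i);

            /* ignore EXT2_ET_OP_NOT_SUPPORTED from IO managers
             * that have no readahead method */
            if (blk)
                (void) io_channel_cache_readahead(fs->io, blk, 1);
        }
    }

That leaves the read path itself untouched; it only gives the kernel a
head start on the I/O, so it would answer the performance question
without much code churn.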
>
> > So another idea that we've talked about is teaching libext2fs to be
> > able to demand-load the bitmap, and then, when we write out the block
> > bitmap, we only need to write out those blocks that were loaded.  This
> > would also speed up running debugfs to examine the file system, as
> > well as running fuse2fs.  Fortunately, we have abstractions in front
> > of all of the bitmap accessor functions, and the code paths that would
> > need to be changed to add demand-loading of bitmaps should be mostly
> > exclusive of the changes needed for parallel bitmap loading.  So if
> > Shilong has time to look at making the parallel bitmap loader more
> > robust, perhaps Saranya could work on the demand-loading idea.
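To make the demand-loading idea concrete, here is a hypothetical sketch
of faulting in a single group's block bitmap behind the accessor layer.
The 'loaded' array is invented for illustration, and real code would
also need locking, the BLOCK_UNINIT case, and the inode bitmaps:

    #include <ext2fs/ext2fs.h>

    /*
     * Hypothetical sketch only: fault in one group's block bitmap the
     * first time it is needed, and remember which groups were loaded
     * so write-out can touch only those.  The 'loaded' array is
     * invented for illustration; locking, BLOCK_UNINIT groups, and
     * the inode bitmap are all omitted.
     */
    static errcode_t load_one_block_bitmap(ext2_filsys fs, dgrp_t grp,
                                           ext2fs_block_bitmap bmap,
                                           char *loaded)
    {
        blk64_t start = ext2fs_group_first_block2(fs, grp);
        char *buf;
        errcode_t err;

        if (loaded[grp])
            return 0;
        err = ext2fs_get_mem(fs->blocksize, &buf);
        if (err)
            return err;
        err = io_channel_read_blk64(fs->io,
                                    ext2fs_block_bitmap_loc(fs, grp),
                                    1, buf);
        if (!err)
            err = ext2fs_set_block_bitmap_range2(bmap, start,
                            ext2fs_group_blocks_count(fs, grp), buf);
        if (!err)
            loaded[grp] = 1;
        ext2fs_free_mem(&buf);
        return err;
    }

The accessors in front (the ext2fs_test_block_bitmap2() path, for
instance) would call this before touching a bit, and write-out would
walk the 'loaded' array instead of unconditionally writing every group.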
>
> > Or if Shilong doesn't have time to try to polish these parallel bitmap
> > loading changes, we could have Saranya look at cleaning them up ---
> > since regardless of whether we implement demand-loading or not,
> > parallel bitmap reading is going to be useful for some use cases
> > (e.g., a full fsck, dumpe2fs, or e2image).
>
> I don't think Shilong will have time to work on major code changes for
> the next few weeks at least, due to internal deadlines, after which we
> can finish cleaning up and submitting the pfsck patch series upstream.
> If you are interested in the whole 59-patch series, it is available via:
>
>     git pull https://review.whamcloud.com/tools/e2fsprogs refs/changes/14/39914/1
>
> or viewable online via Gerrit at:
>
>     https://review.whamcloud.com/39914
>
> Getting some high-level review/feedback of that patch series would avoid
> spending time reworking/rebasing it, only to find that it isn't in the
> form you would prefer, or that it needs major architectural changes.
>
> Note that this is currently based on top of the Lustre e2fsprogs branch.
> While these patches shouldn't cause any problems with non-Lustre
> filesystems, there are other patches in the series that are not
> necessarily ready for submission (e.g. dirdata, Lustre xattr decoding,
> inode badness, etc.).
>
> Cheers, Andreas

Attachment:
v2-0001-LU-8465-ext2fs-parallel-bitmap-loading.patch
Description: Binary data