Hi,

On Thu, Sep 17, 2020 at 9:34 AM Andreas Dilger <adilger@xxxxxxxxx> wrote:
>
> On Sep 16, 2020, at 3:03 PM, Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> >
> > On Fri, Sep 04, 2020 at 03:34:26PM -0600, Andreas Dilger wrote:
> >> This is a patch that is part of the parallel e2fsck series that Shilong
> >> is working on, and does not work by itself, but was requested during
> >> discussion on the ext4 concall today.
> >
> > Andreas, thanks for sending this patch.  (Also available at [1].)
> >
> > [1] https://lore.kernel.org/linux-ext4/132401FE-6D25-41B3-99D1-50E7BC746237@xxxxxxxxx/
> >
> > I took a look at it, and there are a number of issues with it.  First
> > of all, there seems to be an assumption that (a) the number of threads
> > is less than the number of block groups, and (b) the number of threads
> > can evenly divide the number of block groups.  So for example, if the
> > number of block groups is prime, or if you are trying to use, say, 8
> > or 16 threads, and the number of block groups is odd, the code in
> > question will not do the right thing.
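As an aside, here is a minimal sketch of the kind of split that copes
with any group count, along the lines Ted describes.  The helper name
and signature are hypothetical, not taken from the patch:

    #include <ext2fs/ext2fs.h>

    /*
     * Hypothetical sketch (not from the patch): carve [0, ngroups)
     * into nthreads contiguous ranges whose sizes differ by at most
     * one, so nthreads does not have to divide ngroups evenly.
     * Returns the half-open range [*start, *end) for thread 'tid'.
     */
    static void thread_group_range(dgrp_t ngroups, unsigned int nthreads,
                                   unsigned int tid,
                                   dgrp_t *start, dgrp_t *end)
    {
        dgrp_t base = ngroups / nthreads;
        dgrp_t extra = ngroups % nthreads;  /* first 'extra' threads get +1 */

        *start = tid * base + (tid < extra ? tid : extra);
        *end = *start + base + (tid < extra ? 1 : 0);
    }

Spreading the ngroups % nthreads remainder over the first few threads
keeps every range within one group of the others, so prime or odd group
counts need no special casing.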
>
> Yes, the thread count is checked earlier in the parallel e2fsck patch
> series to be <= number of block groups.  However, I wasn't aware of any
> requirement for groups = N * threads.  It may be coincidental that we
> have never tested that case.
>
> In any case, the patch was never really intended to be used by itself,
> only for review and discussion of the general approach.
>
> > (a) meant that attempting to run the e2fsprogs regression test suite
> > caused most of the test cases to fail with e2fsck crashing due to
> > buffer overruns.  I fixed this by changing the number of threads to
> > be 16, or, if 16 was greater than the number of block groups, to be
> > the number of block groups, just for debugging purposes.  However,
> > there were still a few regression test failures.
> >
> > I also then tried to use a file system that we had been using for
> > testing fragmentation issues.  The file system was created on a 10GB
> > virtual disk by running these commands:
> >
> >     DEV=/dev/sdc
> >     mke2fs -t ext4 $DEV 10G
> >     mount $DEV /mnt
> >     pushd /mnt
> >     for t in $(seq 1 6144) ; do
> >         for i in $(seq 1 25) ; do
> >             fallocate tb$t-8mb-$i -l 8M
> >         done
> >         for i in $(seq 1 2) ; do
> >             fallocate tb$t-400mb-$i -l 400M
> >         done
> >     done
> >     popd
> >     umount /mnt

I tested the attached v2 patch (based on the master branch) with 32
threads locally, and it passed this test:

    [root@server e2fsprogs]# ./e2fsck/e2fsck -f /dev/sda4
    e2fsck 1.46-WIP (20-Mar-2020)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Multiple threads triggered to read bitmaps
    /dev/sda4: 77963/3145728 files (0.0% non-contiguous), 12559729/12563825 blocks

> > With the patch applied, all of the threads failed with error code 22
> > (EINVAL), except for one, which failed with a bad block group checksum
> > error.  I haven't had a chance to dig into it further, but I was hoping
> > that Shilong and/or Saranya might be able to take a closer look at it.
>
> There may very well be other issues with the patch that make it not
> useful as-is in isolation.  I'd have to let Shilong comment on that.
>
> > But the other thing that we might want to consider is to add
> > demand-loading of the block (or inode) bitmap.  We got a complaint
> > that "e2fsck -E journal_only" was super-slow, whereas running the
> > journal by mounting and unmounting the file system was much faster.
> > The reason, of course, was that the kernel was only reading those
> > bitmap blocks that needed to be modified by the orphaned inode
> > processing, whereas with e2fsprogs, we have to read in all of the
> > bitmap blocks whether this is necessary or not.
>
> Forking threads to do on-demand loading may have a high overhead, so
> it would be interesting to investigate a libext2fs IO engine that is
> using libaio.  That would allow O_DIRECT reading of filesystem metadata
> without double caching, as well as avoid blocking threads.  Alternately,
> there is already a "readahead" method exported that could be used to
> avoid changing the code too much, using posix_fadvise(WILLNEED), but I
> have no idea how that would perform.
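As a cheap experiment on the readahead side, assuming the exported
method Andreas refers to is io_channel_cache_readahead() (which the
unix_io manager maps to posix_fadvise(POSIX_FADV_WILLNEED) where
available), one could warm the cache for all block bitmaps ahead of
the existing synchronous read loop:

    #include <ext2fs/ext2fs.h>

    /*
     * Rough sketch, assuming the exported readahead method is
     * io_channel_cache_readahead() (unix_io implements it with
     * posix_fadvise(POSIX_FADV_WILLNEED) where available): prefetch
     * every block bitmap before the synchronous read loop runs.
     */
    static void prefetch_block_bitmaps(ext2_filsys fs)
    {
        dgrp_t i;

        for (i = 0; i < fs->group_desc_count; i++) {
            blk64_t blk = ext2fs_block_bitmap_loc(fs, i);

            /* ignore EXT2_ET_OP_NOT_SUPPORTED from IO managers
             * that have no readahead method */
            if (blk)
                (void) io_channel_cache_readahead(fs->io, blk, 1);
        }
    }

That leaves the read path itself untouched; it only gives the kernel a
head start on the I/O, so it would answer the performance question
without much code churn.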
>
> > So another idea that we've talked about is teaching libext2fs to be
> > able to demand-load the bitmap, and then, when we write out the block
> > bitmap, we only need to write out those blocks that were loaded.  This
> > would also speed up running debugfs to examine the file system, as
> > well as running fuse2fs.  Fortunately, we have abstractions in front
> > of all of the bitmap accessor functions, and the code paths that would
> > need to be changed to add demand-loading of bitmaps should be mostly
> > exclusive of the changes needed for parallel bitmap loading.  So if
> > Shilong has time to look at making the parallel bitmap loader more
> > robust, perhaps Saranya could work on the demand-loading idea.
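To make the demand-loading idea concrete, here is a hypothetical sketch
of faulting in a single group's block bitmap behind the accessor layer.
The 'loaded' array is invented for illustration, and real code would
also need locking, the BLOCK_UNINIT case, and the inode bitmaps:

    #include <ext2fs/ext2fs.h>

    /*
     * Hypothetical sketch only: fault in one group's block bitmap the
     * first time it is needed, and remember which groups were loaded
     * so write-out can touch only those.  The 'loaded' array is
     * invented for illustration; locking, BLOCK_UNINIT groups, and
     * the inode bitmap are all omitted.
     */
    static errcode_t load_one_block_bitmap(ext2_filsys fs, dgrp_t grp,
                                           ext2fs_block_bitmap bmap,
                                           char *loaded)
    {
        blk64_t start = ext2fs_group_first_block2(fs, grp);
        char *buf;
        errcode_t err;

        if (loaded[grp])
            return 0;
        err = ext2fs_get_mem(fs->blocksize, &buf);
        if (err)
            return err;
        err = io_channel_read_blk64(fs->io,
                                    ext2fs_block_bitmap_loc(fs, grp),
                                    1, buf);
        if (!err)
            err = ext2fs_set_block_bitmap_range2(bmap, start,
                            ext2fs_group_blocks_count(fs, grp), buf);
        if (!err)
            loaded[grp] = 1;
        ext2fs_free_mem(&buf);
        return err;
    }

The accessors in front (the ext2fs_test_block_bitmap2() path, for
instance) would call this before touching a bit, and write-out would
walk the 'loaded' array instead of unconditionally writing every group.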
>
> > Or if Shilong doesn't have time to try to polish these parallel bitmap
> > loading changes, we could have Saranya look at cleaning them up ---
> > since regardless of whether we implement demand-loading or not,
> > parallel bitmap reading is going to be useful for some use cases
> > (e.g., a full fsck, dumpe2fs, or e2image).
>
> I don't think Shilong will have time to work on major code changes for
> the next few weeks at least, due to internal deadlines, after which we
> can finish cleaning up and submitting the pfsck patch series upstream.
> If you are interested in the whole 59-patch series, it is available via:
>
>     git pull https://review.whamcloud.com/tools/e2fsprogs refs/changes/14/39914/1
>
> or viewable online via Gerrit at:
>
>     https://review.whamcloud.com/39914
>
> Getting some high-level review/feedback of that patch series would avoid
> spending time reworking/rebasing it, only to find that it isn't in the
> form you would prefer, or that it needs major architectural changes.
>
> Note that this is currently based on top of the Lustre e2fsprogs branch.
> While these patches shouldn't cause any problems with non-Lustre
> filesystems, there are other patches in the series that are not
> necessarily ready for submission (e.g. dirdata, Lustre xattr decoding,
> inode badness, etc.).
>
> Cheers, Andreas

Attachment:
v2-0001-LU-8465-ext2fs-parallel-bitmap-loading.patch
Description: Binary data