On Wed, Nov 18, 2020 at 07:39:32AM -0800, Saranya Muruganandam wrote:
> From: Wang Shilong <wshilong@xxxxxxx>
>
> In our benchmarking for PiB size filesystem, pass5 takes
> 10446s to finish and 99.5% of time takes on reading bitmaps.
>
> It makes sense to reading bitmaps using multiple threads,
> a quickly benchmark show 10446s to 626s with 64 threads.
>
> Signed-off-by: Wang Shilong <wshilong@xxxxxxx>
> Signed-off-by: Saranya Muruganandam <saranyamohan@xxxxxxxxxx>

Note: This patch will *explode* with much hilarity if num_threads is
greater than the number of block groups.  That's because
ext2fs_get_avg_group() will return 1 if fs_num_threads is greater than
group_desc_count.  This will result in the group start and end limits
going beyond the array boundaries.... and then *boom*.

So there will probably need to be some kind of safety check for the
case where the caller has set fs_num_threads to a value which is much
larger than is appropriate for a given file system.

Speaking of which, relying on fs_num_threads being set by e2fsck means
that we won't get the benefits of the parallel block bitmap reads for
debugfs and dumpe2fs.  So we should think about how the other tools
should trigger parallel bitmap reads.  And this might be something
that we want to do independent of whether we are doing parallel fsck.

Suggested approach:

1) Create ext2fs_is_device_rotational() which returns whether a
particular device is a HDD or a non-rotational device (e.g., SSD, GCE
PD, AWS EBS, etc.), using the rotational status as a proxy for
"reading using multiple threads is a good thing".

2) Create an ext2fs_get_num_procs() which calls
sysconf(_SC_NPROCESSORS_ONLN) if sysconf and _SC_NPROCESSORS_ONLN are
available.  If not, there may be other OS-specific ways of determining
the number of CPUs available.

3) If HAVE_PTHREADS is defined and the number of block groups is
greater than the number of CPUs * 2, a function that (for now) we drop
in libsupport will set fs->fs_num_threads to the number of processors
as the default.

There may not be a reason to change the default for debugfs and
dumpe2fs, but for e2fsck this would be used as the default, and it
could be overridden via "-E multithread=<number of threads>".

					- Ted
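
To make the safety check and the proposed default a bit more concrete,
here is a rough sketch in plain C.  It is only an illustration under
stated assumptions, not code from the patch series: the names
sketch_get_num_procs() and sketch_pick_num_threads() are made up and
are not part of libext2fs, and treating a requested value of 0 as "no
explicit -E multithread= setting" is also an assumption.

```c
#include <unistd.h>

/*
 * Hypothetical helper (not a libext2fs function): return the number
 * of online CPUs, or 1 if it cannot be determined on this platform.
 */
static unsigned int sketch_get_num_procs(void)
{
#if defined(_SC_NPROCESSORS_ONLN)
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	if (n > 0)
		return (unsigned int) n;
#endif
	return 1;
}

/*
 * Hypothetical helper: pick a thread count for reading bitmaps.
 *
 * requested == 0 is assumed to mean "caller did not ask for a
 * specific number of threads", in which case we default to the CPU
 * count, but only when there are more than 2 * CPUs block groups.
 * In every case the result is clamped to [1, group_desc_count] so
 * that no thread is handed a group range past the end of the array.
 */
static unsigned int sketch_pick_num_threads(unsigned int requested,
					    unsigned int group_desc_count,
					    unsigned int num_procs)
{
	unsigned int n = requested;

	if (n == 0) {
		if (group_desc_count > num_procs * 2)
			n = num_procs;
		else
			n = 1;
	}
	if (n > group_desc_count)
		n = group_desc_count;
	if (n == 0)
		n = 1;
	return n;
}
```

With a clamp of this kind, a request for 64 threads on a file system
with 10 block groups would be reduced to 10, and a file system with
too few groups to be worth splitting would fall back to one thread.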