Here is some more context and testing detail: this issue was originally identified
on Amazon Linux 2 with kernel 5.10, where CONFIG_HZ is 250 on x86_64 but 100 on
arm64. It can be reproduced by launching c5.2xlarge (x86_64) and c6g.2xlarge
(arm64) EC2 instances, mounting an ext4 FS, and then measuring how long the
ext4lazyinit thread takes to finish.

Without the fix (kernel 5.10):
|----------------+-------------+------------|
| ext4 FS volume | c6g.2xlarge | c5.2xlarge |
|----------------+-------------+------------|
| 2T             | 1842 secs   | 743 secs   |
|----------------+-------------+------------|
| 3T             | 2690 secs   | 1110 secs  |
|----------------+-------------+------------|

With the fix (kernel 5.10):
|----------------+-------------+------------|
| ext4 FS volume | c6g.2xlarge | c5.2xlarge |
|----------------+-------------+------------|
| 2T             | 660 secs    | 544 secs   |
|----------------+-------------+------------|
| 3T             | 1053 secs   | 932 secs   |
|----------------+-------------+------------|

On Thu, Sep 02, 2021 at 04:44:11PM +0000, Shaoying Xu wrote:
> Description
> ===========
> Ext4 FS has an inappropriate implementation of the next schedule time
> calculation: it uses jiffies to measure the time one request takes to zero
> out the inode table. This effectively makes the wait time dependent on
> CONFIG_HZ, which is undesirable. As a result, we have observed fairly long
> initialization delays on server systems with 100HZ. Therefore, we propose
> using a more granular unit to calculate the next schedule time.
>
> Test
> ====
> Tested the patch on stable kernel 5.10 with 2T and 3T FS volumes on EC2
> instances. Before the fix, instances with 250HZ finished the lazy
> initialization about 2.4x faster than instances with 100HZ.
> After the fix, both finished in approximately the same time.
>
> Patch
> =====
> Shaoying Xu (1):
>   ext4: fix lazy initialization next schedule time computation in more
>     granular unit
>
>  fs/ext4/super.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> --
> 2.16.6
>