Re: [PATCH] ext4: improve ext4lazyinit scalability V2

Dmitry, thanks for the patch, and Jan, thanks for the review.

This is what I've added to the ext4 tree, which should reflect all of
the comments which Jan made.

				- Ted

commit 933ef679501f124bc5eb762f6068d099a0579937
Author: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
Date:   Mon Sep 5 23:38:36 2016 -0400

    ext4: improve ext4lazyinit scalability
    
    ext4lazyinit is a global thread. This thread performs itable
    initialization under the li_list_mtx mutex.
    
    It basically does the following:
    ext4_lazyinit_thread
      ->mutex_lock(&eli->li_list_mtx);
      ->ext4_run_li_request(elr)
        ->ext4_init_inode_table-> Do a lot of IO if the list is large
    
    When a new mount or umount arrives, it has to block on ->li_list_mtx,
    because the lazyinit thread holds it for the whole walk:
    ext4_fill_super
     ->ext4_register_li_request
       ->mutex_lock(&ext4_li_info->li_list_mtx);
       ->list_add(&elr->lr_request, &ext4_li_info->li_request_list);
    In my case mount takes 40 minutes on a server with 36 * 4Tb HDDs.
    An ordinary user may hit the same problem with a very slow device
    (e.g. /dev/mmcblkXXX). Worse, if one of the filesystems is frozen, the
    lazyinit thread simply blocks on sb_start_write(), so any other
    mount/umount is stuck forever.
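    
    A minimal sketch of the old serialization (illustrative only, condensed
    from the call chains above; the scheduling checks are omitted and the
    real loop lives in ext4_lazyinit_thread()):
    
        /* Illustrative only: the old walk keeps li_list_mtx held. */
        mutex_lock(&eli->li_list_mtx);
        list_for_each_safe(pos, n, &eli->li_request_list) {
                elr = list_entry(pos, struct ext4_li_request, lr_request);
                /*
                 * ext4_run_li_request() may zero a lot of inode tables
                 * (heavy IO) while li_list_mtx is still held, so every
                 * ext4_register_li_request() from a new mount sleeps on
                 * the mutex until the whole walk is done.
                 */
                ext4_run_li_request(elr);
        }
        mutex_unlock(&eli->li_list_mtx);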
    
    This patch changes the logic as follows:
    - grab the ->s_umount read semaphore before processing a li_request.
      After that it is safe to drop li_list_mtx, because all callers of
      ext4_remove_li_request() hold ->s_umount for write.
    - the lazyinit thread skips frozen superblocks
    
    Locking order:
    The order is asserted by the umount path as follows: s_umount, then
    li_list_mtx, so the only way to grab ->s_umount inside the lazyinit
    thread is via down_read_trylock().
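    
    The per-request handling can be condensed into the following sketch
    (hypothetical helper name; error handling and rescheduling of skipped
    requests are omitted, and the authoritative code is the
    ext4_lazyinit_thread() hunk in the diff below):
    
        /* Hypothetical helper illustrating the trylock-based pattern. */
        static int ext4_li_try_one_request(struct ext4_lazy_init *eli,
                                           struct ext4_li_request *elr)
        {
                struct super_block *sb = elr->lr_super;
                int err = 0;
    
                /*
                 * Non-blocking: umount takes s_umount and then li_list_mtx,
                 * so a blocking down_read() here, with li_list_mtx already
                 * held, would invert that order.
                 */
                if (!down_read_trylock(&sb->s_umount))
                        return 0;       /* skip, retry later */
    
                /* Non-blocking as well: a frozen filesystem is skipped. */
                if (sb_start_write_trylock(sb)) {
                        /*
                         * s_umount is held, so the request cannot be removed
                         * from the list behind our back; the slow itable
                         * zeroing may run without li_list_mtx.
                         */
                        mutex_unlock(&eli->li_list_mtx);
                        err = ext4_run_li_request(elr);
                        sb_end_write(sb);
                        mutex_lock(&eli->li_list_mtx);
                }
                up_read(&sb->s_umount);
                return err;
        }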
    
    xfstests:ext4/023
    #PSBM-49658
    
    Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
    Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 5819b0e..ebeebf5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2749,7 +2749,6 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 	sb = elr->lr_super;
 	ngroups = EXT4_SB(sb)->s_groups_count;
 
-	sb_start_write(sb);
 	for (group = elr->lr_next_group; group < ngroups; group++) {
 		gdp = ext4_get_group_desc(sb, group, NULL);
 		if (!gdp) {
@@ -2776,8 +2775,6 @@ static int ext4_run_li_request(struct ext4_li_request *elr)
 		elr->lr_next_sched = jiffies + elr->lr_timeout;
 		elr->lr_next_group = group + 1;
 	}
-	sb_end_write(sb);
-
 	return ret;
 }
 
@@ -2842,21 +2839,46 @@ cont_thread:
 			mutex_unlock(&eli->li_list_mtx);
 			goto exit_thread;
 		}
-
 		list_for_each_safe(pos, n, &eli->li_request_list) {
+			int err = 0;
+			int progress = 0;
 			elr = list_entry(pos, struct ext4_li_request,
 					 lr_request);
 
-			if (time_after_eq(jiffies, elr->lr_next_sched)) {
-				if (ext4_run_li_request(elr) != 0) {
-					/* error, remove the lazy_init job */
-					ext4_remove_li_request(elr);
-					continue;
+			if (time_before(jiffies, elr->lr_next_sched)) {
+				if (time_before(elr->lr_next_sched, next_wakeup))
+					next_wakeup = elr->lr_next_sched;
+				continue;
+			}
+			if (down_read_trylock(&elr->lr_super->s_umount)) {
+				if (sb_start_write_trylock(elr->lr_super)) {
+					progress = 1;
+					/*
+					 * We hold sb->s_umount, sb can not
+					 * be removed from the list, it is
+					 * now safe to drop li_list_mtx
+					 */
+					mutex_unlock(&eli->li_list_mtx);
+					err = ext4_run_li_request(elr);
+					sb_end_write(elr->lr_super);
+					mutex_lock(&eli->li_list_mtx);
+					n = pos->next;
 				}
+				up_read((&elr->lr_super->s_umount));
+			}
+			/* error, remove the lazy_init job */
+			if (err) {
+				ext4_remove_li_request(elr);
+				continue;
+			}
+			if (!progress) {
+				elr->lr_next_sched = jiffies +
+					(prandom_u32()
+					 % (EXT4_DEF_LI_MAX_START_DELAY * HZ));
+				if (time_before(elr->lr_next_sched,
+						next_wakeup))
+					next_wakeup = elr->lr_next_sched;
 			}
-
-			if (time_before(elr->lr_next_sched, next_wakeup))
-				next_wakeup = elr->lr_next_sched;
 		}
 		mutex_unlock(&eli->li_list_mtx);
 