+ big-kernel-lock-contention-in-do_open-and-blkdev_put.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled

     Big kernel lock contention in do_open() and blkdev_put()

has been added to the -mm tree.  Its filename is

     big-kernel-lock-contention-in-do_open-and-blkdev_put.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: Big kernel lock contention in do_open() and blkdev_put()
From: "Chen, Kenneth W" <kenneth.w.chen@xxxxxxxxx>


Apparently the latest kernel still uses big kernel lock in the opening/
closing of block device.  And on a moderate sized numa machine, we've see
huge lock contention with lock_kernel()/unlock_kernel() function coming
from fs/block_dev.c:do_open() and block_dev.c/blkdev_put().

This was found accidentally by a slightly non-optimal application environ-
ment: a multi-process application runs on 128P numa machine; it forks out
400 processes and each process tries to open ~3000 block devices.  Because
of per process limit of file descriptor, this "smart ass" application went
into a mode where it dynamically open and close file descriptors on demand.
 I know we can get around it by increasing "open file" limit.  But it darn
on me that current code won't allow concurrent opening of different block
devices either.

I've studied do_open() and blkdev_put(), and concluded that it is already
SMP safe because they are protected by per-device bdev->bd_mutex.  What's
left that I can tell is the call to each block device type via disk->fops
->open().  I would like to propose rid the BKL in the generic block device
open/close path and put the burden on each device specific code to use BKL
if necessary.  Most of the modern block device drivers shouldn't need them
because in fops->open() they already uses one of the three variants: a)
spin lock, b) mutex, or c) per device structure.

Out of the 63 hits I see in 2.6.17-rc4 with
block_device_operations->open(), floppy driver seems to be the only one
that mucks around with a global Variable without any protection.  We can
add a spin lock there.

Signed-off-by: Ken Chen <kenneth.w.chen@xxxxxxxxx>
Christoph Hellwig <hch@xxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 fs/block_dev.c |    6 ------
 1 file changed, 6 deletions(-)

diff -puN fs/block_dev.c~big-kernel-lock-contention-in-do_open-and-blkdev_put fs/block_dev.c
--- devel/fs/block_dev.c~big-kernel-lock-contention-in-do_open-and-blkdev_put	2006-05-31 17:28:26.000000000 -0700
+++ devel-akpm/fs/block_dev.c	2006-05-31 17:28:26.000000000 -0700
@@ -878,10 +878,8 @@ static int do_open(struct block_device *
 	int part;
 
 	file->f_mapping = bdev->bd_inode->i_mapping;
-	lock_kernel();
 	disk = get_gendisk(bdev->bd_dev, &part);
 	if (!disk) {
-		unlock_kernel();
 		bdput(bdev);
 		return ret;
 	}
@@ -953,7 +951,6 @@ static int do_open(struct block_device *
 	}
 	bdev->bd_openers++;
 	mutex_unlock(&bdev->bd_mutex);
-	unlock_kernel();
 	return 0;
 
 out_first:
@@ -966,7 +963,6 @@ out_first:
 	module_put(owner);
 out:
 	mutex_unlock(&bdev->bd_mutex);
-	unlock_kernel();
 	if (ret)
 		bdput(bdev);
 	return ret;
@@ -1028,7 +1024,6 @@ int blkdev_put(struct block_device *bdev
 	struct gendisk *disk = bdev->bd_disk;
 
 	mutex_lock(&bdev->bd_mutex);
-	lock_kernel();
 	if (!--bdev->bd_openers) {
 		sync_blockdev(bdev);
 		kill_bdev(bdev);
@@ -1058,7 +1053,6 @@ int blkdev_put(struct block_device *bdev
 		}
 		bdev->bd_contains = NULL;
 	}
-	unlock_kernel();
 	mutex_unlock(&bdev->bd_mutex);
 	bdput(bdev);
 	return ret;
_

Patches currently in -mm which might be from kenneth.w.chen@xxxxxxxxx are

tightening-hugetlb-strict-accounting.patch
big-kernel-lock-contention-in-do_open-and-blkdev_put.patch
sched-fix-interactive-ceiling-code.patch
sched-implement-smpnice.patch
sched-prevent-high-load-weight-tasks-suppressing-balancing.patch
sched-improve-stability-of-smpnice-load-balancing.patch
sched-improve-smpnice-load-balancing-when-load-per-task.patch
sched-modify-move_tasks-to-improve-load-balancing-outcomes.patch

-
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Kernel Newbies FAQ]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Photo]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux