The patch titled

     Big kernel lock contention in do_open() and blkdev_put()

has been added to the -mm tree.  Its filename is

     big-kernel-lock-contention-in-do_open-and-blkdev_put.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: Big kernel lock contention in do_open() and blkdev_put()
From: "Chen, Kenneth W" <kenneth.w.chen@xxxxxxxxx>

Apparently the latest kernel still uses the big kernel lock (BKL) when opening
and closing block devices.  On a moderately sized NUMA machine, we've seen huge
lock contention in lock_kernel()/unlock_kernel() called from
fs/block_dev.c:do_open() and fs/block_dev.c:blkdev_put().

This was found accidentally through a slightly non-optimal application
environment: a multi-process application running on a 128P NUMA machine forks
400 processes, and each process tries to open ~3000 block devices.  Because of
the per-process limit on file descriptors, this "smart ass" application went
into a mode where it dynamically opens and closes file descriptors on demand.

I know we can work around it by increasing the "open files" limit.  But it
dawned on me that the current code won't allow concurrent opening of different
block devices either.

I've studied do_open() and blkdev_put() and concluded that both are already
SMP-safe, because they are protected by the per-device bdev->bd_mutex.  The
only thing left that I can tell is the call into each block device type via
disk->fops->open().  I would like to propose removing the BKL from the generic
block device open/close path and putting the burden on each device's specific
code to take the BKL if necessary.  Most modern block device drivers shouldn't
need it, because in fops->open() they already use one of three mechanisms:
a) a spin lock, b) a mutex, or c) a per-device structure.
Out of the 63 hits I see in 2.6.17-rc4 for block_device_operations->open(),
the floppy driver seems to be the only one that mucks around with a global
variable without any protection.  We can add a spin lock there.

Signed-off-by: Ken Chen <kenneth.w.chen@xxxxxxxxx>
Christoph Hellwig <hch@xxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 fs/block_dev.c |    6 ------
 1 file changed, 6 deletions(-)

diff -puN fs/block_dev.c~big-kernel-lock-contention-in-do_open-and-blkdev_put fs/block_dev.c
--- devel/fs/block_dev.c~big-kernel-lock-contention-in-do_open-and-blkdev_put	2006-05-31 17:28:26.000000000 -0700
+++ devel-akpm/fs/block_dev.c	2006-05-31 17:28:26.000000000 -0700
@@ -878,10 +878,8 @@ static int do_open(struct block_device *
 	int part;
 
 	file->f_mapping = bdev->bd_inode->i_mapping;
-	lock_kernel();
 	disk = get_gendisk(bdev->bd_dev, &part);
 	if (!disk) {
-		unlock_kernel();
 		bdput(bdev);
 		return ret;
 	}
@@ -953,7 +951,6 @@ static int do_open(struct block_device *
 	}
 	bdev->bd_openers++;
 	mutex_unlock(&bdev->bd_mutex);
-	unlock_kernel();
 	return 0;
 
 out_first:
@@ -966,7 +963,6 @@ out_first:
 	module_put(owner);
 out:
 	mutex_unlock(&bdev->bd_mutex);
-	unlock_kernel();
 	if (ret)
 		bdput(bdev);
 	return ret;
@@ -1028,7 +1024,6 @@ int blkdev_put(struct block_device *bdev
 	struct gendisk *disk = bdev->bd_disk;
 
 	mutex_lock(&bdev->bd_mutex);
-	lock_kernel();
 	if (!--bdev->bd_openers) {
 		sync_blockdev(bdev);
 		kill_bdev(bdev);
@@ -1058,7 +1053,6 @@ int blkdev_put(struct block_device *bdev
 		}
 		bdev->bd_contains = NULL;
 	}
-	unlock_kernel();
 	mutex_unlock(&bdev->bd_mutex);
 	bdput(bdev);
 	return ret;
_

Patches currently in -mm which might be from kenneth.w.chen@xxxxxxxxx are

tightening-hugetlb-strict-accounting.patch
big-kernel-lock-contention-in-do_open-and-blkdev_put.patch
sched-fix-interactive-ceiling-code.patch
sched-implement-smpnice.patch
sched-prevent-high-load-weight-tasks-suppressing-balancing.patch
sched-improve-stability-of-smpnice-load-balancing.patch
sched-improve-smpnice-load-balancing-when-load-per-task.patch
sched-modify-move_tasks-to-improve-load-balancing-outcomes.patch