On 11/23/2012 12:34 AM, Minchan Kim wrote: > On Thu, Nov 22, 2012 at 12:13:24PM +0100, Jerome Marchand wrote: >> On 11/21/2012 09:37 AM, Minchan Kim wrote: >>> Hi alls, >>> >>> Today, I saw below complain of lockdep. >>> As a matter of fact, I knew it long time ago but forgot that. >>> The reason lockdep complains is that now zram uses GFP_KERNEL >>> in reclaim path(ex, __zram_make_request) :( >>> I can fix it via replacing GFP_KERNEL with GFP_NOIO. >>> But more big problem is vzalloc in zram_init_device which calls GFP_KERNEL. >>> Of course, I can change it with __vmalloc which can receive gfp_t. >>> But still we have a problem. Althoug __vmalloc can handle gfp_t, it calls >>> allocation of GFP_KERNEL. That's why I sent the patch. >>> https://lkml.org/lkml/2012/4/23/77 >>> Since then, I forgot it, saw the bug today and poped the question again. >>> >>> Yes. Fundamental problem is utter crap API vmalloc. >>> If we can fix it, everyone would be happy. But life isn't simple like seeing >>> my thread of the patch. >>> >>> So next option is to move zram_init_device into setting disksize time. >>> But it makes unnecessary metadata waste until zram is used really(That's why >>> Nitin move zram_init_device from disksize setting time to make_request) and >>> it makes user should set the disksize before using, which are behavior change. >>> >>> I would like to clean up this issue before promoting because it might change >>> usage behavior. >>> >>> Do you have any idea? >> >> This is a false positive due to the memory allocation in >> zram_init_device() called from zram_make_request(). It appears to >> lockdep that the allocation might trigger a request on the device that >> would try to take init_lock again, but in fact it doesn't. The device >> is not initialized yet, even less swapped on. > > That's not a only swap case. > Let's think following usecase. > > 1) Booting > 2) echo $((DISKSIZE)) > /sys/block/zram0/disksize > 3) dd if=/dev/zero of=/dev/zram0 bs=4K count=1 > 4) Written 4K page(page-A) is still page cache and isn't submitted > to zram block device. > 5) Memory pressure happen by some memory hogger. > 6) VM start to reclaim and write page-A to zram0. > 7) zram_init_device is called at last. > 8) allocate GFP_KERNEL in zram_init_device > 9) goto reclaim path again. > 10) deadlock. > > So I think it's not false positive. I guess you're right. That's a scenario I haven't imagined. At any rate, my patch fixes that. > Even if it is, I think lock split isn't a good idea to just avoid > lockdep warn. It makes code unnecessary complicated and it would be more > error-prone. Let's not add another lock without performance trouble report > by the lock. > > As I discussed with Nitin in this thread, lazy initialization don't have > much point and disksize setting option isn't consistent for user behavior. > And I expect Nitin will send patch "diet of table" soonish. > > So just moving the initialzation part from reclaim context to process's one > is simple and clear solution, I believe. Although that would avoid deadlocks (I guess, I'm not sure anymore...), it won't stop lockdep from complaining. It still makes an allocation while holding a lock that is also taken in a reclaim context. Anyway, I like the idea to removes the lazy initialization. It makes things more complicated without any actual advantage. Jerome > >> >> The following (quickly tested) patch should prevent lockdep complain. >> >> Jerome >> >> --- >> >From ebb3514c4ee18276da7c5ca08025991b493ac204 Mon Sep 17 00:00:00 2001 >> From: Jerome Marchand <jmarchan@xxxxxxxxxx> >> Date: Thu, 22 Nov 2012 09:07:40 +0100 >> Subject: [PATCH] staging: zram: Avoid lockdep warning >> >> zram triggers a lockdep warning. The cause of it is the call to >> zram_init_device() from zram_make_request(). The memory allocation in >> zram_init_device() could start a memory reclaim which in turn could >> cause swapout and (as it appears to lockdep) a call to >> zram_make_request(). However this is a false positive: an >> unititialized device can't be used as swap. >> A solution is to split init_lock in two lock. One mutex that protects >> init, reset and size setting and a rw_semaphore that protects requests >> and reset. Thus init and request would be protected by different locks >> and lockdep will be happy. >> >> Signed-off-by: Jerome Marchand <jmarchan@xxxxxxxxxx> >> --- >> drivers/staging/zram/zram_drv.c | 41 +++++++++++++++++++----------------- >> drivers/staging/zram/zram_drv.h | 16 ++++++++++--- >> drivers/staging/zram/zram_sysfs.c | 20 +++++++++--------- >> 3 files changed, 44 insertions(+), 33 deletions(-) >> >> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c >> index fb4a7c9..b3bc3c4 100644 >> --- a/drivers/staging/zram/zram_drv.c >> +++ b/drivers/staging/zram/zram_drv.c >> @@ -470,11 +470,11 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio) >> { >> struct zram *zram = queue->queuedata; >> >> - if (unlikely(!zram->init_done) && zram_init_device(zram)) >> + if (unlikely(!is_initialized(zram)) && zram_init_device(zram)) >> goto error; >> >> - down_read(&zram->init_lock); >> - if (unlikely(!zram->init_done)) >> + down_read(&zram->req_lock); >> + if (unlikely(!is_initialized(zram))) >> goto error_unlock; >> >> if (!valid_io_request(zram, bio)) { >> @@ -483,12 +483,12 @@ static void zram_make_request(struct request_queue *queue, struct bio *bio) >> } >> >> __zram_make_request(zram, bio, bio_data_dir(bio)); >> - up_read(&zram->init_lock); >> + up_read(&zram->req_lock); >> >> return; >> >> error_unlock: >> - up_read(&zram->init_lock); >> + up_read(&zram->req_lock); >> error: >> bio_io_error(bio); >> } >> @@ -497,7 +497,7 @@ void __zram_reset_device(struct zram *zram) >> { >> size_t index; >> >> - zram->init_done = 0; >> + atomic_set(&zram->init_done, 0); >> >> /* Free various per-device buffers */ >> kfree(zram->compress_workmem); >> @@ -529,9 +529,12 @@ void __zram_reset_device(struct zram *zram) >> >> void zram_reset_device(struct zram *zram) >> { >> - down_write(&zram->init_lock); >> - __zram_reset_device(zram); >> - up_write(&zram->init_lock); >> + mutex_lock(&zram->init_lock); >> + down_write(&zram->req_lock); >> + if (is_initialized(zram)) >> + __zram_reset_device(zram); >> + up_write(&zram->req_lock); >> + mutex_unlock(&zram->init_lock); >> } >> >> int zram_init_device(struct zram *zram) >> @@ -539,10 +542,10 @@ int zram_init_device(struct zram *zram) >> int ret; >> size_t num_pages; >> >> - down_write(&zram->init_lock); >> + mutex_lock(&zram->init_lock); >> >> - if (zram->init_done) { >> - up_write(&zram->init_lock); >> + if (is_initialized(zram)) { >> + mutex_unlock(&zram->init_lock); >> return 0; >> } >> >> @@ -583,8 +586,8 @@ int zram_init_device(struct zram *zram) >> goto fail; >> } >> >> - zram->init_done = 1; >> - up_write(&zram->init_lock); >> + atomic_set(&zram->init_done, 1); >> + mutex_unlock(&zram->init_lock); >> >> pr_debug("Initialization done!\n"); >> return 0; >> @@ -594,7 +597,7 @@ fail_no_table: >> zram->disksize = 0; >> fail: >> __zram_reset_device(zram); >> - up_write(&zram->init_lock); >> + mutex_unlock(&zram->init_lock); >> pr_err("Initialization failed: err=%d\n", ret); >> return ret; >> } >> @@ -619,7 +622,8 @@ static int create_device(struct zram *zram, int device_id) >> int ret = 0; >> >> init_rwsem(&zram->lock); >> - init_rwsem(&zram->init_lock); >> + mutex_init(&zram->init_lock); >> + init_rwsem(&zram->req_lock); >> spin_lock_init(&zram->stat64_lock); >> >> zram->queue = blk_alloc_queue(GFP_KERNEL); >> @@ -672,7 +676,7 @@ static int create_device(struct zram *zram, int device_id) >> goto out; >> } >> >> - zram->init_done = 0; >> + atomic_set(&zram->init_done, 0); >> >> out: >> return ret; >> @@ -755,8 +759,7 @@ static void __exit zram_exit(void) >> zram = &zram_devices[i]; >> >> destroy_device(zram); >> - if (zram->init_done) >> - zram_reset_device(zram); >> + zram_reset_device(zram); >> } >> >> unregister_blkdev(zram_major, "zram"); >> diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h >> index df2eec4..f6bcead 100644 >> --- a/drivers/staging/zram/zram_drv.h >> +++ b/drivers/staging/zram/zram_drv.h >> @@ -96,9 +96,12 @@ struct zram { >> * against concurrent read and writes */ >> struct request_queue *queue; >> struct gendisk *disk; >> - int init_done; >> - /* Prevent concurrent execution of device init, reset and R/W request */ >> - struct rw_semaphore init_lock; >> + atomic_t init_done; >> + /* Prevent concurrent execution of device init, reset and >> + * disksize_store */ >> + struct mutex init_lock; >> + /* Prevent concurent execution device reset and R/W requests */ >> + struct rw_semaphore req_lock; >> /* >> * This is the limit on amount of *uncompressed* worth of data >> * we can store in a disk. >> @@ -108,6 +111,11 @@ struct zram { >> struct zram_stats stats; >> }; >> >> +static inline int is_initialized(struct zram *zram) >> +{ >> + return atomic_read(&zram->init_done); >> +} >> + >> extern struct zram *zram_devices; >> unsigned int zram_get_num_devices(void); >> #ifdef CONFIG_SYSFS >> @@ -115,6 +123,6 @@ extern struct attribute_group zram_disk_attr_group; >> #endif >> >> extern int zram_init_device(struct zram *zram); >> -extern void __zram_reset_device(struct zram *zram); >> +extern void zram_reset_device(struct zram *zram); >> >> #endif >> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c >> index de1eacf..b300881 100644 >> --- a/drivers/staging/zram/zram_sysfs.c >> +++ b/drivers/staging/zram/zram_sysfs.c >> @@ -62,16 +62,19 @@ static ssize_t disksize_store(struct device *dev, >> if (!disksize) >> return -EINVAL; >> >> - down_write(&zram->init_lock); >> - if (zram->init_done) { >> - up_write(&zram->init_lock); >> + mutex_lock(&zram->init_lock); >> + down_write(&zram->req_lock); >> + if (is_initialized(zram)) { >> + up_write(&zram->req_lock); >> + mutex_unlock(&zram->init_lock); >> pr_info("Cannot change disksize for initialized device\n"); >> return -EBUSY; >> } >> >> zram->disksize = PAGE_ALIGN(disksize); >> set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT); >> - up_write(&zram->init_lock); >> + up_write(&zram->req_lock); >> + mutex_unlock(&zram->init_lock); >> >> return len; >> } >> @@ -81,7 +84,7 @@ static ssize_t initstate_show(struct device *dev, >> { >> struct zram *zram = dev_to_zram(dev); >> >> - return sprintf(buf, "%u\n", zram->init_done); >> + return sprintf(buf, "%u\n", atomic_read(&zram->init_done)); >> } >> >> static ssize_t reset_store(struct device *dev, >> @@ -110,10 +113,7 @@ static ssize_t reset_store(struct device *dev, >> if (bdev) >> fsync_bdev(bdev); >> >> - down_write(&zram->init_lock); >> - if (zram->init_done) >> - __zram_reset_device(zram); >> - up_write(&zram->init_lock); >> + zram_reset_device(zram); >> >> return len; >> } >> @@ -186,7 +186,7 @@ static ssize_t mem_used_total_show(struct device *dev, >> u64 val = 0; >> struct zram *zram = dev_to_zram(dev); >> >> - if (zram->init_done) >> + if (is_initialized(zram)) >> val = zs_get_total_size_bytes(zram->mem_pool); >> >> return sprintf(buf, "%llu\n", val); >> -- >> 1.7.7.6 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>