On 12/04/2015 07:07 AM, Ilya Dryomov wrote:
On Mon, Nov 30, 2015 at 11:54 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Mon, Nov 30, 2015 at 8:20 AM, Raghavendra K T
<raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:
* Ilya Dryomov <idryomov@xxxxxxxxx> [2015-11-20 22:22:34]:
Since 52ebea749aae ("writeback: make backing_dev_info host
cgroup-specific bdi_writebacks"), an inode gets attached to a wb
(struct bdi_writeback) at some point in its lifetime. Detaching happens
on evict, in inode_detach_wb() called from __destroy_inode(), and
involves updating the wb.
However, detaching an internal bdev inode from its wb in
__destroy_inode() is too late. Its bdi and by extension root wb are
embedded into struct request_queue, which has different lifetime rules
and can be freed long before the final bdput() is called (which can be
from __fput() of a corresponding /dev inode, through dput() - evict() -
bd_forget()). bdevs hold onto the underlying disk/queue pair only while
opened; as soon as the bdev is closed all bets are off. In fact, the
disk/queue can be gone before __blkdev_put() even returns:
static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
{
...
	if (bdev->bd_contains == bdev) {
		if (disk->fops->release)
			disk->fops->release(disk, mode);

	[ Driver puts its references to disk/queue ]

	}
	if (!bdev->bd_openers) {
		struct module *owner = disk->fops->owner;

		disk_put_part(bdev->bd_part);
		bdev->bd_part = NULL;
		bdev->bd_disk = NULL;
		if (bdev != bdev->bd_contains)
			victim = bdev->bd_contains;
		bdev->bd_contains = NULL;

		put_disk(disk);

	[ We put ours, the queue is gone
	  The last bdput() would result in a write to invalid memory ]

		module_put(owner);
...
}
Since bdev inodes are special anyway, detach them in __blkdev_put()
after clearing the inode's dirty bits, turning the problematic
inode_detach_wb() in __destroy_inode() into a no-op.
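
The ordering problem can be illustrated with a small userspace sketch.
This is not kernel code: the toy_* types and helpers below are
hypothetical stand-ins for request_queue, bdi_writeback and the bdev
inode, and the "freed" flag replaces a real free so that the late
access can be detected instead of corrupting memory. It only assumes
what is described above: the wb is embedded in a refcounted queue, and
detaching writes to the wb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct bdi_writeback: lives inside the queue. */
struct toy_wb {
	int attached_inodes;
};

/* Toy stand-in for struct request_queue: refcounted, embeds the wb. */
struct toy_queue {
	int refcount;
	bool freed;		/* set instead of freeing, so misuse is detectable */
	struct toy_wb wb;
};

/* Toy stand-in for a bdev inode attached to a wb. */
struct toy_inode {
	struct toy_wb *wb;
	struct toy_queue *owner;	/* hypothetical back-pointer, used only for checking */
};

/* Drop one reference; the last put "frees" the queue and its embedded wb. */
static void toy_queue_put(struct toy_queue *q)
{
	assert(q->refcount > 0);
	if (--q->refcount == 0)
		q->freed = true;
}

static void toy_attach(struct toy_inode *inode, struct toy_queue *q)
{
	inode->wb = &q->wb;
	inode->owner = q;
	q->wb.attached_inodes++;
}

/*
 * Detach the inode from its wb. Returns true if the wb was still alive
 * (or the inode was already detached), false if the detach would have
 * written to freed memory.
 */
static bool toy_detach(struct toy_inode *inode)
{
	bool safe;

	if (!inode->wb)
		return true;	/* already detached: no-op */

	safe = !inode->owner->freed;
	if (safe)
		inode->wb->attached_inodes--;
	inode->wb = NULL;
	return safe;
}

/* Old ordering: the queue reference is put first, detach happens at destroy time. */
static bool toy_close_detach_late(void)
{
	struct toy_queue q = { .refcount = 1 };
	struct toy_inode inode = { 0 };

	toy_attach(&inode, &q);
	toy_queue_put(&q);		/* models put_disk() in __blkdev_put() */
	return toy_detach(&inode);	/* models inode_detach_wb() in __destroy_inode() */
}

/* New ordering: detach while the reference is still held; the later call is a no-op. */
static bool toy_close_detach_early(void)
{
	struct toy_queue q = { .refcount = 1 };
	struct toy_inode inode = { 0 };
	bool early, late;

	toy_attach(&inode, &q);
	early = toy_detach(&inode);	/* models the detach moved into __blkdev_put() */
	toy_queue_put(&q);
	late = toy_detach(&inode);	/* destroy-time detach finds nothing to do */
	return early && late;
}
```

In this model toy_close_detach_late() reports an unsafe detach, while
toy_close_detach_early() reports a safe one, which mirrors why the
detach is moved ahead of put_disk().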
add_disk() grabs its disk->queue since 523e1d399ce0 ("block: make
gendisk hold a reference to its queue"), so the old ->release comment
is removed in favor of the new inode_detach_wb() comment.
Cc: stable@xxxxxxxxxxxxxxx # 4.2+, needs backporting
Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
---
Feel free to add
Tested-by: Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>
I was hitting a bad memory access problem while creating thousands of
containers. With this patch I am able to create more than 10k containers
without hitting the problem.
I had reported the problem here: https://lkml.org/lkml/2015/11/19/149
Great! Christoph's concern is with ->i_wb as a whole, not this
particular patch - Al, this one is marked for stable, can we get it
merged into -rc4? Or should it go through Jens' tree, as cgroup
writeback patches did?
Ping?
Was holding off, but I think we should get the simple fix in for 4.4.
I've applied it for this series.
--
Jens Axboe