Re: [PATCH] block: detach bdev inode from its wb in __blkdev_put()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 4, 2015 at 6:44 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 12/04/2015 07:07 AM, Ilya Dryomov wrote:
>>
>> On Mon, Nov 30, 2015 at 11:54 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>
>>> On Mon, Nov 30, 2015 at 8:20 AM, Raghavendra K T
>>> <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> * Ilya Dryomov <idryomov@xxxxxxxxx> [2015-11-20 22:22:34]:
>>>>
>>>>> Since 52ebea749aae ("writeback: make backing_dev_info host
>>>>> cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
>>>>> gets attached to a wb (struct bdi_writeback).  Detaching happens on
>>>>> evict, in inode_detach_wb() called from __destroy_inode(), and involves
>>>>> updating wb.
>>>>>
>>>>> However, detaching an internal bdev inode from its wb in
>>>>> __destroy_inode() is too late.  Its bdi and by extension root wb are
>>>>> embedded into struct request_queue, which has different lifetime rules
>>>>> and can be freed long before the final bdput() is called (can be from
>>>>> __fput() of a corresponding /dev inode, through dput() - evict() -
>>>>> bd_forget().  bdevs hold onto the underlying disk/queue pair only while
>>>>> opened; as soon as bdev is closed all bets are off.  In fact,
>>>>> disk/queue can be gone before __blkdev_put() even returns:
>>>>>
>>>>> 1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode,
>>>>> int for_part)
>>>>> 1500 {
>>>>> ...
>>>>> 1518         if (bdev->bd_contains == bdev) {
>>>>> 1519                 if (disk->fops->release)
>>>>> 1520                         disk->fops->release(disk, mode);
>>>>>
>>>>> [ Driver puts its references to disk/queue ]
>>>>>
>>>>> 1521         }
>>>>> 1522         if (!bdev->bd_openers) {
>>>>> 1523                 struct module *owner = disk->fops->owner;
>>>>> 1524
>>>>> 1525                 disk_put_part(bdev->bd_part);
>>>>> 1526                 bdev->bd_part = NULL;
>>>>> 1527                 bdev->bd_disk = NULL;
>>>>> 1528                 if (bdev != bdev->bd_contains)
>>>>> 1529                         victim = bdev->bd_contains;
>>>>> 1530                 bdev->bd_contains = NULL;
>>>>> 1531
>>>>> 1532                 put_disk(disk);
>>>>>
>>>>> [ We put ours, the queue is gone
>>>>>    The last bdput() would result in a write to invalid memory ]
>>>>>
>>>>> 1533                 module_put(owner);
>>>>> ...
>>>>> 1539 }
>>>>>
>>>>> Since bdev inodes are special anyway, detach them in __blkdev_put()
>>>>> after clearing inode's dirty bits, turning the problematic
>>>>> inode_detach_wb() in __destroy_inode() into a noop.
>>>>>
>>>>> add_disk() grabs its disk->queue since 523e1d399ce0 ("block: make
>>>>> gendisk hold a reference to its queue"), so the old ->release comment
>>>>> is removed in favor of the new inode_detach_wb() comment.
>>>>>
>>>>> Cc: stable@xxxxxxxxxxxxxxx # 4.2+, needs backporting
>>>>> Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
>>>>> ---
>>>>
>>>> Feel free to add
>>>> Tested-by: Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>
>>>>
>>>> I was facing bad memory access problem while creating thousands of
>>>> containers.
>>>> With this patch I am able to create more than 10k containers without
>>>> hitting
>>>> the problem.
>>>> I had reported the problem here: https://lkml.org/lkml/2015/11/19/149
>>>
>>>
>>> Great!  Christoph's concern is with ->i_wb as a whole, not this
>>> particular patch - Al, this one is marked for stable, can we get it
>>> merged into -rc4?  Or should it go through Jens' tree, as cgroup
>>> writeback patches did?
>>
>>
>> Ping?
>
>
> Was holding off, but I think we should get the simple fix in for 4.4. I've
> applied it for this series.

I think you missed Raghavendra's Tested-by (I'm looking at
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-linus&id=f2148d49930497c66d38b13f137c30d78c60da3d).

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux