Re: [PATCH 1/2] nvme-tcp: use sendpage_ok() to check page for kernel_sendpage()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2020/7/28 01:25, Sagi Grimberg wrote:
> 
> 
> On 7/26/20 6:52 AM, Coly Li wrote:
>> Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
>> send slab pages. But for pages allocated by __get_free_pages() without
>> __GFP_COMP, which also have refcount as 0, they are still sent by
>> kernel_sendpage() to remote end, this is problematic.
>>
>> When bcache uses a remote NVMe SSD via nvme-over-tcp as its cache
>> device, writing meta data e.g. cache_set->disk_buckets to remote SSD may
>> trigger a kernel panic due to the above problem. Bcause the meta data
>> pages for cache_set->disk_buckets are allocated by __get_free_pages()
>> without __GFP_COMP.
>>
>> This problem should be fixed both in upper layer driver (bcache) and
>> nvme-over-tcp code. This patch fixes the nvme-over-tcp code by checking
>> whether the page refcount is 0, if yes then don't use kernel_sendpage()
>> and call sock_no_sendpage() to send the page into network stack.
>>
>> Such check is done by macro sendpage_ok() in this patch, which is defined
>> in include/linux/net.h as,
>>     (!PageSlab(page) && page_count(page) >= 1)
>> If sendpage_ok() returns false, sock_no_sendpage() will handle the page
>> other than kernel_sendpage().
>>
>> The code comments in this patch is copied and modified from drbd where
>> the similar problem already gets solved by Philipp Reisner. This is the
>> best code comment including my own version.
>>
>> Signed-off-by: Coly Li <colyli@xxxxxxx>
>> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@xxxxxxx>
>> Cc: Christoph Hellwig <hch@xxxxxx>
>> Cc: Hannes Reinecke <hare@xxxxxxx>
>> Cc: Jan Kara <jack@xxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxxxxx>
>> Cc: Mikhail Skorzhinskii <mskorzhinskiy@xxxxxxxxxxxxxx>
>> Cc: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
>> Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
>> Cc: Vlastimil Babka <vbabka@xxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> ---
>> Changelog:
>> v3: introduce a more common name sendpage_ok() for the open coded check
>> v2: fix typo in patch subject.
>> v1: the initial version.
>>
>>   drivers/nvme/host/tcp.c | 13 +++++++++++--
>>   include/linux/net.h     |  2 ++
>>   2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 79ef2b8e2b3c..f9952f6d94b9 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -887,8 +887,17 @@ static int nvme_tcp_try_send_data(struct
>> nvme_tcp_request *req)
>>           else
>>               flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
>>   -        /* can't zcopy slab pages */
>> -        if (unlikely(PageSlab(page))) {
>> +        /*
>> +         * e.g. XFS meta- & log-data is in slab pages, or bcache meta
>> +         * data pages, or other high order pages allocated by
>> +         * __get_free_pages() without __GFP_COMP, which have a
>> page_count
>> +         * of 0 and/or have PageSlab() set. We cannot use send_page for
>> +         * those, as that does get_page(); put_page(); and would cause
>> +         * either a VM_BUG directly, or __page_cache_release a page that
>> +         * would actually still be referenced by someone, leading to
>> some
>> +         * obscure delayed Oops somewhere else.
>> +         */
> 
> I was hoping that this comment would move to the helper as well.
> 

Sure, I will do that.


> Agree with Christoph comment as well.

I will move the inline sendpage_ok() to a separated patch.

Coly Li



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux