Re: [PATCH 1/2] nvme-tpc: don't use sendpage for pages not taking reference counter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Christoph,

Could you please take a look at this patch ? I will post a v3 patch soon
for your review.

Thanks in advance.

Coly Li

On 2020/7/10 21:26, Coly Li wrote:
> Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to
> send slab pages. But for pages allocated by __get_free_pages() without
> __GFP_COMP, which also have refcount as 0, they are still sent by
> kernel_sendpage() to remote end, this is problematic.
> 
> When bcache uses a remote NVMe SSD via nvme-over-tcp as its cache
> device, writing meta data e.g. cache_set->disk_buckets to remote SSD may
> trigger a kernel panic due to the above problem. Bcause the meta data
> pages for cache_set->disk_buckets are allocated by __get_free_pages()
> without __GFP_COMP.
> 
> This problem should be fixed both in upper layer driver (bcache) and
> nvme-over-tcp code. This patch fixes the nvme-over-tcp code by checking
> whether the page refcount is 0, if yes then don't use kernel_sendpage()
> and call sock_no_sendpage() to send the page into network stack.
> 
> The code comments in this patch is copied and modified from drbd where
> the similar problem already gets solved by Philipp Reisner. This is the
> best code comment including my own version. 
> 
> Signed-off-by: Coly Li <colyli@xxxxxxx>
> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@xxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxx>
> Cc: Jan Kara <jack@xxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>
> Cc: Mikhail Skorzhinskii <mskorzhinskiy@xxxxxxxxxxxxxx>
> Cc: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
> Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
>  drivers/nvme/host/tcp.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 79ef2b8e2b3c..faa71db7522a 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -887,8 +887,17 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req)
>  		else
>  			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
>  
> -		/* can't zcopy slab pages */
> -		if (unlikely(PageSlab(page))) {
> +		/*
> +		 * e.g. XFS meta- & log-data is in slab pages, or bcache meta
> +		 * data pages, or other high order pages allocated by
> +		 * __get_free_pages() without __GFP_COMP, which have a page_count
> +		 * of 0 and/or have PageSlab() set. We cannot use send_page for
> +		 * those, as that does get_page(); put_page(); and would cause
> +		 * either a VM_BUG directly, or __page_cache_release a page that
> +		 * would actually still be referenced by someone, leading to some
> +		 * obscure delayed Oops somewhere else.
> +		 */
> +		if (unlikely(PageSlab(page) || page_count(page) < 1)) {
>  			ret = sock_no_sendpage(queue->sock, page, offset, len,
>  					flags);
>  		} else {
> 




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux