Hi Christoph, Could you please take a look at this patch ? I will post a v3 patch soon for your review. Thanks in advance. Coly Li On 2020/7/10 21:26, Coly Li wrote: > Currently nvme_tcp_try_send_data() doesn't use kernel_sendpage() to > send slab pages. But for pages allocated by __get_free_pages() without > __GFP_COMP, which also have refcount as 0, they are still sent by > kernel_sendpage() to remote end, this is problematic. > > When bcache uses a remote NVMe SSD via nvme-over-tcp as its cache > device, writing meta data e.g. cache_set->disk_buckets to remote SSD may > trigger a kernel panic due to the above problem. Bcause the meta data > pages for cache_set->disk_buckets are allocated by __get_free_pages() > without __GFP_COMP. > > This problem should be fixed both in upper layer driver (bcache) and > nvme-over-tcp code. This patch fixes the nvme-over-tcp code by checking > whether the page refcount is 0, if yes then don't use kernel_sendpage() > and call sock_no_sendpage() to send the page into network stack. > > The code comments in this patch is copied and modified from drbd where > the similar problem already gets solved by Philipp Reisner. This is the > best code comment including my own version. > > Signed-off-by: Coly Li <colyli@xxxxxxx> > Cc: Chaitanya Kulkarni <chaitanya.kulkarni@xxxxxxx> > Cc: Christoph Hellwig <hch@xxxxxx> > Cc: Hannes Reinecke <hare@xxxxxxx> > Cc: Jan Kara <jack@xxxxxxxx> > Cc: Jens Axboe <axboe@xxxxxxxxx> > Cc: Mikhail Skorzhinskii <mskorzhinskiy@xxxxxxxxxxxxxx> > Cc: Philipp Reisner <philipp.reisner@xxxxxxxxxx> > Cc: Sagi Grimberg <sagi@xxxxxxxxxxx> > Cc: Vlastimil Babka <vbabka@xxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > --- > drivers/nvme/host/tcp.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > index 79ef2b8e2b3c..faa71db7522a 100644 > --- a/drivers/nvme/host/tcp.c > +++ b/drivers/nvme/host/tcp.c > @@ -887,8 +887,17 @@ static int nvme_tcp_try_send_data(struct nvme_tcp_request *req) > else > flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST; > > - /* can't zcopy slab pages */ > - if (unlikely(PageSlab(page))) { > + /* > + * e.g. XFS meta- & log-data is in slab pages, or bcache meta > + * data pages, or other high order pages allocated by > + * __get_free_pages() without __GFP_COMP, which have a page_count > + * of 0 and/or have PageSlab() set. We cannot use send_page for > + * those, as that does get_page(); put_page(); and would cause > + * either a VM_BUG directly, or __page_cache_release a page that > + * would actually still be referenced by someone, leading to some > + * obscure delayed Oops somewhere else. > + */ > + if (unlikely(PageSlab(page) || page_count(page) < 1)) { > ret = sock_no_sendpage(queue->sock, page, offset, len, > flags); > } else { >