On 22.03.22 14:17, Xiaoguang Wang wrote:
hi,
On 18.03.22 10:55, Xiaoguang Wang wrote:
The core idea to implement tcmu zero copy feature is really straight,
which just maps block device io request's sgl pages to tcmu user space
backstore, then we can avoid extra copy overhead between sgl pages and
tcmu internal data area(which really impacts io throughput), please see
https://www.spinics.net/lists/target-devel/msg21121.html for detailed
info.
Can you please tell us how big the performance improvement is and
which configuration you are using for the measurements?
Sorry, I should have attached the test results here. Initially I tried to
use the tcmu user:fbo backstore to evaluate the performance improvement,
but it only showed about a 10%~15% io throughput gain (fio config:
numjobs=1, iodepth=8, bs=256k), which isn't very impressive. The reason is
that the user:fbo backstore does buffered reads, which consume most of the
cpu.
Then I tested this zero copy feature with our real workload, whose
backstore is a multi-threaded network program accessing a distributed file
system.
For 4 jobs, iodepth=8 and a 256kb io size, the write throughput improves
from 3.6GB/s to 10GB/s.
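For reference, a minimal fio job file for that 4-job run might look like
the following; the libaio engine, direct=1 and the device path are my
assumptions, not details taken from this thread:

    [global]
    ioengine=libaio
    ; direct=1 is an assumption: direct io to the tcmu-backed device
    direct=1
    rw=write
    bs=256k
    iodepth=8
    numjobs=4

    [tcmu-zerocopy]
    ; hypothetical device node for the tcmu-backed disk
    filename=/dev/sdX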
Thank you for the info. Sounds promising.
What fabric are you using? iSCSI?
What HW is your target running on?
Regards,
Xiaoguang Wang
Initially I used remap_pfn_range() or vm_insert_pages() to map sgl pages to
user space, but both of them have limits:
1) Use vm_insert_pages()
   which is like tcp getsockopt(TCP_ZEROCOPY_RECEIVE), but there are two
   restrictions:
   1. anonymous pages can not be mmaped to user space.
        ==> vm_insert_pages
        ====> insert_pages
        ======> insert_page_in_batch_locked
        ========> validate_page_before_insert
      validate_page_before_insert() shows that an anonymous page can not be
      mapped to user space; note that when issuing direct io to a block
      device, the io request's sgl pages mostly come from anonymous pages.
        if (PageAnon(page) || PageSlab(page) || page_has_type(page))
                return -EINVAL;
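      For reference, the whole helper in mm/memory.c (around v5.17; quoted
      from memory, so please double-check against the tree) is just:

        static int validate_page_before_insert(struct page *page)
        {
                if (PageAnon(page) || PageSlab(page) || page_has_type(page))
                        return -EINVAL;
                flush_dcache_page(page);
                return 0;
        }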
      I'm not sure why there is such a restriction; is it for safety
      reasons?
   2. warn_on triggered in __folio_mark_dirty
      When calling zap_page_range() in the tcmu user space backstore after
      an io completes, a warn_on is triggered in __folio_mark_dirty():
        if (folio->mapping) {   /* Race with truncate? */
                WARN_ON_ONCE(warn && !folio_test_uptodate(folio));
                ...
      I'm not familiar with folio yet, but I think the reason is that when
      issuing a buffered read to a tcmu block device, its page cache is
      mapped to user space; the backstore writes the page and its pte is
      dirtied, but the page was newly allocated, hence its uptodate flag is
      not yet set.
      In zap_pte_range(), there is code like this:
        if (!PageAnon(page)) {
                if (pte_dirty(ptent)) {
                        force_flush = 1;
                        set_page_dirty(page);
                }
                ...
        }
So this warn_on is reasonable.
      Indeed what I want is just to map io request sgl pages to the tcmu
      user space backstore, so the backstore can read or write data in the
      mapped area. I don't want to care about the page or its mapping
      status, so I chose to use remap_pfn_range().
2) Use remap_pfn_range()
   remap_pfn_range() works well, but it has somewhat obvious overhead. A
   512kb io request has 128 pages, and usually these 128 pages' pfns are
   not consecutive, so in the worst case I'd need to issue 128 calls to
   remap_pfn_range() for one 512kb io request, which is horrible. And
   inside remap_pfn_range(), if the x86 page attribute table feature is
   enabled, lookup_memtype() called by track_pfn_remap() also introduces
   obvious overhead.
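   To make the per-page cost concrete, here is a hypothetical mapping loop
   (a sketch only, not code from this series; it assumes each sgl entry
   covers exactly one page):

    #include <linux/mm.h>
    #include <linux/scatterlist.h>

    /* Sketch: map a request's sgl into a user vma via remap_pfn_range(). */
    static int map_sgl_with_remap_pfn_range(struct vm_area_struct *vma,
                                            unsigned long uaddr,
                                            struct scatterlist *sgl,
                                            unsigned int sg_cnt)
    {
            struct scatterlist *sg;
            unsigned int i;
            int ret;

            /*
             * The pfns are rarely consecutive, so each page needs its own
             * call: up to 128 calls for a 512kb request.
             */
            for_each_sg(sgl, sg, sg_cnt, i) {
                    ret = remap_pfn_range(vma, uaddr,
                                          page_to_pfn(sg_page(sg)),
                                          PAGE_SIZE, vma->vm_page_prot);
                    if (ret)
                            return ret;
                    uaddr += PAGE_SIZE;
            }
            return 0;
    }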
Finally, in order to solve these problems, Xu Yu helped to implement a new
helper, which accepts an array of pages as a parameter. Anonymous pages can
be mapped to user space, and the pages are treated as special ptes
(pte_special() returns true), so vm_normal_page() returns NULL and the
folio warn_on above won't trigger.
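Roughly, the new helper mirrors vm_insert_pages(); the prototype below is a
sketch based on this cover letter only, see patch 1 for the exact
interface:

    /*
     * Assumed interface, modeled on vm_insert_pages(): map *num pages at
     * addr in vma as special ptes. Not taken from the patch itself.
     */
    int vm_insert_pages_mkspecial(struct vm_area_struct *vma,
                                  unsigned long addr,
                                  struct page **pages, unsigned long *num);

Because the resulting ptes are special, vm_normal_page() returns NULL in
zap_pte_range(), set_page_dirty() is skipped, and the __folio_mark_dirty()
warning can not fire.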
Thanks.
Xiaoguang Wang (2):
mm: export zap_page_range()
scsi: target: tcmu: Support zero copy
Xu Yu (1):
mm/memory.c: introduce vm_insert_page(s)_mkspecial
drivers/target/target_core_user.c | 257 +++++++++++++++++++++++++++++++++-----
include/linux/mm.h | 2 +
mm/memory.c | 183 +++++++++++++++++++++++++++
3 files changed, 414 insertions(+), 28 deletions(-)