Re: tcmu data area double copy overhead

On 08.12.21 13:43, Xiaoguang Wang wrote:
hi,

I'm a newcomer to the tcmu and iSCSI subsystems and have only spent several
days learning about iSCSI and tcmu, so if my question looks foolish, forgive me :)

One of our customers uses tcmu to access a remote distributed filesystem and
sees noticeable copy overhead in tcmu while doing read operations, so I spent
some time finding the reason and checking whether it can be optimized a bit.
According to my understanding of the tcmu kernel code, tcmu allocates internal
data pages for the data area and uses these pages as temporary storage between
the user-space backstore and tcmu. For an iSCSI initiator's write request,
tcmu first copies the sg pages' content into the internal data pages, and the
user-space backstore then reads the data from the mmapped data area and moves
it to the backstore. For an iSCSI initiator's read request, tcmu also allocates
internal data pages, the backstore copies the distributed filesystem's data
into them, and tcmu later copies the data pages' content back into the sg
pages. That means both read and write requests incur one extra data copy.

So my question is: could we avoid allocating internal data pages in tcmu and
instead mmap the sg pages themselves into the data area, so that this extra
copy is eliminated? I think that would improve throughput. Or are there
special security issues that prevent doing it this way? Thanks.

You are right, tcmu currently copies data between the sg-pages and tcmu
data pages.
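
For illustration, here is a minimal sketch of those two copy directions. It
is an assumption-laden simplification: the helper names are hypothetical, the
data area is treated as one linear buffer (in reality it is a set of pages
indexed per cmd), and the real copies are done by the scatter/gather helpers
in drivers/target/target_core_user.c, not in this form.

#include <linux/scatterlist.h>

/* WRITE: the fabric already filled the sg pages; tcmu copies them into the
 * data area, which userspace later reads through its mmap. */
static void sketch_copy_write(struct scatterlist *sgl, unsigned int nents,
			      void *data_area, size_t len)
{
	sg_copy_to_buffer(sgl, nents, data_area, len);	/* the extra copy */
}

/* READ: the userspace backstore already wrote the payload into the data
 * area; tcmu copies it back into the fabric's sg pages. */
static void sketch_copy_read(struct scatterlist *sgl, unsigned int nents,
			     void *data_area, size_t len)
{
	sg_copy_from_buffer(sgl, nents, data_area, len);	/* the extra copy */
}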

But I'm not sure the solution you suggest would really show the improved
throughput you expect, because we would have to map all data pages of the
sgl(s) of a new cmd into user space and unmap them again when the cmd is
processed.

Mapping one page means that we store the struct page pointer in tcmu's data
xarray. If userspace then tries to read or write that page, a page fault
occurs and the kernel calls tcmu_vma_fault, which returns the page pointer.
Unmapping means that tcmu has to remove the page pointer and call
unmap_mapping_range. So I'm not sure that copying the content of one page is
really more expensive than mapping and unmapping that page.
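
Roughly, such a fault handler looks like the sketch below. The struct and
field names are made up for illustration; the real tcmu_vma_fault additionally
handles the mailbox/cmd ring part of the mmap and more bookkeeping.

#include <linux/mm.h>
#include <linux/xarray.h>

struct sketch_dev {
	struct xarray data_pages;	/* pgoff -> struct page * */
	struct inode *inode;		/* backing the mmap'ed data area */
};

static vm_fault_t sketch_vma_fault(struct vm_fault *vmf)
{
	struct sketch_dev *dev = vmf->vma->vm_private_data;
	struct page *page;

	/* "mapped" simply means a page pointer was stored for this offset */
	page = xa_load(&dev->data_pages, vmf->pgoff);
	if (!page)
		return VM_FAULT_SIGBUS;	/* nothing stored: userspace gets SIGBUS */

	get_page(page);
	vmf->page = page;
	return 0;
}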

Additionally, if tcmu mapped the sg-pages, it would have to unmap them
immediately when userspace completes the cmd, because tcmu is not the owner
of those pages. So the recently added "KEEP_BUF" feature would have to be
removed again. But that feature was added precisely to avoid the need for a
data copy in userspace in some situations.

Finally, if tcmu times out a cmd that is waiting on the ring for completion
from userspace, tcmu sends the cmd completion to tcm core. Before doing so,
it would have to unmap the sg-pages. If userspace later tries to access one
of those pages, tcmu_vma_fault has nothing to map and instead returns
VM_FAULT_SIGBUS, so userspace receives a SIGBUS.
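
A sketch of that unmap step, reusing the hypothetical sketch_dev from above;
the function name and the per-cmd offset/length bookkeeping are assumptions:

#include <linux/mm.h>

/* Forget the pages of a timed-out cmd and zap any PTEs userspace already
 * has for them, so a later access faults and gets VM_FAULT_SIGBUS. */
static void sketch_unmap_cmd_pages(struct sketch_dev *dev,
				   unsigned long first_pgoff,
				   unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++)
		xa_erase(&dev->data_pages, first_pgoff + i);

	unmap_mapping_range(dev->inode->i_mapping,
			    (loff_t)first_pgoff << PAGE_SHIFT,
			    (loff_t)nr_pages << PAGE_SHIFT, 1);
}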

I already started another attempt to avoid the data copy in tcmu. The idea
is to optionally allow backend drivers to provide callbacks for sg allocation
and freeing. That way the pages in an sgl allocated by tcm core can be pages
from tcmu's data area. Thus no map/unmap is needed, and the fabric driver
directly writes/reads data to/from those pages, which are visible to
userspace.
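
To make the idea a bit more concrete, here is a sketch of what such optional
callbacks could look like. This is not the interface of the actual patchset;
the callback names and their placement are my assumptions:

#include <linux/scatterlist.h>
#include <linux/types.h>

struct se_device;

/* Hypothetical optional backend hooks: if set, tcm core would use them to
 * allocate/free the cmd's sgl, so the pages can come straight from the
 * backend's (tcmu's) mmap'ed data area instead of being freshly allocated. */
struct sketch_backend_sgl_ops {
	struct scatterlist *(*alloc_sgl)(struct se_device *dev, u32 length,
					 unsigned int *nents);
	void (*free_sgl)(struct se_device *dev, struct scatterlist *sgl,
			 unsigned int nents);
};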

In a high-performance scenario this method already lowers CPU load and
improves throughput very nicely with the qla2xxx fabric. Unfortunately, that
patchset works only for fabrics that use target_submit_cmd or call
target_submit_prep without pre-allocated sgls, which iscsi does not :(

Currently I'm working on another tuning measure in tcmu. After that I'll
go back to my no-data-copy patches. Maybe I can make them work with most
fabric drivers including iscsi.

Regards,
Bodo


