RFC: sg driver addition: SG_FLAG_SHARED_MMAP_IO

Douglas Gilbert <dougg@xxxxxxxxxx> · Wed, 21 Mar 2007 22:37:09 -0400

I mentioned this idea a few weeks ago on this list: namely,
to allow an sg pass-through request to use the mmap-ed
reserve buffer associated with another sg file descriptor.

In my experience mmap-ed IO using sg's reserve buffer mapped
into user space is faster than direct IO schemes. However,
one shortcoming is that if you try to copy between two devices
using this technique you end up with two separate mmap-ed
buffers in the user space program. The program then needs to
copy between the two buffers, which defeats much of the
advantage of mmap-ed IO. You could (and sgm_dd in
sg3_utils does) use mmap-ed IO on the read side and direct IO
on the write side (or vice versa).

I used the sg driver as found in lk 2.6.21-rc4 as a baseline
(and I don't think sg has changed since 2.6.19). A gzipped
diff is attached. There is also some test code (a modified
sgm_dd) in the sg3_utils-1.24 beta on the www.torque.net/sg site.

Here is an example of a disk to disk copy:
  sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=smmap bs=512

The new flag is 'oflag=smmap', which instructs the write SG_IO
on /dev/sg1 to set SG_FLAG_SHARED_MMAP_IO and to pass
the mmap-ed buffer used for /dev/sg0 in dxferp. [Add a
'verbose=1' option and it will indicate how many times shared
mmap IO was requested and how many times it was actually done.]
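
As a rough illustration of the write side described above, here is a
minimal sketch of filling in a sg_io_hdr_t for such a request. The
numeric value given to SG_FLAG_SHARED_MMAP_IO is a placeholder (the
real value comes from the attached patch), and prep_shared_write() is
a hypothetical helper name, not something taken from sgm_dd.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>

/* ASSUMPTION: placeholder value only; the real definition is in the
 * patched sg.h from the attached diff. */
#ifndef SG_FLAG_SHARED_MMAP_IO
#define SG_FLAG_SHARED_MMAP_IO 0x8
#endif

/* Hypothetical helper: prepare a WRITE(10) whose data buffer is the
 * reserve buffer already mmap-ed from the *other* sg file descriptor. */
void prep_shared_write(sg_io_hdr_t *hdr, unsigned char cdb[10],
                       void *shared_buf, unsigned int lba,
                       unsigned short blocks, unsigned int bs)
{
    memset(cdb, 0, 10);
    cdb[0] = 0x2a;                      /* WRITE(10) opcode */
    cdb[2] = (lba >> 24) & 0xff;        /* big-endian LBA */
    cdb[3] = (lba >> 16) & 0xff;
    cdb[4] = (lba >> 8) & 0xff;
    cdb[5] = lba & 0xff;
    cdb[7] = (blocks >> 8) & 0xff;      /* big-endian transfer length */
    cdb[8] = blocks & 0xff;

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_TO_DEV;
    hdr->cmd_len = 10;
    hdr->cmdp = cdb;
    hdr->dxfer_len = (unsigned int)blocks * bs;
    hdr->dxferp = shared_buf;           /* mmap-ed buffer of the read fd */
    hdr->flags = SG_FLAG_SHARED_MMAP_IO;
    hdr->timeout = 60000;               /* milliseconds */
}
```

In the sgm_dd example, shared_buf would be the pointer returned by
mmap() on the /dev/sg0 file descriptor, and the prepared header would
then be handed to ioctl(fd_of_sg1, SG_IO, &hdr).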

Features:
  - allow both sides of a copy-like operation to DMA into
    and out of the same user space buffer
  - minimal per command overhead (i.e. building of
    scatter gather lists and pinning pages)
  - could copy a single source to multiple destinations
    efficiently
  - if the shared reserve buffer is unavailable (or not big
    enough) then fall back to indirect IO transparently
  - new info bit SG_INFO_SHARED_MMAP_IO indicates whether
    shared mmap-ed IO was done
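
Since the fallback to indirect IO is transparent, a caller that cares
has to look at the info field after each command completes. A minimal
sketch of the kind of tally that 'verbose=1' reports might look like
this. The value of SG_INFO_SHARED_MMAP_IO is a placeholder, and the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <scsi/sg.h>

/* ASSUMPTION: placeholder value only; the real definition is in the
 * patched sg.h from the attached diff. */
#ifndef SG_INFO_SHARED_MMAP_IO
#define SG_INFO_SHARED_MMAP_IO 0x8
#endif

/* Hypothetical counters: how often shared mmap IO was asked for and
 * how often the driver reported that it was actually done. */
struct smmap_stats {
    unsigned long requested;
    unsigned long done;
};

/* Call once per completed SG_IO that had set SG_FLAG_SHARED_MMAP_IO. */
void smmap_account(struct smmap_stats *st, const sg_io_hdr_t *hdr)
{
    st->requested++;
    if (hdr->info & SG_INFO_SHARED_MMAP_IO)
        st->done++;     /* otherwise the driver fell back to indirect IO */
}
```

A difference between the two counters at the end of a copy would
suggest the reserve buffer was busy or too small for some commands.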

Restrictions (enforced by the sg driver):
  - confined to file descriptors in the same process
  - there can be only one user of a reserve buffer
    at a time
  - low_dma is honoured

Complexity:
  - it does have a few more corner cases than usual. For
    example in the above sgm_dd invocation: closing /dev/sg0
    while /dev/sg1 is sharing its mmap-ed reserve buffer ...

Here are some timings copying between two ramdisks. It is
assumed the 'bs=8k' given to dd is equivalent to 'bs=512
bpt=16' given to sgm_dd.
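
The equivalence assumed above is just block size times blocks per
transfer. A small sketch of that arithmetic (bytes_per_transfer() is
a hypothetical name used only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by one command: logical block size times the number of
 * blocks per transfer (the 'bs' and 'bpt' arguments of sgm_dd). */
size_t bytes_per_transfer(size_t bs, size_t bpt)
{
    return bs * bpt;
}
```

With bs=512 and bpt=16 this gives 8192 bytes, matching the 8k block
size given to dd.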

# lsscsi -g
[4:0:0:0]    disk    Linux    scsi_debug       1.82  /dev/sda  /dev/sg0
[5:0:0:0]    disk    Linux    scsi_ses         1.06  /dev/sdb  /dev/sg1

# ./dd_tsts.sh
Usage: dd_tsts.sh <ifile> <ofile> <times> <bs>

# ./dd_tsts.sh /dev/sda /dev/sdb 50 8k
Indirect IO with dd
dd if=/dev/sda of=/dev/sdb bs=8k
real    0m7.448s
user    0m0.080s
sys     0m7.046s

Direct IO with dd
dd if=/dev/sda iflag=direct of=/dev/sdb oflag=direct bs=8k
real    0m4.529s
user    0m0.114s
sys     0m3.799s

# ./sg_dd_tsts.sh /dev/sg0 /dev/sg1 50 16
Indirect IO with sg_dd
sg_dd if=/dev/sg0 of=/dev/sg1 bs=512 bpt=16
real    0m6.304s
user    0m0.171s
sys     0m5.268s

Direct IO with sg_dd
sg_dd if=/dev/sg0 iflag=dio of=/dev/sg1 oflag=dio bs=512 bpt=16
real    0m4.246s
user    0m0.135s
sys     0m3.395s

Mmap read, indirect IO write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 bs=512 bpt=16
real    0m4.023s
user    0m0.127s
sys     0m3.259s

Mmap read, direct IO write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=dio bs=512 bpt=16
real    0m4.057s
user    0m0.164s
sys     0m3.264s

Mmap read, shared mmap write with sgm_dd
sgm_dd if=/dev/sg0 of=/dev/sg1 oflag=smmap bs=512 bpt=16
real    0m3.871s
user    0m0.131s
sys     0m3.111s
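
To put the 'real' times above side by side, a ratio of elapsed times
is enough. This is only a sketch for comparing the figures already
listed; speedup() is a hypothetical helper.

```c
#include <assert.h>

/* Ratio of elapsed times; values above 1.0 mean the candidate run
 * finished faster than the baseline run. */
double speedup(double baseline_s, double candidate_s)
{
    return baseline_s / candidate_s;
}
```

For example, speedup(7.448, 3.871) compares indirect IO with dd
against the shared mmap write and comes out a little over 1.9, while
speedup(4.246, 3.871) compares direct IO with sg_dd against the
shared mmap write and comes out at roughly 1.1.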

Don't expect drastic improvements in real IO unless it is
in the gigabyte per second range.

Doug Gilbert
Attachment: sg2621rc4smm2.diff.gz (GNU Zip compressed data)