On Wed, 2013-06-26 at 12:55 +0300, Vladislav Bogdanov wrote:
> 26.06.2013 11:48, Nicholas A. Bellinger wrote:
> > Hi Vladislav,
> >
> > On Mon, 2013-06-24 at 18:18 +0300, Vladislav Bogdanov wrote:
> >> Hi,
> >>
> >> I'm evaluating performance of different targets (actually LIO and IET)
> >> on top of RAID5 (mdraid) for my customer.
> >>
> >> <SNIP>
> >>
> >> What I see:
> >>
> >> IET (with fileio+wb) shows:
> >>
> >> * 75 MB/s with kernel 3.4 (from debian)
> >> * 85 MB/s with kernel 3.9
> >>
> >> LIO (with fileio+wb) shows:
> >>
> >> * 63 MB/s with kernel 3.4 (from debian)
> >> * 54 MB/s with kernel 3.9
> >>
> >> Is there any explanation for LIO performance degradation with the
> >> kernel upgrade?
> >>
> My fault, that is 3.2.41, not 3.4.
>
> > Strange. Can you verify using a TPG attribute default_cmdsn_depth value
> > larger than the hardcoded default of 16..?
> >
> > IIRC, IET is using a larger CmdSN window by default here, so you'll want
> > to increase default_cmdsn_depth=128 with this type of workload.
>
> Already tried that, that was the first suspect. Unfortunately no luck.

<nod>, thanks for verifying that bit.

> Some more observations:
>
> With IET, iostat on the target host shows a much smoother picture, with
> less than 10 MB/s peaks (from the median). With LIO the peaks are much
> bigger; it looks like something forces IO (many partial stripes) to be
> flushed at the wrong point in time.
>
> The same is seen on the initiator side: robocopy shows percentage
> progress while copying, and with IET it goes very smoothly. With LIO
> that progress is somewhat "jaggy".
>
> I see that IET does flushes itself, while LIO leaves that to another
> kernel subsystem (or at least I didn't find where it calls flush).
> May that be the point?

Not exactly. They both support explicit flushing via SYNCHRONIZE_CACHE
when FILEIO writeback mode is enabled, or the implicit background bdflush
write-out via the /proc/sys/vm/dirty_[bytes,expire_centisecs], etc.
tunables.

The one thing that IET does not support that could be having an effect
here is FUA (forced unit access) WRITEs. This allows the initiator to
explicitly say when a given WRITE should use write-through semantics,
which IET currently ignores. (See target_disk.c:build_write_response)

By default, LIO FILEIO devices use emulate_fua_write=1 and honor FUA
WRITEs by calling vfs_fsync_range() on the requested WRITE blocks
immediately after I/O submission. (See target_core_file.c:fd_execute_rw)

This can be disabled on the /backstores/fileio/$DEV in question via
'set attribute emulate_fua_write=0'. On the client side you'll likely
need to re-register the LUN before the changes take effect.
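For reference, the same attributes can also be poked directly through
configfs. Just a rough sketch, not tested here: the fileio_0 HBA index,
tpgt_1, and the $DEV / $TARGET_IQN names below are placeholders for
whatever your setup actually uses:

    # Turn off FUA WRITE emulation on the FILEIO backstore
    # (equivalent to 'set attribute emulate_fua_write=0' in targetcli)
    echo 0 > /sys/kernel/config/target/core/fileio_0/$DEV/attrib/emulate_fua_write

    # Raise the per-TPG CmdSN window from the hardcoded default of 16
    echo 128 > /sys/kernel/config/target/iscsi/$TARGET_IQN/tpgt_1/attrib/default_cmdsn_depth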
> > Also, verifying with a RAMDISK_MCP backend on the same setup would be
> > useful for determining if it's a FILEIO specific performance issue.
>
> ramdisk works at the wire speed both with RAMDISK_MCP and loop device
> on a tmpfs (with both iblock and fileio).

Thanks for verifying this as well..

> And I wouldn't say it is solely a FILEIO problem, but rather a problem
> of iSCSI + mdRAID[56]. I have already spent much time on this, and it
> seems that IET somehow almost guarantees that with fileio+wb only
> complete stripes are put on the media with this type of load, while all
> other variants (IET with fileio+wt, IET with blockio, LIO with fileio
> (wt or wb), LIO with iblock) do partial stripe writes, which are very
> expensive for RAID5/6.
>
> Another point may be that mdraid assumes that there is always a local
> filesystem on top of it, which is not the case with iSCSI.
>
> But, again, IET magically does the trick - 85 MB/s is very close to
> both the wire speed and the expected maximal RAID5 write speed when the
> IO size is equal to the stripe size, so writing the full stripe costs
> only 4 IOs (2 reads and 2 writes).

Ok, this sounds more and more like FUA WRITEs are forcing write-through
and causing partial stripe writes to occur.

Can you set emulate_fua_write=0 following the above and retry..?

Thanks,

--nab