On Wed, Nov 4, 2015 at 12:22 AM, Todd Lawall <tlawall@xxxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I'm trying to understand a behavior, and I am hoping to further my
> understanding of what fio is doing. In the specific case in question,
> I'm seeing a seven-minute wait before I/O resumes after a failover. In
> other variations on this job file, the seven-minute wait disappears,
> and it drops back down to the 40-second wait that I see with the usual
> I/O loads we run.
>
> The setup:
> - I have one Windows 2012 R2 host with two NICs.
> - I have one storage array with two controllers, A and B, each with
>   two 10GbE ports (four ports total), and failover capability between
>   the two sides.
> - I have iSCSI and MPIO set up so that there is one login from each
>   NIC to each side, so four sessions total for each volume. The map
>   looks something like this:
>
>          nic1                nic2
>         /    \              /    \
>        /      \            /      \
>    side A    side B    side A    side B
>    port 0    port 0    port 1    port 1
>
> - I have the fio job below. It is basically 256k blocks, iodepth 1,
>   one worker with 48 drives.
>
> [global]
> do_verify=0
> ioengine=windowsaio
> numjobs=1
> iodepth=1
> offset=0
> direct=1
> thread
>
> [fio-0]
> blocksize=256k
> readwrite=rw
> filename=\\.\PHYSICALDRIVE19
> filename=\\.\PHYSICALDRIVE20
> <snipped out the other 44 drives>
> filename=\\.\PHYSICALDRIVE13
> filename=\\.\PHYSICALDRIVE14
> size=100%

If my understanding of how multi-pathing works is correct, the problem
with the above setup is fairly straightforward. You are running an
asynchronous engine single-threaded with a queue depth of one against
multiple files, which means running synchronous I/O against one file at
a time (fio switches files for you round-robin unless specifically
instructed to do otherwise). So when a path fails, your job performs
roughly 11 path failovers (slightly more, depending on whether the
device the current I/O is running against is on the failed path) one by
one, which should take about 11 x 40 s = 440 s, or roughly 7 minutes.

> If I alter the job in any of the following ways, I/O keeps going after
> the failover period, which is about 40 seconds. To summarize:
>
> Doesn't work:
> - multiple disks, single job, iodepth 1
>
> Works:
> - single disk, one job, iodepth 1
> - multiple disks, one job with all disks, iodepth equal to the number
>   of disks (e.g. if there are 48 disks, iodepth is set to 48)
> - multiple disks, one job per disk, iodepth 1

All of the above working setups differ from the original in that the
failovers for the drives run in parallel, taking roughly 40 s in total.

Hope this helps,
Andrey

> Would anyone have any idea why that one arrangement causes a
> significant delay before I/O is resumed?
>
> Thanks in advance,
> Todd
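P.S. For illustration, here is a minimal sketch of the "one job per
disk" variant Todd reported working. The drive list is abbreviated and
the section names are just placeholders. With "thread" set and one job
section per drive, each drive gets its own worker with its own in-flight
I/O, so after a path failure the affected drives fail over concurrently
rather than one by one:

[global]
do_verify=0
ioengine=windowsaio
numjobs=1
iodepth=1
direct=1
thread
blocksize=256k
readwrite=rw
size=100%

; one job section per drive, so each drive always has its own
; outstanding I/O
[drive19]
filename=\\.\PHYSICALDRIVE19

[drive20]
filename=\\.\PHYSICALDRIVE20

; ... and so on for the remaining drives ...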