On 15 August 2012 21:31, Martin Steigerwald <Martin@xxxxxxxxxxxx> wrote:
> On Wednesday, 15 August 2012, Greg Sullivan wrote:
>> On 15/08/2012, Martin Steigerwald <Martin@xxxxxxxxxxxx> wrote:
>> > On Wednesday, 15 August 2012, Greg Sullivan wrote:
>> >> On 15 August 2012 07:24, Martin Steigerwald <Martin@xxxxxxxxxxxx> wrote:
>> >> > On Tuesday, 14 August 2012, Greg Sullivan wrote:
>> >> > > On 15/08/2012, Martin Steigerwald <Martin@xxxxxxxxxxxx> wrote:
>> >> > > > On Tuesday, 14 August 2012, Greg Sullivan wrote:
>> >> > > >> On 15/08/2012, Martin Steigerwald <Martin@xxxxxxxxxxxx> wrote:
>> >> > > >> > On Tuesday, 14 August 2012, Greg Sullivan wrote:
>> >> > > >> >> On 15 August 2012 03:36, Martin Steigerwald <Martin@xxxxxxxxxxxx> wrote:
>> >> > > >> >> > Hi Greg,
>> >> > > >> >
>> >> > > >> > […]
>> >> > > >> >
>> >> > > >> >> > On Tuesday, 14 August 2012, Greg Sullivan wrote:
>> >> > > >> >> >> On Aug 14, 2012 11:06 PM, "Jens Axboe" <axboe@xxxxxxxxx> wrote:
>> >> > > >> >> >> > On 08/14/2012 08:24 AM, Greg Sullivan wrote:
>> >> > […]
>> >> > > >> >> Is it possible to read from more than one file in a single
>> >> > > >> >> job, in a round-robin fashion? I tried putting more than one
>> >> > > >> >> file in a single job, but it only opened one file. If you
>> >> > > >> >> mean to just do random reads in a single file - I've tried
>> >> > > >> >> that, and the throughput is unrealistically low. I suspect
>> >> > > >> >> that is because the read-ahead buffer cannot be effective
>> >> > > >> >> for random accesses. Of course, reading sequentially from a
>> >> > > >> >> single file will result in a throughput that is far too high
>> >> > > >> >> to simulate the application.
>> >> > > >> >
>> >> > > >> > Have you tried
>> >> > > >> >
>> >> > > >> > nrfiles=int
>> >> > > >> >     Number of files to use for this job. Default: 1.
>> >> > > >> >
>> >> > > >> > openfiles=int
>> >> > > >> >     Number of files to keep open at the same time. Default: nrfiles.
>> >> > > >> >
>> >> > > >> > file_service_type=str
>> >> > > >
>> >> > > > […]
>> >> > > >
>> >> > > >> > ? (see the fio manpage)
>> >> > > >> >
>> >> > > >> > It seems to me that all you need is nrfiles. I'd bet that fio
>> >> > > >> > distributes the given I/O size among those files, but AFAIR
>> >> > > >> > there is something about that in the fio documentation as well.
>> >> > > >> >
>> >> > > >> > Use the doc! ;)
>> >> > > >
>> >> > > > […]
>> >> > > >
>> >> > > >> Yes, I have tried all that, and it works, except that it causes
>> >> > > >> disk queuing, as I stated in my first post. I thought you meant
>> >> > > >> to put all the files into a single [job name] section of the
>> >> > > >> ini file, to enforce single-threaded I/O.
>> >> > > >
>> >> > > > With just one job running at once?
>> >> > > >
>> >> > > > Can you post an example job file?
>> >> > > >
>> >> > > > Did you try the sync=1 / direct=1 suggestion from Bruce Chan?
>> >> > > >
>> >> > > > I only know the behaviour of fio on Linux, where an I/O depth
>> >> > > > greater than one is only possible with libaio and direct=1. The
>> >> > > > manpage hints that the I/O depth is one for all synchronous I/O
>> >> > > > engines, so I'd bet that applies to Windows as well.
>> >> > > >
>> >> > > > Other than that I have no idea.
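>> >> > > >
>> >> > > > Just to illustrate what I mean - a single job that services
>> >> > > > several files round-robin, so everything stays in one thread.
>> >> > > > An untested sketch (file count, size and the job name are
>> >> > > > placeholders; file_service_type is described in the manpage,
>> >> > > > and roundrobin should already be its default):
>> >> > > >
>> >> > > > [global]
>> >> > > > size=1G
>> >> > > > nrfiles=10
>> >> > > > openfiles=10
>> >> > > > file_service_type=roundrobin
>> >> > > >
>> >> > > > [single-thread-read]
>> >> > > > rw=read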
>> >> >
>> >> > […]
>> >> >
>> >> > > One INI file, but a separate [job name] section for each file,
>> >> > > yes. According to Jens, because each [job name] is a separate
>> >> > > thread, and iodepth acts at the thread level, there will still
>> >> > > be queuing at the device level. If there were a way to do what
>> >> > > I want, I think Jens would have told me, unfortunately. ;)
>> >> > >
>> >> > > Direct I/O does at least allow me to do cache-less reads though -
>> >> > > thank you.
>> >> >
>> >> > My suggestion is to use one job with several files.
>> >> >
>> >> > martin@merkaba:/tmp> cat severalfiles.job
>> >> > [global]
>> >> > size=1G
>> >> > nrfiles=100
>> >> >
>> >> > [read]
>> >> > rw=read
>> >> >
>> >> > [write]
>> >> > stonewall
>> >> > rw=write
>> >> >
>> >> > (These are two jobs now, but stonewall makes the write job run
>> >> > only after the read job has finished, with cache invalidation in
>> >> > between unless that is disabled or unsupported by the OS.)
>> >> >
>> >> > martin@merkaba:/tmp> fio severalfiles.job
>> >> > read: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
>> >> > write: (g=1): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
>> >> > 2.0.8
>> >> > Starting 2 processes
>> >> > read: Laying out IO file(s) (100 file(s) / 1023MB)
>> >> > write: Laying out IO file(s) (100 file(s) / 1023MB)
>> >> > Jobs: 1 (f=100)
>> >> > read: (groupid=0, jobs=1): err= 0: pid=23377
>> >> > [… lots of fast output due to /tmp being a RAM-based filesystem – tmpfs …]
>> >> >
>> >> > martin@merkaba:/tmp> ls -lh read.1.* | head
>> >> > -rw-r--r-- 1 martin martin 11M Aug 14 23:15 read.1.0
>> >> > -rw-r--r-- 1 martin martin 11M Aug 14 23:15 read.1.1
>> >
>> > […]
>> >
>> >> > [… only first ten displayed …]
>> >> >
>> >> > martin@merkaba:/tmp> find -name "read.1*" 2>/dev/null | wc -l
>> >> > 100
>> >> >
>> >> > 100 files of 11M each - allowing for rounding, that adds up
>> >> > nicely to the one GiB.
>> >> >
>> >> > Raw sizes are:
>> >> >
>> >> > martin@merkaba:/tmp> ls -l read.1.* | head
>> >> > -rw-r--r-- 1 martin martin 10737418 Aug 14 23:20 read.1.0
>> >> > -rw-r--r-- 1 martin martin 10737418 Aug 14 23:20 read.1.1
>> >
>> > […]
>> >
>> >> > Note: When I used filename, fio just created one file regardless
>> >> > of the nrfiles setting. I would have expected it to use the
>> >> > filename as a prefix. There might be some way to have it do that.
>> >> >
>> >> > Ciao,
>> >>
>> >> Thanks - that runs, but it's still queuing. As I said before, I
>> >> can't use the sync engine - I receive an error. Is there a
>> >> synchronous engine available for Windows? Perhaps that's the only
>> >> problem. Can you check whether your system is queuing at the file
>> >> system/device level when you run that test?
>> >>
>> >> I had attempted to put the files in a single job earlier - I think
>> >> it may have been successfully accessing both files, but I didn't
>> >> notice it in the output. I'm a raw beginner.
>> >
>> > Did you try
>> >
>> > ioengine=windowsaio
>> >
>> > +
>> >
>> > iodepth=1 (should be the default, however, I think)
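>> >
>> > For instance, an untested sketch (the size and file count are
>> > placeholders; the option names are straight from the manpage, and
>> > direct=1 is the cache-less reading you mentioned):
>> >
>> > [global]
>> > ioengine=windowsaio
>> > iodepth=1
>> > direct=1
>> > size=1G
>> > nrfiles=10
>> >
>> > [read]
>> > rw=read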
>> >
>> > Otherwise I have no idea. I never used fio on Windows so far.
>> >
>> > It might help if you explained exactly which problem you want to
>> > solve with the fio measurements. Multimedia streaming - is it too
>> > slow? Why do you want to do these measurements?
>>
>> They are both defaults, and the output shows that both are being used.
>>
>> If you could tell me whether your system is generating queuing it
>> would help, because if yours queues even when using the sync I/O
>> engine, it means I'm wasting my time and fio simply needs to be
>> augmented to support strictly single-threaded operation over
>> multiple files.
>>
>> I want to determine whether the application in question can extract
>> a reasonable number of real-time streams from any given storage
>> system.
>
> Just for the record, since you got it working on Windows as well - it
> works for me:
>
> merkaba:/tmp> cat severalfiles.job
> [global]
> size=1G
> nrfiles=100
>
> [read]
> rw=read
>
> merkaba:/tmp> fio severalfiles.job
> read: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> 2.0.8
> Starting 1 process
>
> read: (groupid=0, jobs=1): err= 0: pid=4579
>   read : io=1023.9MB, bw=2409.7MB/s, iops=616705, runt=425msec
>     clat (usec): min=0, max=54, avg=1.08, stdev=0.64
>      lat (usec): min=0, max=54, avg=1.13, stdev=0.66
>     clat percentiles (usec):
>      |  1.00th=[    0],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1],
>      | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    1],
>      | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    2],
>      | 99.00th=[    2], 99.50th=[    2], 99.90th=[   14], 99.95th=[   16],
>      | 99.99th=[   23]
>     lat (usec) : 2=92.14%, 4=7.74%, 10=0.01%, 20=0.10%, 50=0.01%
>     lat (usec) : 100=0.01%
>   cpu          : usr=22.41%, sys=76.42%, ctx=421, majf=0, minf=36
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=262100/w=0/d=0, short=r=0/w=0/d=0
>
> Run status group 0 (all jobs):
>    READ: io=1023.9MB, aggrb=2409.7MB/s, minb=2409.7MB/s, maxb=2409.7MB/s, mint=425msec, maxt=425msec
>
> (If you wonder about the figures - that's RAM being measured - Linux tmpfs ;)
>
> 100% at IO depth 1.
>
> Ciao,
> --
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

Are you sure that "100% at IO depth 1" actually means there is no
queuing at the device? That is, doesn't that IO depth simply mean that
each thread reached an IO depth of 1, without necessarily saying
anything about the disk queue? I'm actually checking my physical disk
queue. Before I was using direct I/O, the IO depths were 100% at 1,
just like yours, but I/Os were still being queued at the device level.

I now have confidence that it probably is working for you, of course,
but I'm not yet convinced that you are doing the correct check, that's
all. ;^)

Greg.
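P.S. In case it helps anyone repeat the device-level check rather than
reading fio's "IO depths" line: on Windows I watch the PhysicalDisk
"Current Disk Queue Length" counter in perfmon while fio runs. On
Linux, something like

  iostat -x 1

should show the per-device queue in the avgqu-sz column - untested on
my side, since I am on Windows, so treat that as a sketch.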