Re: Running a separate fio process for each disk?

Jens Axboe <axboe@xxxxxxxxx> · Thu, 3 Dec 2015 11:58:00 -0700

Perfect! Thanks for reporting and re-testing.

On 12/03/2015 11:54 AM, Akash Verma wrote:
Jens, I confirmed that the issue is not seen with the latest FIO (I used
version fio-2.2.12-15-gcdab).

On Tue, Nov 24, 2015 at 5:18 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:

    No worries, I know this week is a bit more problematic than usual.
    I'll hold off on the new release until I know.

    On 11/24/2015 01:51 PM, Akash Verma wrote:

        Sorry for not getting back to you; I didn't get a chance to try the
        latest git, and I'm off on vacation soon. I'm cc'ing Michael and Caio,
        who might have a chance to try it out before Thursday. Michael or
        Caio, could you try running the two things Jens asked for: the
        cpuclock test with both the FIO we've been using and the latest from
        git, and the regular multi-process FIO run with the latest git?

        On Tue, Nov 24, 2015 at 7:51 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:

            Did you try current -git yet? I think it should work for both
            scenarios. It's a silly bug; it would be great to have
            confirmation that it's fixed. Then I'll spin a new release.

            On 11/20/2015 05:21 PM, Jens Axboe wrote:

                And finally, there's a potential fix in commit 99afcdb53dc3.
                Please try that commit or later as well, and see if it
                behaves any better for you.

                On 11/20/2015 05:03 PM, Jens Axboe wrote:

                    Hi,

                    OK, I see. Can you pull the latest -git and then run
                    fio --cpuclock-test on one of the boxes where you see the
                    issue? It should have commit 5896d827e1e2 or later.

                    On Fri, Nov 20, 2015 at 3:20 PM, Akash Verma <akashv@xxxxxxxxxx> wrote:

                          Hi Jens,
                          The issue is not seen with non-CPU clock sources, or
                          when using a single process (with individual threads,
                          the only configuration I tried). We only see the
                          issue when using multiple processes and the CPU clock
                          source.
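Akash's observation suggests a simple A/B comparison: the same one-process-per-disk run with clocksource=cpu versus a non-CPU source such as gettimeofday. The following is a minimal Python sketch of such a launcher, not a tool used in this thread; the helper names fio_cmd and launch_per_disk, the job naming scheme, and the device path are hypothetical. It only passes standard fio job options (--name, --filename, --rw, --bs, --time_based, --runtime, --clocksource) on the command line. Note that a random-write job against a raw block device destroys its contents.

```python
import subprocess
from typing import List, Sequence


def fio_cmd(disk: str, clocksource: str, runtime_s: int = 900) -> List[str]:
    """Build the argv for one fio process that drives a single disk.

    Hypothetical helper: a 15-minute (900 s) time-based random write with
    1M blocks, mirroring the workload described in this thread.
    """
    job_name = "randwrite-" + disk.rsplit("/", 1)[-1]
    return [
        "fio",
        f"--name={job_name}",
        f"--filename={disk}",
        "--rw=randwrite",
        "--bs=1M",
        "--time_based",                  # keep running until runtime expires
        f"--runtime={runtime_s}",
        f"--clocksource={clocksource}",  # e.g. "cpu" or "gettimeofday"
    ]


def launch_per_disk(disks: Sequence[str], clocksource: str) -> List[subprocess.Popen]:
    """Start one independent fio process per disk (WARNING: overwrites the disks)."""
    return [subprocess.Popen(fio_cmd(disk, clocksource)) for disk in disks]


if __name__ == "__main__":
    # Only prints the command lines; no fio process is started here.
    for clock in ("cpu", "gettimeofday"):
        print(" ".join(fio_cmd("/dev/sdb", clock)))
```

Keeping the clock source as a parameter means the two runs differ in exactly one option, which is the variable Jens's clocksource=gettimeofday suggestion isolates.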

                          On Fri, Nov 20, 2015 at 11:50 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
                           > On 11/20/2015 12:37 PM, Caio Villela wrote:
                           >>
                           >> Hello Allen and Jens,
                           >>
                           >> Sorry for the long output; it is just in case you want the
                           >> details. Here is a simple explanation of the problem. I want
                           >> to run a 15-minute random write using 1 MB requests and
                           >> measure throughput and latency. The problem seems to be that
                           >> if the test system has a large number of drives (the system I
                           >> am testing here has 28 drives), the time accounting goes bad
                           >> for some of the processes.
                           >>
                           >> What you see below is that during the first 15 minutes, all
                           >> disks are hit equally, as they should be. After 15 minutes,
                           >> however, 15 drives are still running, and 5 minutes past the
                           >> specified 15 minutes, one drive is still running. Looking at
                           >> the number of I/Os sent to each drive, the ones that ran
                           >> during that excess time have many more I/Os. FIO still
                           >> reports that all drives ran for 15 minutes, although some ran
                           >> for more than 20 minutes.
                           >>
                           >> We will attempt to run a single process instead of 28
                           >> instances of FIO to see if this goes away.
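Since fio's own report claimed 15 minutes for every drive, one way to catch the overrun Caio describes is to time each per-disk process from the launcher with an independent clock. This is a hypothetical sketch (the function names, the 30-second tolerance and the 0.5 s polling interval are assumptions); it relies on Python's time.monotonic, which is not affected by fio's clocksource setting.

```python
import subprocess
import time
from typing import Dict, List, Sequence


def run_and_time(cmds: Dict[str, Sequence[str]], poll_s: float = 0.5) -> Dict[str, float]:
    """Start all commands together and record each one's wall-clock lifetime in seconds."""
    start = time.monotonic()
    running = {disk: subprocess.Popen(list(cmd)) for disk, cmd in cmds.items()}
    elapsed: Dict[str, float] = {}
    # Poll instead of wait()-ing in order, so each exit time is recorded promptly.
    while running:
        for disk, proc in list(running.items()):
            if proc.poll() is not None:
                elapsed[disk] = time.monotonic() - start
                del running[disk]
        if running:
            time.sleep(poll_s)
    return elapsed


def find_overruns(elapsed: Dict[str, float], runtime_s: float,
                  tolerance_s: float = 30.0) -> List[str]:
    """Return the disks whose process outlived the requested runtime plus a tolerance."""
    limit = runtime_s + tolerance_s
    return sorted(disk for disk, secs in elapsed.items() if secs > limit)


if __name__ == "__main__":
    # Invented timings for illustration only, not measurements from this thread.
    sample = {"sda": 903.2, "sdb": 1215.0, "sdc": 960.4}
    print(find_overruns(sample, runtime_s=900))  # ['sdb', 'sdc']
```

Comparing these launcher-side timings against the runtime fio reports for each job would show directly whether fio's own time accounting is off.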
                           >
                           > Could you also check if adding clocksource=gettimeofday makes
                           > any difference? This sounds very odd.
                           >
                           > I assume this was run with fio -git?
                           >
                           > --
                           > Jens Axboe
                           >
                          > --
                          > To unsubscribe from this list: send the line "unsubscribe fio" in
                          > the body of a message to majordomo@xxxxxxxxxxxxxxx
                          > More majordomo info at http://vger.kernel.org/majordomo-info.html

            --
            Jens Axboe

    --
    Jens Axboe

--
Jens Axboe
