On Fri, Oct 06, 2017 at 06:42:24PM +0200, Paolo Valente wrote:
> Hi Mel,
> I have been thinking of our (sub)discussion, in [1], on possible tests
> to measure responsiveness.
> 
> First let me sum up that discussion in terms of the two main facts
> that we highlighted.
> 
> On one side,
> - it is actually possible to measure the start-up time of some popular
>   applications automatically and precisely (my claim),

Agreed, albeit my understanding is that this is mainly done by manual
testing, looking at the screen and a stopwatch.

> - but to accomplish such a task one needs a desktop environment, which
>   is not available and/or not so easy to handle on a battery of
>   server-like test machines;
> 

Also agreed, and it's not something that scales. It's highly
subjective, although I'm aware of anecdotal evidence that the desktop
experience is indeed better than with CFQ.

> On the other side,
> - you did perform some tests to estimate responsiveness,

Not exactly. For the most part I was concerned with server-class
workloads in general, not responsiveness in particular or application
start-up times. If nothing else, there is often a tradeoff between the
response time for a particular IO request and overall throughput, and
it's a balance. The mail you initially linked quoted results from a
database simulator and the initialisation step for it. The
initialisation step is a very basic IO pattern, so regressions there
are a concern under the heading of "if the basics are broken then the
complex case probably is too". Very broadly speaking, I'd be more than
happy if the performance of such workloads was within a reasonable
percentage of CFQ, and would classify the rest as a tradeoff,
particularly if disabling low_latency is enough to get performance
within the noise.

> - but the workloads for which you measured latency, namely the I/O
>   generated by a set of independent random readers, are rather too
>   simple to model the much more complex workloads generated by any
>   non-trivial application while starting. The latter, in fact, spawns
>   or wakes up a set of processes that synchronize among each other,
>   and that do I/O that varies over time, ranging from sequential to
>   random with large block sizes. In addition, not only the number of
>   processes doing I/O, but also the total amount of I/O varies greatly
>   with the type of the application.

Also agreed. However, in general I only rely on those fio
configurations to detect major problems in the IO scheduler. There are
too many boot-to-boot variances in the throughput and iops figures to
draw accurate conclusions from the headline numbers. For the most part,
if I'm looking at those configurations then I'm looking at the iostats
to see if there are anomalies in await times, queue sizes, merges,
major starvations etc.

> In view of these contrasting facts, here is my proposal to have a
> feasible yet accurate responsiveness test in your MMTests suite: add a
> synthetic test like yours, i.e., in which the workload is generated
> using fio, but in which appropriate workloads are generated to mimic
> real application-start-up workloads. In more detail, in which
> appropriate classes of workloads are generated, with each class
> modeling, in any of the above respects (locality of I/O, number of
> processes, total amount of I/O, ...), a popular type of application.
> I think/hope I should be able to build these workloads accurately,
> after years of analysis of traces of the I/O generated by applications
> while starting. Or, in any case, we can then discuss the workloads I
> would propose.
> 
> What do you think?
> 

If it can be done then sure. However, I'm not aware of a reliable
synthetic representation of such workloads. I also am not aware of a
synthetic methodology that can simulate both the IO pattern itself and
the think time of the application and, crucially, link the "think time"
to when IO is initiated, but it's also been a long time since I looked.
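To illustrate the sort of linkage I mean, and very roughly the shape
one of your workload classes might take, a toy sketch is below. Nothing
in it exists in mmtests; the file names, sizes, block sizes and think
times are invented, reads will largely be served from the page cache
unless the cache is dropped (or O_DIRECT used) first, and a real test
would presumably be driven by fio job files rather than Python. The
only point is the structure: a sequential "library" loader that wakes
up dependent random readers, whose think time gates when the next IO is
issued.

#!/usr/bin/env python3
# Toy sketch only, not an mmtests test: one hypothetical
# "application start-up" workload class. A sequential "library"
# loader wakes up dependent random readers whose think time gates
# when the next IO is issued -- the part a set of independent fio
# random readers does not capture. All names, sizes and timings
# here are invented, and reads will mostly come from the page
# cache unless it is dropped (or O_DIRECT is used) beforehand.

import os
import random
import time
from multiprocessing import Event, Process

SCRATCH = "/tmp/appstart-sketch"                  # hypothetical paths
SEQ_FILE = os.path.join(SCRATCH, "libraries.img")
RAND_FILE = os.path.join(SCRATCH, "config.img")

def setup(seq_mb=256, rand_mb=64):
    """Create the files the workers will read from."""
    os.makedirs(SCRATCH, exist_ok=True)
    block = os.urandom(1024 * 1024)
    for path, mb in ((SEQ_FILE, seq_mb), (RAND_FILE, rand_mb)):
        with open(path, "wb") as f:
            for _ in range(mb):
                f.write(block)

def sequential_loader(ready, chunk=128 * 1024):
    """Read the 'libraries' sequentially and wake the dependent
    readers once enough has been read, the way an application
    starts doing work before every mapping is in."""
    done = 0
    with open(SEQ_FILE, "rb", buffering=0) as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            done += len(buf)
            if done >= 64 * 1024 * 1024 and not ready.is_set():
                ready.set()      # later IO is gated on this event

def random_reader(ready, blocks=2000, bs=4096, think=0.0005):
    """Random 4k reads with a short think time after each block,
    so the next IO is only issued after the last one was used."""
    ready.wait()                 # synchronise with the loader
    size = os.path.getsize(RAND_FILE)
    with open(RAND_FILE, "rb", buffering=0) as f:
        for _ in range(blocks):
            f.seek(random.randrange(0, size - bs))
            f.read(bs)
            time.sleep(think)    # "think" before the dependent IO

if __name__ == "__main__":
    setup()
    ready = Event()
    procs = [Process(target=sequential_loader, args=(ready,))]
    procs += [Process(target=random_reader, args=(ready,))
              for _ in range(4)]
    start = time.monotonic()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("simulated start-up took %.2fs" % (time.monotonic() - start))

If something along those lines can be parameterised per application
class, with the locality, process count and total IO you mention as the
knobs, then it could slot into mmtests alongside the existing fio
configurations.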
About the closest I had in the past was generating patterns like you
suggest and then timing how long it took an X window to appear once an
application started, and this was years ago. The effort was abandoned
because the time for the window to appear was irrelevant. What mattered
was how long it took the application to be ready for use. Evolution was
a particular example that eventually caused me to abandon the effort
(that, and IO performance was not my primary concern at the time).
Evolution displayed a window relatively quickly but then had a tendency
to freeze while opening inboxes, and I never found a scalable way of
detecting that automatically.

-- 
Mel Gorman
SUSE Labs