On Fri, Oct 06, 2017 at 06:42:24PM +0200, Paolo Valente wrote:
> Hi Mel,
> I have been thinking of our (sub)discussion, in [1], on possible tests
> to measure responsiveness.
> 
> First let me sum up that discussion in terms of the two main facts
> that we highlighted.
> 
> On one side,
> - it is actually possible to measure the start-up time of some popular
>   applications automatically and precisely (my claim),

Agreed, albeit my understanding is that this is mainly done by manual
testing, looking at the screen and a stopwatch.

> - but to accomplish such a task one needs a desktop environment, which
>   is not available and/or not so easy to handle on a battery of
>   server-like test machines;
> 

Also agreed, and it's not something that scales. It's highly
subjective, although I'm aware of anecdotal evidence that the desktop
experience is indeed better than with CFQ.

> On the other side,
> - you did perform some tests to estimate responsiveness,

Not exactly. For the most part I was concerned with server-class
workloads in general, not responsiveness in particular or application
start-up times. If nothing else, there is often a tradeoff between the
response time for a particular IO request and overall throughput, and
it's a balance. The mail you initially linked quoted results from a
database simulator and the initialisation step for it. The
initialisation step is a very basic IO pattern, so regressions there
are a concern under the heading of "if the basics are broken then the
complex case probably is too". Very broadly speaking, I'd be more than
happy if the performance of such workloads was within a reasonable
percentage of CFQ, and would classify the rest as a tradeoff,
particularly if disabling low_latency is enough to get performance
within the noise.

> - but the workloads for which you measured latency, namely the I/O
>   generated by a set of independent random readers, are rather too
>   simple to model the much more complex workloads generated by any
>   non-trivial application while starting. The latter, in fact, spawns
>   or wakes up a set of processes that synchronize among each other,
>   and that do I/O that varies over time, ranging from sequential to
>   random with large block sizes. In addition, not only the number of
>   processes doing I/O, but also the total amount of I/O varies greatly
>   with the type of the application.

Also agreed. However, in general I only rely on those fio
configurations to detect major problems in the IO scheduler. There are
too many boot-to-boot variances in the throughput and iops figures to
draw accurate conclusions from the headline numbers. For the most part,
if I'm looking at those configurations then I'm looking at the iostats
to see if there are anomalies in await times, queue sizes, merges,
major starvations etc.

> In view of these contrasting facts, here is my proposal to have a
> feasible yet accurate responsiveness test in your MMTests suite: add a
> synthetic test like yours, i.e., in which the workload is generated
> using fio, but in which appropriate workloads are generated to mimic
> real application-start-up workloads. In more detail, in which
> appropriate classes of workloads are generated, with each class
> modeling, in any of the above respects (locality of I/O, number of
> processes, total amount of I/O, ...), a popular type of application.
> I think/hope I should be able to build these workloads accurately,
> after years of analysis of traces of the I/O generated by applications
> while starting. Or, in any case, we can then discuss the workloads I
> would propose.
> 
> What do you think?
> 

If it can be done then sure. However, I'm not aware of a reliable
synthetic representation of such workloads. I also am not aware of a
synthetic methodology that can simulate both the IO pattern itself and
the think time of the application and, crucially, link the "think time"
to when IO is initiated, but it's also been a long time since I looked.
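To illustrate the sort of linkage I mean, and very roughly the shape
one of your workload classes might take, a toy sketch is below. Nothing
in it exists in mmtests; the file names, sizes, block sizes and think
times are invented, reads will largely be served from the page cache
unless the cache is dropped (or O_DIRECT used) first, and a real test
would presumably be driven by fio job files rather than Python. The
only point is the structure: a sequential "library" loader that wakes
up dependent random readers, whose think time gates when the next IO is
issued.

#!/usr/bin/env python3
# Toy sketch only, not an mmtests test: one hypothetical
# "application start-up" workload class. A sequential "library"
# loader wakes up dependent random readers whose think time gates
# when the next IO is issued -- the part a set of independent fio
# random readers does not capture. All names, sizes and timings
# here are invented, and reads will mostly come from the page
# cache unless it is dropped (or O_DIRECT is used) beforehand.

import os
import random
import time
from multiprocessing import Event, Process

SCRATCH = "/tmp/appstart-sketch"                  # hypothetical paths
SEQ_FILE = os.path.join(SCRATCH, "libraries.img")
RAND_FILE = os.path.join(SCRATCH, "config.img")

def setup(seq_mb=256, rand_mb=64):
    """Create the files the workers will read from."""
    os.makedirs(SCRATCH, exist_ok=True)
    block = os.urandom(1024 * 1024)
    for path, mb in ((SEQ_FILE, seq_mb), (RAND_FILE, rand_mb)):
        with open(path, "wb") as f:
            for _ in range(mb):
                f.write(block)

def sequential_loader(ready, chunk=128 * 1024):
    """Read the 'libraries' sequentially and wake the dependent
    readers once enough has been read, the way an application
    starts doing work before every mapping is in."""
    done = 0
    with open(SEQ_FILE, "rb", buffering=0) as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            done += len(buf)
            if done >= 64 * 1024 * 1024 and not ready.is_set():
                ready.set()      # later IO is gated on this event

def random_reader(ready, blocks=2000, bs=4096, think=0.0005):
    """Random 4k reads with a short think time after each block,
    so the next IO is only issued after the last one was used."""
    ready.wait()                 # synchronise with the loader
    size = os.path.getsize(RAND_FILE)
    with open(RAND_FILE, "rb", buffering=0) as f:
        for _ in range(blocks):
            f.seek(random.randrange(0, size - bs))
            f.read(bs)
            time.sleep(think)    # "think" before the dependent IO

if __name__ == "__main__":
    setup()
    ready = Event()
    procs = [Process(target=sequential_loader, args=(ready,))]
    procs += [Process(target=random_reader, args=(ready,))
              for _ in range(4)]
    start = time.monotonic()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("simulated start-up took %.2fs" % (time.monotonic() - start))

If something along those lines can be parameterised per application
class, with the locality, process count and total IO you mention as the
knobs, then it could slot into mmtests alongside the existing fio
configurations.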
About the closest I had in the past was generating patterns like you
suggest and then timing how long it took an X window to appear once an
application started, and this was years ago. The effort was abandoned
because the time for the window to appear was irrelevant. What mattered
was how long it took the application to be ready for use. Evolution was
a particular example that eventually caused me to abandon the effort
(that, and IO performance was not my primary concern at the time).
Evolution displayed a window relatively quickly but then had a tendency
to freeze while opening inboxes, and I never found a scalable way of
detecting that automatically.

-- 
Mel Gorman
SUSE Labs