Hello,

On Mon, Jun 02, 2014 at 11:26:07AM +0200, Paolo Valente wrote:
> >>  #define cond_for_expiring_non_wr  (bfqd->hw_tag && \
> >> -                                   bfqd->wr_busy_queues > 0)
> >> +                                   (bfqd->wr_busy_queues > 0 || \
> >> +                                    (symmetric_scenario && \
> >> +                                     blk_queue_nonrot(bfqd->queue))))
> >
> >     expire_non_wr = zzz;
> >
>
> The solution you propose is the first that came to my mind. But then
> I went for a clumsy macro-based solution because: 1) the whole
> function is all about evaluating a long logical expression, 2) the
> macro-based solution allows the short-circuit to be used at best,
> and the number of steps to be minimized. For example, with async
> queues, only one condition is evaluated.
>
> Defining three variables entails instead that the value of all the
> variables is computed every time, even if most of the times there is
> no need to.
>
> Would this gain be negligible (sorry for my ignorance), or would
> not it be however enough to justify these unusual macros?

The compiler should be able to optimize those to basically the same
code.  AFAICS, everything the code tests is trivially known to be
without side-effect to the compiler.  Besides, even if the compiler
generates slightly less efficient code, which it shouldn't, it's
highly unlikely that this level of micro CPU cycle optimization would
be measurable for something as heavy as [bc]fq.

> > This optimization may be theoretically interesting but doesn't seem
> > practical at all and would make the system behave distinctively
> > differently depending on something which is extremely subtle and seems
> > completely unrelated.  Furthermore, on any system which uses blkcg,
> > ext4, btrfs or has any task which has non-zero nice value, it won't
> > make any difference.  Its value is only theoretical.
>
> Turning on idling unconditionally when blkcg is used, is one of the
> first solutions we have considered. But there seem to be practical
> scenarios where this would cause an unjustified loss of
> throughput.
> The main example for us was ulatencyd, which AFAIK
> creates one group for each process and, by default, assigns to all
> processes the same weight. But the assigned weight is not the one
> associated to the default ioprio.

Isn't the optimization "not idling" when these conditions are met?
Shouldn't the comparison be against the benefit of "not idling
selectively" vs "always idling" when blkcg is in use?

Another problem there is that this not only depends on the number of
processes but the number of threads in it.  cgroup is moving away from
allowing threads of a single process in different cgroups, so this
means that the operation can fluctuate in a very unexpected manner.

I'm not really convinced about the approach.  With rotating disks, we
know that allowing queue depth > 1 generally lowers both throughput and
responsiveness and brings benefits in quite restricted cases.  It
seems rather backwards to always allow QD > 1 and then try to optimize
in an attempt to recover what's lost.  Wouldn't it make far more sense
to actively maintain QD == 1 by default and allow QD > 1 in specific
cases where it can be determined to be more beneficial than harmful?

> I do not know how widespread a mechanism like ulatencyd is
> precisely, but in the symmetric scenario it creates, the throughput
> on, e.g., an HDD would drop by half if the workload is mostly random
> and we removed the more complex mechanism we set up. Wouldn't this
> be bad?

It looks like a lot of complexity for optimization for a very
specific, likely unreliable (in terms of its triggering condition),
use case.  The triggering condition is just too specific.

> > Another thing to consider is that virtually all remotely modern
> > devices, rotational or not, are queued.  At this point, it's rather
> > pointless to design one behavior for !queued and another for queued.
> > Things should just be designed for queued devices.
>
> I am sorry for expressing doubts again (mainly because of my
> ignorance), but a few months ago I had to work with some portable
> devices for a company specialized in ARM systems. As an HDD, they
> were using a Toshiba MK6006GAH. If I remember well, this device had
> no NCQ. Instead of the improvements that we obtained by using bfq
> with this slow device, removing the differentiated behavior of bfq
> with respect to queued/!queued devices would have caused just a loss
> of throughput.

Heh, that's a 60GB ATA-100 hard drive.  Had no idea those are still
being produced.  However, my point still is that the design should be
focused on queued devices.  They're predominant in the market and
it'll only continue to become more so.  What bothers me is that the
scheduler essentially loses control and shows sub-optimal behavior on
queued devices by default and that's how it's gonna perform in the
vast majority of the use cases.

> > I don't know what
> > the solution is but given that the benefits of NCQ for rotational
> > devices is extremely limited, sticking with single request model in
> > most cases and maybe allowing queued operation for specific workloads
> > might be a better approach.  As for ssds, just do something simple.
> > It's highly likely that most ssds won't travel this code path in the
> > near future anyway.
>
> This is the point that worries me mostly. As I pointed out in one of
> my previous emails, dispatching requests to an SSD without control
> causes high latencies, or even complete unresponsiveness (Figure 8 in
> http://algogroup.unimore.it/people/paolo/disk_sched/extra_results.php
> or Figure 9 in
> http://algogroup.unimore.it/people/paolo/disk_sched/results.php).
>
> I am of course aware that efficiency is a critical issue with fast
> devices, and is probably destined to become more and more critical
> in the future.
> But, as a user, I would be definitely unhappy with a
> system that can, e.g., update itself in one minute instead of five,
> but, during that minute may become unresponsive. In particular, I
> would not be pleased to buy a more expensive SSD and get a much less
> responsive system than that I had with a cheaper HDD and bfq fully
> working.

blk-mq is right around the corner and newer devices won't travel this
path at all.  Hopefully, ahci will be served through blk-mq too when
it's connected to ssds, so its usefulness for high performance devices
will diminish rather quickly over the coming several years.  It sure
would be nice to still be able to carry some optimizations but it does
shift the trade-off balance in terms of how much extra complexity is
justified.

Thanks.

-- 
tejun