On Mon, Jun 29, 2020 at 9:45 PM Tom Seewald <tseewald@xxxxxxxxx> wrote:
>
> > The latter but considering they're a broad variety of workloads I
> > think it's misleading to call them server workloads as if that's one
> > particular type of thing, or not applicable to a desktop under IO
> > pressure. Why? (a) they're using consumer storage devices (b) these
> > are real workloads rather than simulations (c) even by upstream's own
> > descriptions of the various IO schedulers only mq-deadline is intended
> > to be generic. (d) it's really hard to prove anything in this area
> > without a lot of data.
>
> You are right that the difference between them is blurry. My question
> comes from being unsure if it's the case that Fedora users are
> experiencing problems with bfq but are not reporting them, or if there
> is something specific that is causing that pathological scheduling
> behavior at Facebook.

They're using mq-deadline most everywhere, not just the servers, but
local computers and VMs. They use kyber (which is Facebook contributed)
for high end storage, and it's not indicated for our usage. I'm not sure
they're seeing anything wrong per se with bfq, it's just consistently
not performing as well as mq-deadline due to latencies. I'm not sure
that's a bug if it's improving performance in other areas that are
relevant for the intended workloads. The gotcha is, what are the
intended workloads? What is even a desktop workload?

> It was also my understanding that Facebook primarily uses NVMe drives
> [1][2], and that is the class of storage Fedora does not use bfq with.
> Is it possible these latency problems occurred when using bfq with
> NVMe drives?

Not certain. But in our case we use 'none' for NVMe drives. For most
people that's OK, but then some workloads will suffer if you get a task
that has a heavy demand for tags, because there's no scheduler to spread
them out among those demanding them.
So I'm pulling a number out of my butt, but 'none' could be fine for 90%
and not great for 10%. If anything, 'none' with NVMe is a server-like
configuration, if it's running a typically homogeneous workload.

> I now see that Paolo was cc'd in comment #9 of the bugzilla ticket, so
> hopefully he responds.
>
> > But fair enough, I'll see about collecting some data before asking to
> > change the IO scheduler yet again.
>
> For the record, I definitely agree that mq-deadline should become the
> default scheduler for NVMe drives.

The other question I have: I'm pretty sure we're using the same udev
rule across all of Fedora. It's not just on the desktops. My Fedora
Server is using bfq for everything. VMs are using mq-deadline for
/dev/vd* virtio devices and bfq for /dev/sr* and /dev/sd* devices.

I have nothing against bfq, but I'm inclined to go with the most generic
IO scheduler as the default, and let people optimize for their specific
workload, rather than the other way around.

It's super annoying for me to post this, because benchmarks drive me
crazy, and yet here I am posting one - it's almost like
self-flagellation to paste this...

https://www.phoronix.com/scan.php?page=article&item=linux-56-nvme&num=4

None of these benchmarks are representative of a generic desktop. The
difficulty with desktop workloads is their heterogeneity. Some people
are mixing music, others compiling, still others doing lots of web
browsing (Chrome OS, I guess, went to bfq around the same time we did),
and we just don't really know what people are going to do. Some even use
Workstation as a base for more typical server operations.

The geometric mean isn't helpful either, because none of the tests are
run concurrently or attempt to produce tag starvation, which would
result in latency spikes. That's where mq-deadline would do better than
'none'.
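
For anyone who wants to check what their own machines actually ended up
with (desktop, server or VM), here's a rough sketch in Python. It isn't
taken from Fedora or upstream, it's just an illustration. It reads
/sys/block/<dev>/queue/scheduler, where the kernel lists the available
schedulers and puts the active one in square brackets. The function
names are made up for this example.

```python
from pathlib import Path


def parse_scheduler_line(text):
    """Split the contents of a queue/scheduler file into (active, available).

    The kernel lists every available scheduler and marks the active one
    with square brackets, e.g. "mq-deadline kyber [bfq] none".
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def active_schedulers(sys_block="/sys/block"):
    """Map each block device name to its active scheduler (or None)."""
    result = {}
    for sched_file in sorted(Path(sys_block).glob("*/queue/scheduler")):
        try:
            active, _ = parse_scheduler_line(sched_file.read_text())
        except OSError:
            continue  # device went away or the file is unreadable
        # Path layout is <sys_block>/<device>/queue/scheduler
        result[sched_file.parent.parent.name] = active
    return result


if __name__ == "__main__":
    for dev, sched in active_schedulers().items():
        print(f"{dev}: {sched}")
```

Run it on a few different installs and compare the output per device
class (sd*, vd*, nvme*) to see whether the same defaults really are
applied everywhere.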

--
Chris Murphy