On Tue, Jun 30, 2020 at 07:28:53PM +0100, Ankur Sinha wrote:
> On Tue, Jun 30, 2020 17:23:16 +0000, Zbigniew Jędrzejewski-Szmek wrote:
> > On Tue, Jun 30, 2020 at 04:25:23PM +0100, Ankur Sinha wrote:
> > > On Mon, Jun 29, 2020 15:01:24 -0600, Chris Murphy wrote:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1851783
> > > >
> > > > The main argument is that for typical and varied workloads in Fedora,
> > > > mostly on consumer hardware, we should use mq-deadline scheduler
> > > > rather than either none or bfq.
> > > >
> > > > It may be true most folks with NVMe won't see anything bad with none,
> > > > but those who have heavier IO workloads are likely to be better off
> > > > with mq-deadline.
> > > >
> > > > Further details are in the bug, but let's discuss it on list. Thanks!
> > >
> > > There was this thread about our systems hanging, and the workaround was
> > > to revert to mq-deadline from bfq:
> > >
> > > https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/thread/MJJFT5AOYUFZ3SO2EDVLJSDAZMZI4HAP/#DA7RCQFIAD4Z3Q7HQBW2ELPTLPYDKJMT
> >
> > To clarify: you could reliably reproduce the issue when building steps in mock.
> > Did you verify that it is reliably fixed simply by switching bfq→mq-deadline?
>
> Yes, that was the first change I had made and it had stopped the
> hanging. As a permanent fix, though, I switched to using isolation =
> simple in mock, and since that works, I've not changed it since.

OK, thanks.

> (I make it a point to provide the needed information for bugs, but this
> release my quota is currently being used up on getting Docker + minikube
> to work on F32 for $dayjob)
>
> > > There are a few threads on AskFedora about systems hanging. They're not
> > > the easiest to debug but we did suggest people try switching to
> > > mq-deadline to see if it helps:
> > >
> > > https://ask.fedoraproject.org/t/whole-os-freezes-watching-a-video-with-mpv/6770/10
> > >
> > > I don't know enough about this to say if it's a bug and if it has been
> > > fixed.
> >
> > There's a lot of noise in those bug reports. For heisenbugs, the fact
> > that something was an issue and after a flurry of half-random changes
> > to the system isn't, does not allow us to conclude _anything_. We need
> > somebody who understands what they are doing to isolate the issue. In
> > particular, if this is a kernel hang, then we need a proper traceback
> > from the kernel, and not just assume it's the scheduler.
>
> There is a kernel trace in the related bug that was cited there:
> https://bugzilla.redhat.com/show_bug.cgi?id=1767097#c7
>
> which links to another bfq bug here that's currently needinfo:
> https://bugzilla.redhat.com/show_bug.cgi?id=1767539
>
> > (In particular, if this is a race condition, changing the scheduler
> > could be just making the condition less likely because the system is
> > slower or faster or just schedules processes in a different order,
> > without the scheduler being relevant to the bug).
>
> Like I said, I don't know. I'm a fairly advanced Linux user but you can
> hardly expect me to also be a kernel hacker. :)
>
> For kernel bugs, I'd strongly suggest giving reporters step-by-step
> instructions or links to using a "serial console" or a "netconsole".
> These are not part of my working vocabulary (I cannot speak for others).

Thanks for the links. This seems to be a tough cookie and I hope it gets
resolved at some point. And to clarify: my comment about debugging was
not directed to you in particular, apart from the question above which
you have already answered.

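For anyone trying the bfq→mq-deadline switch discussed above, it helps to
first confirm which scheduler each disk is actually using. The following is
only a minimal sketch, assuming the usual sysfs layout where
/sys/block/<device>/queue/scheduler lists the available schedulers and marks
the active one in square brackets. The function names are made up for
illustration and are not part of any existing tool.

```python
from pathlib import Path


def parse_schedulers(text):
    """Parse the contents of a queue/scheduler sysfs file.

    Example input: "mq-deadline kyber [bfq] none"
    Returns (active, available); active is None if nothing is bracketed.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def current_schedulers(sys_block=Path("/sys/block")):
    """Map each block device name to (active, available) schedulers."""
    result = {}
    if not sys_block.is_dir():
        # Not a Linux system, or sysfs is not mounted here.
        return result
    for dev in sorted(sys_block.iterdir()):
        sched_file = dev / "queue" / "scheduler"
        try:
            result[dev.name] = parse_schedulers(sched_file.read_text())
        except OSError:
            # Some devices expose no scheduler file; skip them.
            continue
    return result


if __name__ == "__main__":
    for dev, (active, available) in current_schedulers().items():
        print(f"{dev}: active={active} available={', '.join(available)}")
```

Writing a scheduler name into the same file as root (for example,
mq-deadline) switches that device until the next reboot, so the report can
be run before and after the change to verify that the switch took effect.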
Zbyszek