On Tue, 6 Oct 2020, Nix wrote:

> On 6 Oct 2020, Kai Krakow verbalised:
>
> > Am Di., 6. Okt. 2020 um 14:28 Uhr schrieb Nix <nix@xxxxxxxxxxxxx>:
> >> That sounds like a bug in the mq-scsi machinery: it surely should be
> >> passing the ioprio off to the worker thread so that the worker thread
> >> can reliably mimic the behaviour of the thread it's acting on behalf of.
> >
> > Maybe this was only an issue early in mq-scsi before it got more
> > schedulers than just iosched-none? It has bfq now, and it should work.
> > Depending on the filesystem, tho, that may still not fully apply...
> > e.g. btrfs doesn't use ioprio for delayed refs resulting from such io,
> > it will simply queue it up at the top of the io queue.
>
> Yeah. FWIW I'm using bfq for all the underlying devices and everything
> still seems to be working, idle I/O doesn't get bcached etc.
>
> >> using cgroups would make this essentially unusable for
> >> me, and probably for most other people, because on a systemd system the
> >> cgroup hierarchy is more or less owned in fee simple by systemd, and it
> >> won't let you use cgroups for something else,
> >
> > That's probably not completely true, you can still define slices which
> > act as a cgroup container for all services and processes contained in
> > it, and you can use "systemctl edit myscope.slice" to change
> > scheduler, memory accounting, and IO params at runtime.
>
> That's... a lot clunkier than being able to say 'ionice -c 3 foo' to run
> foo without caching. root has to prepare for it on a piece-by-piece
> basis... not that ionice is the most pleasant of utilities to use
> either.

I always make my own cgroups with cgcreate, cgset, and cgexec. We're
using centos7 which is all systemd and I've never had a problem.

Something (hypothetically) like this:

    cgcreate -g blkio:/my_bcache_settings
    cgset -r blkio.bcache.bypass='read,write' my_bcache_settings
    cgset -r blkio.bcache.writeback='write,meta' my_bcache_settings

Then all you need to do is run this, which isn't all that different
from an ionice invocation:

    cgexec -g blkio:my_bcache_settings /usr/local/bin/some-program
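
(To be clear, the blkio.bcache.* attributes above are hypothetical --
the blkio controller has no bcache knobs today. But the same
cgcreate/cgset/cgexec pattern already works with the attributes blkio
does have on cgroup v1, e.g. to de-prioritise or throttle a batch job;
device numbers, values and the group name here are just for
illustration:

    cgcreate -g blkio:/throttled
    # proportional IO weight under CFQ, valid range 10-1000
    cgset -r blkio.weight=100 throttled
    # cap reads from device 8:0 (commonly /dev/sda) at 10 MiB/s
    cgset -r blkio.throttle.read_bps_device='8:0 10485760' throttled
    cgexec -g blkio:throttled /usr/local/bin/some-batch-job

so if bcache grew per-cgroup hints they'd slot straight into that
workflow.)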
--
Eric Wheeler


> >> (And as for making systemd set up suitable cgroups, that too would make
> >> it unusable for me: I tend to run jobs ad-hoc with ionice, use ionice in
> >> scripts etc to reduce caching when I know it won't be needed, and that
> >> sort of thing is just not mature enough to be reliable in systemd yet.
> >
> > You can still define a slice for such ad-hoc processes by using
> > systemd-run to make your process into a transient one-shot service.
>
> That's one of the things that crashed my system when I tried it. I just
> tried it again and it seems to work now. :) (Hm, does systemd-run wait
> for return and hand back the exit code... yes, via --scope or --wait,
> both of which seem to have elaborate constraints that I don't fully
> understand and that make me rather worried that using them might not be
> reliable: but in this it is just like almost everything else in
> systemd.)
>
> >> It's rare for a systemd --user invocation to get everything so confused
> >> that the entire system is rendered unusable, but it has happened to me
> >> in the past, so unlike ionice I am now damn wary of using systemd --user
> >> invocations for anything. They're a hell of a lot clunkier for ad-hoc
> >> use than a simple ionice, too: you can't just say "run this command in a
> >> --user", you have to set up a .service file etc.)
> >
> > Not sure what you did, I never experienced that. Usually that happens
>
> It was early in the development of --user, so it may well have been a
> bug that was fixed later on. In general I have found systemd to be too
> tightly coupled and complex to be reliable: there seem to be all sorts
> of ways to use local mounts and fs namespaces and the like to fubar
> PID 1 and force a reboot (which you can't do because PID 1 is too
> unhappy, so it's /sbin/reboot -f time). Admittedly I do often do rather
> extreme things with tens of thousands of mounts and the like, but
> y'know the only thing they make unhappy is... systemd. :/
>
> (I have used systemd enough to both rely on it and cordially loathe it
> as an immensely overcomplicated monster with far too many edge cases
> and far too much propensity to insist on your managing the system its
> way (e.g. what it does with cgroups), and if I do anything but the
> simplest stuff I'm likely to trip over one or more bugs in those edge
> cases. I'd switch to something else simple enough to understand if only
> all the things I might switch to were not also too simple to be able to
> do the things I want to do. The usual software engineering dilemma...)
>
> In general, though, the problem with cgroups is that courtesy of v2
> having a unified hierarchy, if any one thing uses cgroups, nothing else
> really can, because they all have to agree on the shape of the
> hierarchy, which is most unlikely if they're using cgroups for
> different purposes. So it is probably a mistake to use cgroups for
> *anything* other than handing control of it to a single central thing
> (like systemd) and then trying to forget that cgroups ever existed for
> any other purpose because you'll never be able to use them yourself.
>
> A shame. They could have been a powerful abstraction...
>
> > and some more. The trick is to define all slices with a
> > lower bound of memory below which the kernel won't reclaim memory from
> > it - I found that's one of the most important knobs to fight laggy
> > desktop usage.
>
> I cheated and just got a desktop with 16GiB RAM and no moving parts and
> a server with so much RAM that it never swaps, and 10GbE between the
> two so the desktop can get stuff off the server as fast as its disks
> can do contiguous reads. bcache cuts down seek time enough that I
> hardly ever have to wait for it, and bingo :)
>
> (But my approach is probably overkill: yours is more elegant.)
>
> > I usually look at the memory needed by the processes when running,
>
> I've not bothered with that for years: 16GiB seems to be enough that
> Chrome plus even a fairly big desktop doesn't cause the remotest
> shortage of memory, and the server, well, I can run multiple Emacsen
> and 20+ VMs on that without touching the sides. (Also... how do you
> look at it? PSS is pretty good, but other than ps_mem almost nothing
> uses it, not even the insanely overdesigned procps top.)
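
(On the PSS question: the kernel exposes it per mapping in
/proc/<pid>/smaps, and since 4.14 there's the much cheaper
/proc/<pid>/smaps_rollup with a single pre-summed Pss: line, which is
roughly what ps_mem adds up for you. A quick no-extra-tools
approximation for one process -- the current shell here, purely as an
example:

    awk '/^Pss:/ { kb += $2 } END { print kb " kB" }' /proc/$$/smaps

ps_mem is still nicer for the whole-system, grouped-by-program view.)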