On 13 Feb 2019, Kai Krakow verbalised:
> On Wed, 13 Feb 2019 at 01:22, Nix <nix@xxxxxxxxxxxxx> wrote:
>> > Here's my branch:
>> > https://github.com/kakra/linux/compare/master...kakra:rebase-4.20/bcache-updates
>>
>> Looks to be fixed there. Maybe you found a later version of the patches
>> than I did :) I derived mine from ewheelerinc's
>> for-4.10-block-bcache-updates, but even
>> bcache-updates-linux-block-for-4.13 seems to have the same bug, as does
>> bcache-updates-linux-block-for-next.
>>
>> Which branch did you rebase from? Maybe I should respin from the same
>> one (or probably just use your branch :) ).
>
> I used the same base but I'm carrying around those patches since then,
> rebased through several kernel versions. I think Eric also jumped in
> once and commented on some corrections that should be made. I just
> followed what I was reading.
>
> Feel free to use that branch, it also has some fixes that are queued for 5.1.

I will probably switch...

>> > There's still a problem with bcache doing writebacks very very slowly,
>> > at only 4k/s. My system generates more than 4k/s writes thus it will
>> > eventually never finish writing back dirty data.
>>
>> That seems... very bad.
>
> It can be. It has downsides: on a busy system, writeback should kick
> in only when idle so as not to delay read IO.

Yeah, except if you're emitting huge quantities of I/O (vapoursynth video
processing, I'm looking at you: that's the only thing I've ever done that
emits a terabyte of data at once and then reads it straight back in
again). Things like *that*, plus copious object files etc. from builds,
are why I have an uncached, unjournalled RAID-0 ext4 fs on the fastest
250GiB of each disk in the array, bind-mounted in as needed for transient
fs operations: if fsck finds problems on it, it just gets automatically
re-mkfsed.

> For optimally ordered IOs I've seen 800 MB/s here, but usually it
> peaks at around 60-80 MB/s for writes when doing Steam downloads (tho

It definitely sounds like your writes mostly come from the Internet:
mine are, ah, endogenously generated (compiles, massive text files with
awk or readelf output being chewed over by scripts, vapoursynth video
transcodes, etc.), so they can be almost arbitrarily huge and fast.

> Setup: bcache 400 GB SSD + 4x HDD btrfs RAID-0.

Mine is 350GiB of SSD devoted to this (and a bunch of the same SSD
devoted to other stuff), plus 6x 8TiB in a RAID-6, about two-thirds of
which is bcached (but that is still mostly empty, because, well, even
after the RAID-6 overhead there's 14TiB of it!)

>> Mine is still only 8GiB used out of 340. I think I might boost the
>> bypass figures -- perhaps setting it identical to the RAID stripe size
>> was a bad idea? (Though I thought there was a preference for full-stripe
>> *writes*, not reads, even if XFS does know about the RAID topology.)
>
> I'm not sure if XFS could really discover the lower-layers topology
> through bcache...

Indeed it can't. So you have to tell it at mkfs time:

mkfs.xfs -m rmapbt=1,reflink=1 \
         -d agcount=17,sunit=$((128*8)),swidth=$((384*8)) \
         -l logdev=/dev/sde3,size=521728b \
         -i sparse=1,maxpct=25 /dev/main/root

There are very few reasons to hand-specify parameters to mkfs.xfs these
days, unless you want to flip on experimental features or something (as
rmapbt and reflinks were when I did this). agcount/sunit/swidth on RAID
arrays where mkfs can't see the topology because bcache is in the way is
one of them.
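To double-check that those values actually took, xfs_info can read the
geometry back once the filesystem is mounted (the mount point below is
just an example). mkfs.xfs takes sunit/swidth in 512-byte units, while
xfs_info reports them in filesystem blocks, hence the *8 above with
4KiB blocks:

xfs_info /mnt/point | grep -E 'sunit|swidth'
# expect something like "sunit=128 swidth=384 blks", i.e. the sector
# counts given to mkfs divided by the 8 sectors per 4KiB fs block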
(External journals are another. *Obviously*, that '521728b' is the same
as saying '2G'. But it's certain to be block-accurate, which I sort of
cared about here. I was working in units of 4KiB fs blocks/HDD sectors
the whole time.)
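Coming back to the writeback and bypass discussion above: those knobs
are all runtime-tunable through sysfs. A rough sketch, with the bcache
device name and cache-set UUID as placeholders for whatever a given
setup uses (exact attributes vary a bit between kernel versions):

# how much dirty data is waiting, and what the rate controller is doing
cat /sys/block/bcache0/bcache/dirty_data
cat /sys/block/bcache0/bcache/writeback_rate_debug

# the "bypass figure": IO more sequential than this threshold goes
# straight to the backing device; echo 0 disables the sequential bypass
cat /sys/block/bcache0/bcache/sequential_cutoff
echo $((16*1024*1024)) > /sys/block/bcache0/bcache/sequential_cutoff

# keep roughly this percentage of the cache dirty; 0 lets background
# writeback run flat out instead of being throttled
echo 10 > /sys/block/bcache0/bcache/writeback_percent

# stop bcache bypassing the cache when the cache device looks congested
echo 0 > /sys/fs/bcache/<cache-set-uuid>/congested_read_threshold_us
echo 0 > /sys/fs/bcache/<cache-set-uuid>/congested_write_threshold_us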