On Mon, 28 Oct 2019, Coly Li wrote: > On 2019/10/24 10:40 上午, Larkin Lowrey wrote: > > I have a backing device that is constantly writing due to > > bcache_writebac. It has been at 14.3.MB dirty all day and has not > > changed. There's nothing else writing to it. > > > > This started after I "upgraded" from Fedora 29 to 30 and consequently > > from kernel 5.2.18 to 5.3.6. > > > > You can see from the info below that the writeback process is chewing > > up A LOT of CPU and writing constantly at ~7MB/s. It sure looks like > > it's in an infinite loop and writing the same data over and over. At > > least I hope that's the case and it's not just filling the array with > > garbage. > > > > This configuration has been stable for many years and across many Fedora > > upgrades. The host has ECC memory so RAM corruption should not be a > > concern. I have not had any recent controller or drive failures. > > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ > > COMMAND > > 5915 root 20 0 0 0 0 R 94.1 0.0 891:45.42 > > bcache_writebac > > > > > > Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s > > %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util > > md3 0.02 1478.77 0.00 6.90 0.00 0.00 > > 0.00 0.00 0.00 0.00 0.00 7.72 4.78 0.00 0.00 > > md3 0.00 1600.00 0.00 7.48 0.00 0.00 > > 0.00 0.00 0.00 0.00 0.00 0.00 4.79 0.00 0.00 > > md3 0.00 1300.00 0.00 6.07 0.00 0.00 > > 0.00 0.00 0.00 0.00 0.00 0.00 4.78 0.00 0.00 > > md3 0.00 1500.00 0.00 7.00 0.00 0.00 > > 0.00 0.00 0.00 0.00 0.00 0.00 4.78 0.00 0.00 > > > > --- bcache --- > > UUID dc2877bc-d1b3-43fa-9f15-cad018e73bf6 > > Block Size 512 B > > Bucket Size 512.00 KiB > > Congested? False > > Read Congestion 2.0ms > > Write Congestion 20.0ms > > Total Cache Size 128 GiB > > Total Cache Used 128 GiB (100%) > > Total Cache Unused 0 B (0%) > > Evictable Cache 127 GiB (99%) > > Replacement Policy [lru] fifo random > > Cache Mode writethrough [writeback] writearound none > > Total Hits 49872 (97%) > > Total Misses 1291 > > Total Bypass Hits 659 (77%) > > Total Bypass Misses 189 > > Total Bypassed 5.8 MiB > > --- Cache Device --- > > Device File /dev/dm-3 (253:3) > > Size 128 GiB > > Block Size 512 B > > Bucket Size 512.00 KiB > > Replacement Policy [lru] fifo random > > Discard? False > > I/O Errors 0 > > Metadata Written 5.0 MiB > > Data Written 86.1 MiB > > Buckets 262144 > > Cache Used 128 GiB (100%) > > Cache Unused 0 B (0%) > > --- Backing Device --- > > Device File /dev/md3 (9:3) > > bcache Device File /dev/bcache0 (252:0) > > Size 73 TiB > > Cache Mode writethrough [writeback] writearound none > > Readahead 0.0k > > Sequential Cutoff 4.0 MiB > > Merge sequential? False > > State dirty > > Writeback? True > > Dirty Data 14.3 MiB > > Total Hits 49872 (97%) > > Total Misses 1291 > > Total Bypass Hits 659 (77%) > > Total Bypass Misses 189 > > Total Bypassed 5.8 MiB > > > > I have not tried reverting back to an earlier kernel. I'm concerned > > about possible corruption. Is that safe? Any other suggestions as to how > > to debug and/or resolve this issue? > > From 5.2 to 5.3, there is only a few change related to writeback, and I > don't find obviously suspicious location for the infinite writback loop. > > Since in 5.3 there are many issue fixed, it might be possible that > another problem shows up because the previous problem fixed. > > Do you see anything suspicious in kernel message log ? or is it possible > to tar up and compress the /sys/fs/bcache/<cache-set-uuid> and > /sys/block/bcache0/ directories and emailed them to me ? > > And if you may run perf on the writback thread to sample the hot > location in the infinite loop, maybe I can find some clue if lucky. Try cat /proc/PID/stack of the writeback thread, and then see if you can do a gdb trace: gdb> list *(function_pointer_from_stack_above) -Eric > > Thanks. > > -- > > Coly Li >