On 2019/10/28 11:07 AM, Coly Li wrote:
> On 2019/10/24 10:40 AM, Larkin Lowrey wrote:
>> I have a backing device that is constantly writing due to
>> bcache_writebac. It has been at 14.3 MB dirty all day and has not
>> changed. There's nothing else writing to it.
>>
>> This started after I "upgraded" from Fedora 29 to 30 and consequently
>> from kernel 5.2.18 to 5.3.6.
>>
>> You can see from the info below that the writeback process is chewing
>> up A LOT of CPU and writing constantly at ~7 MB/s. It sure looks like
>> it's in an infinite loop, writing the same data over and over. At
>> least I hope that's the case and it's not just filling the array with
>> garbage.
>>
>> This configuration has been stable for many years and across many
>> Fedora upgrades. The host has ECC memory, so RAM corruption should
>> not be a concern. I have not had any recent controller or drive
>> failures.
>>
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>  5915 root      20   0       0      0      0 R  94.1   0.0 891:45.42 bcache_writebac
>>
>> Device    r/s      w/s  rMB/s  wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
>> md3      0.02  1478.77   0.00   6.90    0.00    0.00   0.00   0.00     0.00     0.00    0.00      7.72      4.78   0.00   0.00
>> md3      0.00  1600.00   0.00   7.48    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      4.79   0.00   0.00
>> md3      0.00  1300.00   0.00   6.07    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      4.78   0.00   0.00
>> md3      0.00  1500.00   0.00   7.00    0.00    0.00   0.00   0.00     0.00     0.00    0.00      0.00      4.78   0.00   0.00
>>
>> --- bcache ---
>> UUID                  dc2877bc-d1b3-43fa-9f15-cad018e73bf6
>> Block Size            512 B
>> Bucket Size           512.00 KiB
>> Congested?            False
>> Read Congestion       2.0ms
>> Write Congestion      20.0ms
>> Total Cache Size      128 GiB
>> Total Cache Used      128 GiB (100%)
>> Total Cache Unused    0 B (0%)
>> Evictable Cache       127 GiB (99%)
>> Replacement Policy    [lru] fifo random
>> Cache Mode            writethrough [writeback] writearound none
>> Total Hits            49872 (97%)
>> Total Misses          1291
>> Total Bypass Hits     659 (77%)
>> Total Bypass Misses   189
>> Total Bypassed        5.8 MiB
>>
>> --- Cache Device ---
>> Device File           /dev/dm-3 (253:3)
>> Size                  128 GiB
>> Block Size            512 B
>> Bucket Size           512.00 KiB
>> Replacement Policy    [lru] fifo random
>> Discard?              False
>> I/O Errors            0
>> Metadata Written      5.0 MiB
>> Data Written          86.1 MiB
>> Buckets               262144
>> Cache Used            128 GiB (100%)
>> Cache Unused          0 B (0%)
>>
>> --- Backing Device ---
>> Device File           /dev/md3 (9:3)
>> bcache Device File    /dev/bcache0 (252:0)
>> Size                  73 TiB
>> Cache Mode            writethrough [writeback] writearound none
>> Readahead             0.0k
>> Sequential Cutoff     4.0 MiB
>> Merge sequential?     False
>> State                 dirty
>> Writeback?            True
>> Dirty Data            14.3 MiB
>> Total Hits            49872 (97%)
>> Total Misses          1291
>> Total Bypass Hits     659 (77%)
>> Total Bypass Misses   189
>> Total Bypassed        5.8 MiB
>>
>> I have not tried reverting to an earlier kernel; I'm concerned about
>> possible corruption. Is that safe? Any other suggestions as to how to
>> debug and/or resolve this issue?
>
> From 5.2 to 5.3 there are only a few changes related to writeback, and
> I don't see an obviously suspicious location for an infinite writeback
> loop.
>
> Since many issues were fixed in 5.3, it is possible that another
> problem shows up because a previous one was fixed.
>
> Do you see anything suspicious in the kernel message log? Or is it
> possible to tar up and compress the /sys/fs/bcache/<cache-set-uuid>
> and /sys/block/bcache0/ directories and email them to me?
>
> And if you can run perf on the writeback thread to sample the hot
> spots in the loop, maybe I can find some clue if we are lucky.

Hi Larkin,

Thank you for sending me the sysfs data via email. One thing in it
looks suspicious; see these two files under
/sys/fs/bcache/dc2877bc-d1b3-43fa-9f15-cad018e73bf6/internal/:
- writeback_keys_done:   31324
- writeback_keys_failed: 600603834

The writeback_keys_failed counter is abnormally large; it seems the
writeback almost always fails and never completes. There are three
conditions under which the writeback_keys_failed counter is increased:
1) Reading the dirty data from the cache device fails.
2) Reading the dirty data from the cache device succeeds, but writing
   it to the backing device fails.
3) Reading the dirty data from the cache device succeeds, and writing
   it to the backing device succeeds too, but updating the dirty key to
   a clean key in the B+tree fails.
(A simplified sketch of where this counter is incremented is appended
at the end of this mail.)

Conditions 1) and 3) could point to the health of the SSD, while
condition 2) could point to the health of the backing device. But
/sys/block/md3/bcache/io_errors shows the backing device I/O error
counter is 0, so this is very probably not a problem with the backing
device. Because the dirty bits of the bkeys are never cleared, the
writeback thread keeps selecting the same keys and runs in an infinite
loop.

Could you please check the S.M.A.R.T. status of the SSD? Maybe the I/O
errors come from problematic storage media on the SSD. I cannot be 100%
sure this is a physical SSD problem, because it might also be caused by
a hidden bug in pre-5.3 bcache code that corrupted a B+tree node and
now makes the bkey update (from dirty to clean) fail. But the SSD's
health is the more suspicious candidate IMHO at this moment.

Thanks.

--
Coly Li
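
P.S. Here is the simplified sketch mentioned above. It is paraphrased
from memory of write_dirty_finish() in drivers/md/bcache/writeback.c
around the 5.3 era, not quoted verbatim (bucket pinning and tracepoints
are elided), so please check your own tree. It shows the one place
where writeback_keys_done and writeback_keys_failed diverge: after the
dirty data has been written to the backing device, the bkey must be
re-inserted into the B+tree with its dirty bit cleared.

static void write_dirty_finish(struct closure *cl)
{
	struct dirty_io *io = container_of(cl, struct dirty_io, cl);
	struct keybuf_key *w = io->bio.bi_private;
	struct cached_dev *dc = io->dc;

	bio_free_pages(&io->bio);

	if (KEY_DIRTY(&w->key)) {
		int ret;
		struct keylist keys;

		bch_keylist_init(&keys);

		/* Copy the key and clear its dirty bit ... */
		bkey_copy(keys.top, &w->key);
		SET_KEY_DIRTY(keys.top, false);
		bch_keylist_push(&keys);

		/* ... then try to insert the now-clean key into the B+tree. */
		ret = bch_btree_insert(dc->disk.c, &keys, NULL, &w->key);

		/*
		 * If the insert fails, the key in the btree stays dirty, so
		 * the writeback thread will select the same key again on its
		 * next pass: writeback_keys_failed grows without bound and
		 * the loop never ends, matching the counters above.
		 */
		atomic_long_inc(ret
				? &dc->disk.c->writeback_keys_failed
				: &dc->disk.c->writeback_keys_done);
	}

	bch_keybuf_del(&dc->writeback_keys, w);
	up(&dc->in_flight);

	closure_return_with_destructor(cl, dirty_io_destructor);
}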