Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach

Hi all,

I've noticed similar oddities with our servers - not quite the same, but close enough so I won't open a new thread:

We're running kernel 3.18.8 on server machines delivering SAN and NAS resources. Back-end storage is an MD-RAID6 (7 WD Red 1TB 2.5"), the cache is on an MD-RAID1 (2 SSD TOSHIBA PX02SMF020 200GB). As we live-migrated from 128GB SSDs to the Toshibas, the cache size is still at 128 GB.

On one of the servers, I noticed excessive I/O load reported by our monitoring tools. Having read this thread, I tried to get the amount of dirty data down (cache_mode set to "writeback", writeback_percent to 0, writeback_rate to 10000, then monitoring writeback_rate_debug), but unlike on our other server, the amount of dirty data would not go below 186M. Unlike in the original report, bcache_writeback wasn't pinned at 100% CPU; its usage varied, but it was always at the top of the process list. I/O wait was unusually high compared to the amount of data written.
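For anyone wanting to repeat the steps above, here is a rough Python sketch. It is not what I literally ran: the device name bcache0 and the helper names are assumptions for illustration, and it assumes dirty_data is reported in the short human-readable form quoted in this mail (e.g. "186M").

```python
from pathlib import Path

# Assumption: the bcache device is bcache0; adjust to your setup.
SYSFS_DIR = Path("/sys/block/bcache0/bcache")

# Values taken from the description above, written in this order.
FLUSH_SETTINGS = [
    ("cache_mode", "writeback"),
    ("writeback_percent", "0"),
    ("writeback_rate", "10000"),
]

# Assumption: suffixes are powers of 1024.
_UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size string such as '186M' or '1.5G' to bytes."""
    text = text.strip()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(float(text))


def apply_flush_settings(sysfs_dir=SYSFS_DIR):
    """Write the settings above to the device's sysfs attributes (needs root)."""
    for name, value in FLUSH_SETTINGS:
        (Path(sysfs_dir) / name).write_text(value)


def read_dirty_bytes(sysfs_dir=SYSFS_DIR):
    """Return the current amount of dirty data in bytes."""
    return parse_size((Path(sysfs_dir) / "dirty_data").read_text())
```

Calling apply_flush_settings() once and then polling read_dirty_bytes() every few minutes is enough to see whether the value keeps falling or stalls at some floor, as it did here.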

I let the system rest overnight, only to find the next day that the amount of dirty data would not go below 197M, so the "bad spot" had moved. The load on this server had increased - looking at the stats, it seemed like writeback was trying to write data all the time but, for whatever reason, failing (which matched the lower limit of dirty data).

Fearing the worst, I set the cache mode to writearound to disable further caching (the amount of dirty data still wouldn't drop below that floor), stopped the clients for this server and rebooted.

Luckily, the server came up without a problem, and *I could now get the amount of dirty data down to zero*. I switched back to writeback, with writeback_percent set to 0 and a fixed writeback_rate.

So in our case, it looks like something broke in bcache's run-time state rather than on disk (read: SSD cache content).

Regards,
Jens

PS: We're still facing random reboots (of unknown cause), which may correlate with bcache's "amount dirty" being near the limit set by writeback_percent. I'm currently trying to work around this by running with writeback_percent set to zero and using a writeback_rate that lets the cache clean up over the day... so far, so good, but it's too early to tell for sure. Since switching to the new SSDs, the reboot rate has gone down to about once a week, and I made this change only two days ago.
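To put a number on "clean up over the day", here is a back-of-the-envelope sketch. It assumes writeback_rate is counted in 512-byte sectors per second, as the kernel's bcache documentation describes it, and it ignores new writes arriving during the day; the function names are made up for illustration.

```python
# Assumption: writeback_rate is expressed in 512-byte sectors per second.
SECTOR_BYTES = 512


def rate_to_bytes_per_second(sectors_per_second):
    """Throughput in bytes/s implied by a given writeback_rate."""
    return sectors_per_second * SECTOR_BYTES


def rate_to_flush(dirty_bytes, seconds):
    """Smallest whole sectors/s rate that writes dirty_bytes within seconds."""
    sectors = -(-dirty_bytes // SECTOR_BYTES)  # ceiling division
    return -(-sectors // seconds)              # ceiling division
```

Under that assumption, the writeback_rate of 10000 mentioned above corresponds to about 5 MB/s, and a completely dirty 128 GB cache would need a rate of roughly 3100 to drain within 24 hours.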

