Re: [PATCH] bcache: recover data from backing device when read request hit clean


On 17/11/17 13:22, Coly Li wrote:
On 17/11/2017 8:57 PM, Eddie Chapman wrote:
On 17/11/17 10:20, Rui Hua wrote:
Hi, Stefan

2017-11-17 16:28 GMT+08:00 Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx>:
I'm getting the same xfs error message under high load. Does this
patch fix it?

Did you apply the patch "bcache: only permit to recovery read error
when cache device is clean"?
If you did, this patch may fix it. You should also check
/sys/fs/bcache/XXX/internal/cache_read_races in your environment; it
should be nonzero when you get that error message.
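
For anyone who wants to check this counter on every cache set at once,
here is a minimal sketch in Python. It is not part of the patch or of
bcache itself. It assumes each cache set appears as a directory under
/sys/fs/bcache (the "XXX" above) and that cache_read_races holds a
single integer; the function name read_cache_read_races is made up for
this example.

```python
from pathlib import Path


def read_cache_read_races(sysfs_root="/sys/fs/bcache"):
    """Return {cache_set_dir_name: counter} for each readable counter file.

    Assumption: every cache set directory under sysfs_root exposes
    internal/cache_read_races containing one integer.
    """
    results = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # bcache not loaded, or no cache sets registered
        return results
    for counter_file in sorted(root.glob("*/internal/cache_read_races")):
        try:
            value = int(counter_file.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that cannot be read or parsed
            continue
        # parent is "internal", parent.parent is the cache set directory
        results[counter_file.parent.parent.name] = value
    return results


if __name__ == "__main__":
    for name, races in read_cache_read_races().items():
        marker = "  <-- nonzero" if races else ""
        print(f"{name}: {races}{marker}")
```

A nonzero value next to a cache set only says that read races were
counted there; it does not by itself prove that they caused the error
seen by the upper layer.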

Hi all,

I have 3 servers running a very recent 4.9 stable release, with several
recent bcache patches cherry picked, including V4 of "bcache: only
permit to recovery read error when cache device is clean".

In the 3 weeks since I started using these cherry picks I've experienced
a very small number of isolated read errors in the layer above bcache,
on all 3 servers.

On one of the servers, 2 out of the 6 bcache resources have a value of 1
in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on these same
2 bcache resources where one read error has occurred on the upper layer.
The other 4 bcache resources have 0 in cache_read_races and I haven't
had any read errors on the layers above them.

On another server, I have 1 bcache resource out of 10 with a value of 5
in /sys/fs/bcache/XXX/internal/cache_read_races, and it is on that
bcache resource where a read error occurred on one occasion. The other 9
bcache resources have 0 in cache_read_races, and no read errors have
occurred on the layers above any of them.

On the 3rd server where some read errors occurred, I cannot verify if
there were positive values in cache_read_races as I moved the data from
there onto other storage, and shut down the bcache resources where the
errors occurred.

If I can provide any other info which might help with this issue, please
let me know.

Hi Eddie,

This is very informative, thank you so much :-)

Coly Li

Hi Coly,

You are most welcome. Another interesting piece of information, though
it may be unrelated or coincidental: on the bcache resources where the
errors occurred, the underlying backing device was a RAID adapter that
is quite a lot slower than the (different) underlying physical storage
on the other bcache resources, which do not have read races. Until now
I had suspected that a driver issue with this RAID adapter was causing
the read errors, so over the last 3 weeks I started gradually retiring
the adapter on these servers. In light of this issue coming up here,
I'm wondering whether this is significant, possibly suggesting that read
races are more likely to occur when the backing storage is quite slow.
Or maybe not.

Eddie


