Hi James,
Quoting "A. James Lewis" <james@xxxxxxxxxx>:
OK, but in that case bcache is not between your MD RAID and its
disks, so if your disks are dropping out of the MD array, that has
to be either an independent problem or a very complex bug.
My guess is that it's a rather simple timeout / locking problem, which
causes a timer in the MD code to expire. And bcache has a well-known
history of locking problems, according to the mailing list.
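If you want to rule out a plain command timeout first, here is a minimal sketch (assuming the MD members show up as sd* devices and sysfs is mounted at /sys) that lists the per-disk SCSI command timeout:

  #!/usr/bin/env python3
  # Minimal sketch (assumption: Linux host, sysfs at /sys, member disks
  # appear as sd*): print the SCSI command timeout of each disk, since a
  # short timeout under heavy parallel I/O could make MD kick a member.
  import glob
  import os

  for dev in sorted(glob.glob("/sys/block/sd*")):
      path = os.path.join(dev, "device", "timeout")
      if os.path.exists(path):
          with open(path) as f:
              timeout = f.read().strip()
          print(f"{os.path.basename(dev)}: command timeout {timeout}s")

30 seconds is the usual default; raising it on the members is a cheap way to test whether slow I/O alone explains the drop-outs.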
Regards,
Jens
James
On 07/08/15 16:36, Jens-U. Mozdzen wrote:
Hi James,
Quoting "A. James Lewis" <james@xxxxxxxxxx>:
That's interesting. Are you putting your MD on top of multiple
bcache devices rather than bcache on top of an MD device? I
wonder what the rationale behind this is.
Hi James, no such thing here...
bcache is running on top of two MD-RAIDs: a RAID6 with 7 spinning
drives and a RAID1 with two SSDs.
The stack is, from bottom to top:
- MD-RAID6 data, MD-RAID1 cache
- bcache (/dev/bcache0, used as an LVM PV)
- LVM
- many LVs
- DRBD on top of most of the LVs
- Ext4 on each of the DRBD devices
- SCST / NFS / SMB sharing these file systems
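To make the layering easy to verify on the box itself, here is a minimal sketch (device names such as md0 or bcache0 are only examples and will differ) that prints each block device together with the devices it is stacked on, as exposed by sysfs:

  #!/usr/bin/env python3
  # Minimal sketch (assumption: Linux host with sysfs): print each block
  # device together with the "slaves" it is stacked on, to visualise the
  # MD -> bcache -> LVM layering described above.
  import os

  SYS_BLOCK = "/sys/block"

  for dev in sorted(os.listdir(SYS_BLOCK)):
      slaves_dir = os.path.join(SYS_BLOCK, dev, "slaves")
      if os.path.isdir(slaves_dir):
          slaves = sorted(os.listdir(slaves_dir))
          if slaves:
              print(f"{dev} <- {', '.join(slaves)}")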
In the referenced incidents, SCST reports that (many) writes failed
due to timeouts, and MD marks a single disk as faulty. No other
traces in syslog, in particular no stalled processes, locking problems
or kernel bugs.
The I/O pattern is highly parallel reads and writes, mostly via SCST.
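For completeness, a minimal sketch of how one might spot the faulty member and the degraded array state directly from /proc/mdstat (assuming the arrays are assembled):

  #!/usr/bin/env python3
  # Minimal sketch (assumption: /proc/mdstat is readable): flag failed
  # members, marked "(F)", and degraded arrays, shown by a "_" in the
  # [UU...] status string.
  import re

  with open("/proc/mdstat") as f:
      mdstat = f.read()

  for line in mdstat.splitlines():
      if "(F)" in line:
          print("failed member(s):", line.strip())
      status = re.search(r"\[[U_]+\]", line)
      if status and "_" in status.group(0):
          print("degraded array:", line.strip())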
Regards,
Jens