On 28/10/16 12:59, Kent Overstreet wrote:
> On Wed, Oct 26, 2016 at 04:20:38PM +0100, James Pharaoh wrote:
>> Since I want my bcache device to essentially be a "journal", and to close
>> the RAID5 write hole, I would prefer to disable this behaviour.
>>
>> I propose, therefore, a further write mode, in which data is always written
>> to the cache first, and synced, before it is written to the underlying
>> device. This could be called "journal" perhaps, or something similar.
>>
>> I am optimistic that this would be a relatively small change to the code,
>> since it only requires to always choose the cache to write data to first.
>> Perhaps the sync behaviour is also more complex, I am not familiar with the
>> internals.
>>
>> So, does anyone have any idea if this is practical, if it would genuinely
>> close the write hole, or any other thoughts?
> It's not a crazy idea - bcache already has some stripe awareness code that could
> be used as a starting point.
>
> The main thing you'd need to do is ensure that
> - all writes are writeback, not writethrough (as you noted)
> - when the writeback thread is flushing dirty data, only flush entire stripes -
>   reading more data into the cache if necessary and marking it dirty, then
>   ensure that the entire stripe is marked dirty until the entire stripe is
>   flushed.
>
> This would basically be using bcache to do full data journalling.
>
> I'm not going to do the work myself - I'd rather spend my time working on adding
> erasure coding to bcachefs - but I could help out if you or someone else wanted
> to work on adding this to bcache.
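
The two requirements listed above can be sketched as a toy model. This is
only an illustration in Python, not bcache code: the StripeCache class, its
fields and the block/stripe layout are all hypothetical, and parity
calculation and crash recovery are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class StripeCache:
    """Toy model of a writeback cache in front of a striped backing device."""
    backing: list                                # backing device, one entry per block
    stripe_blocks: int                           # data blocks per stripe (hypothetical)
    cached: dict = field(default_factory=dict)   # block number -> cached data
    dirty: set = field(default_factory=set)      # block numbers not yet flushed

    def write(self, block, data):
        # Writeback only: the write lands in the cache and is marked dirty.
        # The backing device is never written directly from here.
        self.cached[block] = data
        self.dirty.add(block)

    def flush_stripe(self, stripe):
        start = stripe * self.stripe_blocks
        blocks = range(start, start + self.stripe_blocks)
        # Read any missing blocks into the cache and mark the whole stripe
        # dirty, so an interrupted flush would leave it queued for re-flushing.
        for b in blocks:
            if b not in self.cached:
                self.cached[b] = self.backing[b]
            self.dirty.add(b)
        # Write out the complete stripe in one pass.
        for b in blocks:
            self.backing[b] = self.cached[b]
        # Clear the dirty state only after the entire stripe has been written.
        for b in blocks:
            self.dirty.discard(b)

    def writeback(self):
        # Flush every stripe that contains at least one dirty block.
        for stripe in sorted({b // self.stripe_blocks for b in self.dirty}):
            self.flush_stripe(stripe)
```

In this model a single write to block 5 with four blocks per stripe leaves the
backing device untouched until writeback() runs, at which point blocks 4 to 7
are flushed together.
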
I don't expect anyone to do the work, or to do this myself, although if
I have the funds, and I may do soon, I would be prepared to pay someone
to do it.

At the moment, I'm trying to check my facts/assumptions while designing
a complex system which won't be fully operational for a while. I'd like
to be sure that it is genuinely scalable, as in the design is valid,
before I continue working in this way.

For what it's worth, I have recently set up a lot of this, taking
advantage of extremely cheap servers set up in a "novel" way, and the
performance is pretty good. As I've mentioned, I would like to write up
what I've done, why, and perhaps create an open source management suite
for people to repeat it.

James