On 5/8/19 6:29 AM, Wols Lists wrote:
> On 06/05/19 22:07, Song Liu wrote:
>> Could you please run a quick test with raid5? I am wondering whether
>> some race condition could get us into a similar crash. If we cannot easily
>> trigger the bug, we can proceed with this version.
> Bear in mind I just read the list and write documentation, but ...
>
> My gut feeling is that if it can theoretically happen for all raid
> modes, it should be fixed for all raid modes. What happens if code
> changes elsewhere and suddenly it really does happen for, say, raid-5?
>
> On the other hand, if fixing it in md.c only gets tested for raid-0, how
> do we know it will actually work for the other raids if they do suddenly
> start falling through?
Hi, I understand your concern. But all the other raid levels contain
failure-event mechanisms. For example, in all my tests with raid5 or
raid1, the array first complained that the device was removed, and then
it failed in super_written() when no other available device was present.
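
Just to illustrate what I mean by a failure-event mechanism, here is a
rough sketch (not the actual raid1/raid5 code, deliberately simplified)
of the kind of path the redundant levels take when a member fails: the
personality marks the device Faulty, asks md to rewrite the superblocks,
and that superblock write is what eventually trips super_written() when
no working member is left.

/*
 * Rough illustration only -- not the real raid1/raid5 error handlers.
 * The redundant personalities do something along these lines when a
 * member device fails.
 */
static void failure_event_sketch(struct mddev *mddev, struct md_rdev *rdev)
{
	/* Mark the member as failed and record a device-change event. */
	set_bit(Faulty, &rdev->flags);
	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);

	/*
	 * md's thread then rewrites the superblocks; with no working
	 * member left, that write is the one that fails in super_written().
	 */
	md_wakeup_thread(mddev->thread);
}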
raid0, on the other hand, does "blind writes": it just selects the device
the bio should be written to (given the stripe math), changes the bio's
target device, and sends it back via generic_make_request(). It is "dumb",
but deliberately so, for performance reasons: it has none of the failure
"intelligence" that all the other raid levels have.
That said, we could fix md.c for all raid levels, but I personally think
that would be a bazooka shot, since only raid0 shows this issue
consistently.
> Academic purity versus engineering practicality :-)
Heheh you have good points here! Thanks for the input =)
Cheers,
Guilherme
> Cheers,
> Wol