Yes, failfast is meant to fix the problem you described. Without failfast,
the active disk can't be removed until all pending I/O finishes. If there
is no pending I/O, it can be removed immediately.
Thanks
Xiao
On 01/30/2019 10:14 PM, 李春 wrote:
I have read the description of the failfast feature. Judging by the
symptoms, this may not be a failfast problem.
When there is no I/O pressure, the disk is automatically removed from
the md array after the disk export is stopped on the storage node.
However, under continuous I/O pressure the disk is not removed
automatically; it is removed immediately once the I/O pressure stops.
Xiao Ni <xni@xxxxxxxxxx> wrote on Wed, Jan 30, 2019 at 5:15 PM:
On 01/30/2019 03:25 PM, Jack Wang wrote:
李春 <pickup112@xxxxxxxxx> wrote on Wed, Jan 30, 2019 at 7:08 AM:
# Description of problem:
We attached a disk from a storage node over two network paths via iSCSI,
merged the paths into a single device with multipath, and built a raid1
array from that device and a local disk with mdadm.
However, when the iSCSI storage machine rebooted, the raid1 array did
not automatically eject the failed disk while there was I/O pressure.
# Version-Release number of selected component (if applicable):
vermagic: 2.6.32-573.el6.x86_64 SMP mod_unload modversions
srcversion: 39AAB97325332236F2FFCA9
# How reproducible:
always
# Steps to Reproduce:
1. export a disk from the storage node
2. attach the disk on another node and merge its paths with multipath
3. assemble a local disk and the multipath device into a raid1 array with mdadm
4. reboot the storage node
# Actual results:
* the multipath disk is not ejected from the raid1 array under fio pressure
* the multipath disk is ejected from the raid1 array immediately once the fio pressure stops
# Expected results:
* the multipath disk is ejected from the raid1 array immediately, even under fio pressure
# Additional info:
We have done the following tests:
* On rhel6.7 with kernel 2.6.32-573.el6.x86_64, without I/O pressure,
mdadm's raid1 ejects the failed disk after 5 seconds.
* On rhel6.7 with kernel 2.6.32-573.el6.x86_64, under I/O pressure,
mdadm's raid1 does not eject the failed disk; the disk is removed only
after the I/O pressure stops.
* On rhel7.4 with kernel 3.10.0-693.el7.x86_64, without I/O pressure,
mdadm's raid1 ejects the failed disk after 5 seconds.
* On rhel7.4 with kernel 3.10.0-693.el7.x86_64, under I/O pressure,
mdadm's raid1 ejects the failed disk after 5 seconds.
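For anyone repeating these tests, one way to see when md marks a member
as failed is to look for the (F) flag in /proc/mdstat. The following is
only a minimal Python sketch of that check; the array name md0, the
member name dm-3 and the helper faulty_members() are made up for
illustration and do not come from the setup above.

```python
import re

# One member entry in an mdstat device line, e.g. "dm-3[1](F)" or "sda1[0]".
MEMBER_RE = re.compile(r"(\S+?)\[(\d+)\]((?:\([A-Z]\))*)")


def faulty_members(mdstat_text, array):
    """Return the members of `array` that carry the faulty flag (F)."""
    for line in mdstat_text.splitlines():
        name, sep, rest = line.partition(" : ")
        if sep and name.strip() == array:
            return [dev for dev, _idx, flags in MEMBER_RE.findall(rest)
                    if "(F)" in flags]
    return []  # array not listed in the given text


if __name__ == "__main__":
    # Hypothetical array name; adjust to the array under test.
    with open("/proc/mdstat") as f:
        print(faulty_members(f.read(), "md0"))
```

Running this in a loop while fio is active, and noting when the
multipath member first shows up in the result, gives a rough measure of
how long the ejection takes in each of the cases listed above.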
Thanks for your help.
It sounds like you want the failfast feature from upstream; I'm not sure
whether RH has backported it into their kernel.
Thanks for the report and the analysis.
rhel6 is in a phase where only bug fixes are recommended, so some
features are not backported to it.
I'll try to backport this to rhel6.
Regards
Xiao