Ok, thanks.

Xiao Ni <xni@xxxxxxxxxx> wrote on Thu, Jan 31, 2019 at 14:25:
>
> Yes, failfast is used to fix the problem you described. Without
> failfast, the active disk can't be removed until all pending I/O
> finishes. If there is no pending I/O, it can be removed immediately.
>
> Thanks
> Xiao
>
> On 01/30/2019 10:14 PM, 李春 wrote:
> > I have read the description of the failfast feature. Judging from the
> > behavior, this may not be a failfast problem: when there is no I/O
> > pressure, the disk is automatically removed from the md array after
> > the disk export is stopped on the storage node. However, under
> > continuous I/O pressure the disk is not removed automatically; it is
> > removed immediately once the I/O pressure stops.
> >
> > Xiao Ni <xni@xxxxxxxxxx> wrote on Wed, Jan 30, 2019 at 17:15:
> >>
> >> On 01/30/2019 03:25 PM, Jack Wang wrote:
> >>> 李春 <pickup112@xxxxxxxxx> wrote on Wed, Jan 30, 2019 at 07:08:
> >>>> # Description of problem:
> >>>> We exported a disk from a storage node over two network paths via
> >>>> iSCSI, merged the two paths into one device with multipath, and
> >>>> built a RAID1 from that device and a local disk with mdadm.
> >>>> However, when the storage node serving the iSCSI disk reboots,
> >>>> the RAID1 array does not automatically eject the failed disk
> >>>> while there is I/O pressure.
> >>>>
> >>>> # Version-Release number of selected component (if applicable):
> >>>> vermagic: 2.6.32-573.el6.x86_64 SMP mod_unload modversions
> >>>> srcversion: 39AAB97325332236F2FFCA9
> >>>>
> >>>> # How reproducible:
> >>>> always
> >>>>
> >>>> # Steps to Reproduce:
> >>>> 1. export a disk from the storage node
> >>>> 2. log in to the disk on another node and merge the paths with multipath
> >>>> 3. assemble a RAID1 from a local disk and the multipath device with mdadm
> >>>> 4. reboot the storage node
> >>>>
> >>>> # Actual results:
> >>>> * the multipath disk is not ejected from the RAID1 array under fio pressure
> >>>> * the multipath disk is ejected immediately once the fio pressure stops
> >>>>
> >>>> # Expected results:
> >>>> * the multipath disk is ejected from the RAID1 array immediately, even under fio pressure
> >>>>
> >>>> # Additional info:
> >>>> We have run the following tests:
> >>>> * On RHEL 6.7 with kernel 2.6.32-573.el6.x86_64, mdadm's RAID1
> >>>> ejects the failed disk after 5 seconds when there is no I/O pressure.
> >>>> * On RHEL 6.7 with kernel 2.6.32-573.el6.x86_64, under I/O pressure,
> >>>> mdadm's RAID1 does not eject the failed disk; the disk is removed
> >>>> only after the I/O pressure stops.
> >>>> * On RHEL 7.4 with kernel 3.10.0-693.el7.x86_64, mdadm's RAID1
> >>>> ejects the failed disk after 5 seconds when there is no I/O pressure.
> >>>> * On RHEL 7.4 with kernel 3.10.0-693.el7.x86_64, mdadm's RAID1
> >>>> ejects the failed disk after 5 seconds even under I/O pressure.
> >>>>
> >>>> Thanks for your help.
> >>> Sounds like you want the failfast feature from upstream; I'm not
> >>> sure whether RH has backported it into their kernel.
> >> Thanks for the report and analysis.
> >> RHEL 6 is in a phase where only bug fixes are recommended, so some
> >> features are not backported. I'll try to backport this to RHEL 6.
> >>
> >> Regards
> >> Xiao

--
李春 Pickup Li
Chief Architect, Product R&D
www.woqutech.com
Hangzhou WOQU Technology Co., Ltd.
Room 1004, Building A, D-innovation Center, No. 1190, Bin'an Road, Hangzhou 310052
T: (0571) 87770835  M: (86) 18989451982  F: (0571) 86805750
E: pickup.li@xxxxxxxxxxxx
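
For reference, the reproduction steps in the quoted report can be sketched
roughly as follows on the initiator node. This is a minimal sketch, not the
reporter's exact setup: the target address, IQN, second path address, and the
device names (/dev/sdb, /dev/mapper/mpatha, /dev/md0) are placeholders, and
the exact invocations will vary with the distribution.

    # 1. discover and log in to the iSCSI target exported by the storage node
    iscsiadm -m discovery -t sendtargets -p 192.0.2.10
    iscsiadm -m node -T iqn.2019-01.example:storage.disk1 -p 192.0.2.10 --login
    # repeat the login over the second network path, e.g. -p 192.0.2.11

    # 2. let multipathd merge the two paths into one device
    multipath -ll    # should show one map, e.g. /dev/mapper/mpatha, with two paths

    # 3. build a RAID1 from a local disk and the multipath device
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/mapper/mpatha

    # 4. generate I/O pressure, then reboot the storage node and watch the array
    fio --name=press --filename=/dev/md0 --rw=randwrite --bs=4k --iodepth=16 \
        --ioengine=libaio --direct=1 --runtime=600 --time_based
    watch cat /proc/mdstat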
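
For anyone hitting the same behavior on a newer kernel: the upstream failfast
feature Jack and Xiao mention (merged upstream around Linux 4.10 for RAID1 and
RAID10) is set per member device, so I/O to a failing leg is not retried
indefinitely and md can kick the device even while I/O is in flight. A minimal
sketch, assuming a sufficiently recent mdadm and the placeholder device names
from the sketch above:

    # set failfast at creation time; the flag applies to the devices listed after it
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sdb --failfast /dev/mapper/mpatha

    # or set it when (re-)adding a device to an existing array
    mdadm /dev/md0 --add --failfast /dev/mapper/mpatha

    # the flag can also be toggled at runtime through sysfs
    # (the dev-dm-0 name here is illustrative; use the entry for your member device)
    echo failfast > /sys/block/md0/md/dev-dm-0/state

Note that failfast is only appropriate on devices like this multipath leg,
where a redundant copy exists and a transient path failure should fail the
member quickly rather than stall array I/O.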