Hi,

When we run an IO stress test on multipath devices, we eventually hit a
filesystem metadata error because a multipath map ends up with a wrong path.
There are three test scripts.

First:

#!/bin/bash
disk_list="/dev/mapper/3600140531f063b3e19349bc82028e0cc
           /dev/mapper/36001405ca5165367d67447ea68108e1d
           /dev/mapper/3600140584e11eb1818c4afab12c17800
           /dev/mapper/36001405b7679bd96b094bccbf971bc90"

for disk in ${disk_list}
do
    mkfs.ext4 -F $disk
done

while true
do
    for disk in ${disk_list}
    do
        test_dir=${disk##*/}
        [ -d $test_dir ] && umount $test_dir || mkdir $test_dir
        while true
        do
            mount -o data_err=abort,errors=remount-ro $disk $test_dir && break
            sleep 0.1
        done
        nohup fsstress -d $(pwd)/$test_dir -l 10 -n 1000 -p 10 -X &>/dev/null &
    done
    sleep 5
    while [ -n "`pidof fsstress`" ]
    do
        sleep 1
    done
done

Second:

#!/bin/bash
while true
do
    sleep 15
    i=0
    while [ $i -le 5 ]
    do
        iscsiadm -m node -p 100.1.1.1 -u
        iscsiadm -m node -p 100.1.1.1 -l
        sleep 1
        iscsiadm -m node -p 100.1.2.1 -u
        iscsiadm -m node -p 100.1.2.1 -l
        sleep 1
        ((i=i+1))
    done
done

Third:

#!/bin/bash
function iscsi_query()
{
    interval=5
    while true
    do
        iscsiadm -m node -p 100.1.1.1 &> /dev/null
        iscsiadm -m node -p 100.1.2.1 &> /dev/null
        iscsiadm -m session &> /dev/null
        rescan-scsi-bus.sh &> /dev/null
        sleep $interval
    done
}

function multipath_query()
{
    interval=1
    while true
    do
        multipath -F &> /dev/null
        multipath -r &> /dev/null
        multipath -v2 &> /dev/null
        multipath -ll &> /dev/null
        sleep $interval
    done
}

function multipathd_query()
{
    disk_base=63 # sdc
    interval=1
    while true
    do
        multipathd show paths &> /dev/null
        multipathd show status &> /dev/null
        multipathd show daemon &> /dev/null
        multipathd show maps json &> /dev/null
        multipathd show config &> /dev/null
        multipathd show config local &> /dev/null
        multipathd show blacklist &> /dev/null
        multipathd show devices &> /dev/null
        multipathd reset maps stats &> /dev/null
        multipathd disablequeueing maps &> /dev/null
        multipathd restorequeueing maps &> /dev/null
        multipathd forcequeueing daemon &> /dev/null
        multipathd restorequeueing daemon &> /dev/null

        let disk_num=disk_base+RANDOM%8
        disk=sd`echo "$disk_num" | xxd -p -r`
        multipathd show path $disk &> /dev/null
        multipathd del path $disk &> /dev/null
        multipathd add path $disk &> /dev/null
        multipathd fail path $disk &> /dev/null
        multipathd reinstate path $disk &> /dev/null
        multipathd show path $disk &> /dev/null

        map_count=`multipathd show maps | grep -v name | wc -l`
        if [ $map_count -ge 1 ];then
            let map_num=(RANDOM%map_count)+1
            map=`multipathd show maps | grep -v name | awk '{print $1}' | sed -n "$map_num"p`
            multipathd show map $map &> /dev/null
            multipathd suspend map $map &> /dev/null
            multipathd resume map $map &> /dev/null
            multipathd reload map $map &> /dev/null
            multipathd reset map $map &> /dev/null
        fi
        sleep $interval
    done
}

iscsi_query &
iscsi_query &
multipath_query &
multipath_query &
multipathd_query &
multipathd_query &

After the three scripts have been running together for some time (about 24h),
a metadata error shows up. The reason is that a multipath device contains a
path that belongs to a different LUN.

The details of the first scenario:

ip1:
node        disk    minor
4:0:0:0:    [sdd]   48
4:0:0:1:    [sdm]   192
4:0:0:2:    [sdk]   160
4:0:0:3:    [sdi]   128

ip2:
node        disk    minor
5:0:0:0:    [sdc]   32
5:0:0:1:    [sdj]   144
5:0:0:2:    [sdg]   96
5:0:0:3:    [sde]   64

Sequence of events:

(1) multipath -r runs, and ip1 logs out at the same time.
    The table params loaded for 36001405ca5165367d67447ea68108e1d are
    "0 1 alua 1 1 service-time 0 1 1 8:128 1". (8:64, i.e. sde, is probably
    missing because ip2 had only just logged in and path_discovery had not
    found sde yet.) However, domap fails because ip1 has logged out. The path
    sdi is still in gvecs->pathvec.

(2) multipathd add path sde
    The table params loaded for 36001405ca5165367d67447ea68108e1d are
    "0 1 alua 2 1 service-time 0 1 1 8:64 1 service-time 0 1 1 8:128 "
    and domap succeeds. At this point 36001405ca5165367d67447ea68108e1d has
    two paths (sde and sdi), but sdi is actually a path of
    36001405b7679bd96b094bccbf971bc90.

(3) The metadata of 36001405ca5165367d67447ea68108e1d is synced.
    This overwrites the metadata of 36001405b7679bd96b094bccbf971bc90.

(4) umount 36001405b7679bd96b094bccbf971bc90
    36001405b7679bd96b094bccbf971bc90 has no usable path at umount time, so
    its correct metadata is never written out.

(5) mount 36001405b7679bd96b094bccbf971bc90
    This fails because of the corrupted metadata.

I think there may be other sequences that lead to the same metadata error.
I don't have a good idea of how to deal with this. Could you give some advice?

Thanks very much.

Regards,
Lixiaokeng

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
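
P.S. In case it helps with reproducing this, here is a minimal sketch (not part of the three test scripts) of a check that could run alongside them to catch a map holding a foreign path before the filesystem is damaged. It pulls the major:minor entries out of a multipath table line and compares each path's WWID with the map name. The function names are made up for illustration. It assumes that maps are named by their WWID (as in our setup), that udev's scsi_id helper is installed at /lib/udev/scsi_id, and that the /dev/block/<major>:<minor> symlinks exist.

```bash
#!/bin/bash
# Sketch only: function names are hypothetical, not part of multipath-tools.

# Print the major:minor of every path found in a dm-multipath table string.
# Path devices are the only whitespace-separated fields of the form N:N.
table_paths() {
    tr -s ' ' '\n' <<< "$1" | grep -E '^[0-9]+:[0-9]+$'
}

# Look up the WWID of a block device given as major:minor.
# Assumption: scsi_id lives at /lib/udev/scsi_id and udev has created
# the /dev/block/<major>:<minor> symlink for the device.
path_wwid() {
    /lib/udev/scsi_id -g -u -d "/dev/block/$1"
}

# Report every path in table string $2 whose WWID differs from map name $1.
# Assumption: the map is named by its WWID (no user_friendly_names/aliases).
# Returns 1 if at least one mismatch was found, 0 otherwise.
check_map() {
    local map=$1 params=$2 dev wwid rc=0
    for dev in $(table_paths "$params"); do
        wwid=$(path_wwid "$dev")
        if [ "$wwid" != "$map" ]; then
            echo "MISMATCH: map $map has path $dev with wwid ${wwid:-unknown}"
            rc=1
        fi
    done
    return $rc
}
```

It could be called for each map as check_map "$map" "$(dmsetup table "$map")". The leading start, length and target fields that dmsetup prints do not look like major:minor, so they are ignored. With the table from step (2) above, this would flag 8:128 (sdi) as not belonging to 36001405ca5165367d67447ea68108e1d.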