RE: [systemd-devel] Errorneous detection of degraded array

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I've modified a number of settings to try to resolve this, so far no success.
I've created an explicit mount file for the RAID array: /etc/systemd/system/share.mount
Inside there I've experimented with TimeoutSec=

In /etc/systemd/system/mdadm-last-resort@.timer I've worked with
OnActiveSec=

I've also tried (without an explicit mount file) to add x-systemd.device-timeout to /etc/fstab for the mount. 

Here's a few more system logs showing perhaps more detail. I've edited them to show only relevant details, full pastebin of logs: http://pastebin.com/sL8nKt7j
These logs were generated with TimeoutSec=120 in /etc/systemd/system/share.mount the description of the mount in the logs is: "Mount /share RAID partition explicitly"
And OnActiveSec=30 in /etc/systemd/system/mdadm-last-resort@.timer
>From blkid:
/dev/md0: UUID="2b9114be-3d5a-41d7-8d4b-e5047d223129" TYPE="ext4"
/dev/md0 is the /share partition.

>From /etc/mdadm.conf:
ARRAY /dev/md/0  metadata=1.2 UUID=97566d2f:ae7a169b:966f5840:3e8267f9 name=lnxnfs01:0

Boot begins at Jan 27 11:33:10
+4 seconds from boot:
Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959.
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdc operational as raid disk 2
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdx operational as raid disk 23
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdu operational as raid disk 20
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdt operational as raid disk 19
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdo operational as raid disk 14
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdn operational as raid disk 13
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdd operational as raid disk 3
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdv operational as raid disk 21
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sda operational as raid disk 0
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdf operational as raid disk 5
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdm operational as raid disk 12
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sde operational as raid disk 4
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdp operational as raid disk 15
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdi operational as raid disk 8
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdl operational as raid disk 11
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdk operational as raid disk 10
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sds operational as raid disk 18
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdb operational as raid disk 1
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdj operational as raid disk 9
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdg operational as raid disk 6
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdr operational as raid disk 17
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdh operational as raid disk 7
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdq operational as raid disk 16
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: device sdw operational as raid disk 22
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: allocated 25534kB
Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: raid level 6 active with 24 out of 24 devices, algorithm 2
Jan 27 11:33:14 lnxnfs01 kernel: RAID conf printout:
Jan 27 11:33:14 lnxnfs01 kernel:  --- level:6 rd:24 wd:24
Jan 27 11:33:14 lnxnfs01 kernel:  disk 0, o:1, dev:sda
Jan 27 11:33:14 lnxnfs01 kernel:  disk 1, o:1, dev:sdb
Jan 27 11:33:14 lnxnfs01 kernel:  disk 2, o:1, dev:sdc
Jan 27 11:33:14 lnxnfs01 kernel:  disk 3, o:1, dev:sdd
Jan 27 11:33:14 lnxnfs01 kernel:  disk 4, o:1, dev:sde
Jan 27 11:33:14 lnxnfs01 kernel:  disk 5, o:1, dev:sdf
Jan 27 11:33:14 lnxnfs01 kernel:  disk 6, o:1, dev:sdg
Jan 27 11:33:14 lnxnfs01 kernel:  disk 7, o:1, dev:sdh
Jan 27 11:33:14 lnxnfs01 kernel:  disk 8, o:1, dev:sdi
Jan 27 11:33:14 lnxnfs01 kernel:  disk 9, o:1, dev:sdj
Jan 27 11:33:14 lnxnfs01 kernel:  disk 10, o:1, dev:sdk
Jan 27 11:33:14 lnxnfs01 kernel:  disk 11, o:1, dev:sdl
Jan 27 11:33:14 lnxnfs01 kernel:  disk 12, o:1, dev:sdm
Jan 27 11:33:14 lnxnfs01 kernel:  disk 13, o:1, dev:sdn
Jan 27 11:33:14 lnxnfs01 kernel:  disk 14, o:1, dev:sdo
Jan 27 11:33:14 lnxnfs01 kernel:  disk 15, o:1, dev:sdp
Jan 27 11:33:14 lnxnfs01 kernel:  disk 16, o:1, dev:sdq
Jan 27 11:33:14 lnxnfs01 kernel:  disk 17, o:1, dev:sdr
Jan 27 11:33:14 lnxnfs01 kernel:  disk 18, o:1, dev:sds
Jan 27 11:33:14 lnxnfs01 kernel:  disk 19, o:1, dev:sdt
Jan 27 11:33:14 lnxnfs01 kernel:  disk 20, o:1, dev:sdu
Jan 27 11:33:14 lnxnfs01 kernel:  disk 21, o:1, dev:sdv
Jan 27 11:33:14 lnxnfs01 kernel:  disk 22, o:1, dev:sdw
Jan 27 11:33:14 lnxnfs01 kernel:  disk 23, o:1, dev:sdx
Jan 27 11:33:14 lnxnfs01 kernel: md0: detected capacity change from 0 to 45062020923392
Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/2b9114be-3d5a-41d7-8d4b-e5047d223129.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started udev Wait for Complete Device Initialization.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started Timer to wait for more drives before activating degraded array..
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Timer to wait for more drives before activating degraded array..
Jan 27 11:33:14 lnxnfs01 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting LVM2 PV scan on device 9:127...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Activation of DM RAID sets...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Mounting Mount /share RAID partition explicitly...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting File System Check on /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Activating swap /dev/mapper/vg_root-swap...
Jan 27 11:33:14 lnxnfs01 lvm[1494]: 2 logical volume(s) in volume group "vg_root" now active
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started LVM2 PV scan on device 9:127.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Activated swap /dev/mapper/vg_root-swap.
Jan 27 11:33:14 lnxnfs01 kernel: Adding 32763900k swap on /dev/mapper/vg_root-swap.  Priority:-1 extents:1 across:32763900k FS
Jan 27 11:33:14 lnxnfs01 systemd-fsck[1499]: /dev/md126: clean, 345/64128 files, 47930/256240 blocks
Jan 27 11:33:14 lnxnfs01 systemd[1]: Reached target Swap.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Swap.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Started File System Check on /dev/disk/by-uuid/283669e9-f32c-498d-b848-c6f91738c959.
Jan 27 11:33:14 lnxnfs01 systemd[1]: Mounting /boot...
Jan 27 11:33:14 lnxnfs01 systemd[1]: Mounted /boot.
Jan 27 11:33:14 lnxnfs01 kernel: EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null)
Jan 27 11:33:21 lnxnfs01 systemd[1]: Started Activation of DM RAID sets.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Reached target Encrypted Volumes.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Starting Encrypted Volumes.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Reached target Local File Systems.
Jan 27 11:33:21 lnxnfs01 systemd[1]: Starting Local File Systems.
...
Jan 27 11:33:21 lnxnfs01 kernel: EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
Jan 27 11:33:21 lnxnfs01 systemd[1]: Mounted Mount /share RAID partition explicitly.

... + 31 seconds from disk initialization, expiration of 30 second timer from mdadm-last-resort@.timer

Jan 27 11:33:45 lnxnfs01 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped target Local File Systems.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopping Local File Systems.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounting Mount /share RAID partition explicitly...
Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting Activate md array even though degraded...
Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped (with error) /dev/md0.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Started Activate md array even though degraded.
Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounted Mount /share RAID partition explicitly.

... + 121 seconds from disk initialization, expiration of 120 second timer from TimeoutSec=120 in /etc/systemd/system/share.mount

Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-md-0.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /dev/md/0.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-md-0.device/stop failed with result 'timeout'.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2dname\x2dlnxnfs01:0.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /dev/disk/by-id/md-name-lnxnfs01:0.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2dname\x2dlnxnfs01:0.device/stop failed with result 'timeout'.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job sys-devices-virtual-block-md0.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /sys/devices/virtual/block/md0.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job sys-devices-virtual-block-md0.device/stop failed with result 'timeout'.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2duuid\x2d97566d2f:ae7a169b:966f5840:3e8267f9.device/stop timed out.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Timed out stoppping /dev/disk/by-id/md-uuid-97566d2f:ae7a169b:966f5840:3e8267f9.
Jan 27 11:35:15 lnxnfs01 systemd[1]: Job dev-disk-by\x2did-md\x2duuid\x2d97566d2f:ae7a169b:966f5840:3e8267f9.device/stop failed with result 'timeout'.


In the logs above, the timer is started to wait for devices, it does not stop immediately because it sees a degraded array. It continues for 30 seconds and then kicks out my mount believing the raid is problematic. 

As an experiment, I've increased the values for the mdadm-last-resort@.timer to OnActiveSec=300 and TimeoutSec=320. This give me enough time to log in to the system. During that time, I can view the RAID and everything appears proper, yet 300 seconds later the mdadm-last-resort@.timer expires with an error on /dev/md0?

Perhaps systemd is working normally, but then the question is why is the raid not being assembled properly - which triggers /usr/lib/systemd/system/mdadm-last-resort@.service?

>From your suggestion I need to next move to full udev debug logs? Can I delay the assembly of the RAID by the kernel if this is a race condition, as it appears that might be the case? Should that delay be early in the systemd startup process or elsewhere?

Thanks again,
Luke Pyzowski
��.n��������+%������w��{.n�����{����w��ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux