On 5/12/20 3:27 AM, Song Liu wrote:
> On Mon, May 11, 2020 at 4:13 AM Michal Soltys <msoltyspl@xxxxxxxxx> wrote:
>>
>> On 5/10/20 1:57 AM, Michal Soltys wrote:
>>> Anyway, I did some tests with manually snapshotted component devices
>>> (using the dm snapshot target so as not to touch the underlying devices).
>>>
>>> The raid manages to force-assemble in read-only mode with the journal
>>> device missing, so we will probably be able to recover most of the data
>>> underneath this way (as a last resort).
>>>
>>> The situation I'm in now is most likely from an unclean shutdown after
>>> all (why the machine failed to react to the UPS properly is another
>>> subject).
>>>
>>> I'd still like to find out why the journal device is - apparently -
>>> causing issues (contrary to what I'd expect it to do ...), with notable
>>> mention of:
>>>
>>> 1) mdadm hangs (unkillable, so I presume somewhere in the kernel) and
>>> eats one CPU when trying to assemble the raid with the journal device
>>> present; once that happens I can't do anything with the array (stop,
>>> run, etc.) and can only reboot the server to "fix" it
>>>
>>> 2) mdadm -D shows a nonsensical device size after the assembly attempt
>>> (Used Dev Size : 18446744073709551615)
>>>
>>> 3) the journal device (which is itself an md raid1 consisting of 2
>>> SSDs) assembles and checks fine (0 mismatch_cnt) - and overall looks ok.
>>>
>>> Among other interesting things, I also attempted to assemble the raid
>>> with a snapshotted journal. From what I can see it does attempt to do
>>> something, judging from:
>>>
>>> dmsetup status:
>>>
>>> snap_jo2: 0 536870912 snapshot 40/33554432 16
>>> snap_sdi1: 0 7812500000 snapshot 25768/83886080 112
>>> snap_jo1: 0 536870912 snapshot 40/33554432 16
>>> snap_sdg1: 0 7812500000 snapshot 25456/83886080 112
>>> snap_sdj1: 0 7812500000 snapshot 25928/83886080 112
>>> snap_sdh1: 0 7812500000 snapshot 25352/83886080 112
>>>
>>> But it doesn't move from those values (with mdadm doing nothing but
>>> eating 100% CPU, as mentioned earlier).
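[As an aside: for a dm snapshot target, `dmsetup status` reports `<allocated>/<total> <metadata>` COW sectors (per the kernel's device-mapper snapshot documentation), so the usage behind those numbers can be computed directly. A minimal sketch using one of the lines quoted above:]

```shell
# dmsetup status for a snapshot target reports:
#   <start> <length> snapshot <allocated_sectors>/<total_cow_sectors> <metadata_sectors>
# so 25768/83886080 above means the COW store is barely used.
echo 'snap_sdi1: 0 7812500000 snapshot 25768/83886080 112' |
awk '{ split($5, a, "/");
       printf "%s %d of %d COW sectors used (%.4f%%)\n", $1, a[1], a[2], 100 * a[1] / a[2] }'
# -> snap_sdi1: 25768 of 83886080 COW sectors used (0.0307%)
```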
>>>
>>>
>>> Any suggestions on how to proceed would be very appreciated.
>>
>>
>> I've added Song to the CC, in case there are any suggestions on how to
>> proceed with/debug this (mdadm is stuck somewhere in the kernel, as far
>> as I can see, while attempting to assemble it).
>>
>> For the record, I can assemble the raid successfully w/o the journal
>> (using snapshotted component devices as above), and we did recover some
>> stuff this way from some filesystems - but for some others I'd like to
>> keep that option as the very last resort.
>
> Sorry for the delayed response.
>
> A few questions.
>
> For these two outputs:
>
> #1
>            Name : xs22:r5_big  (local to host xs22)
>            UUID : d5995d76:67d7fabd:05392f87:25a91a97
>          Events : 56283
>
>     Number   Major   Minor   RaidDevice   State
>        -       0       0        0         removed
>        -       0       0        1         removed
>        -       0       0        2         removed
>        -       0       0        3         removed
>
>        -       8     145        3         sync   /dev/sdj1
>        -       8     129        2         sync   /dev/sdi1
>        -       9     127        -         spare  /dev/md/xs22:r1_journal_big
>        -       8     113        1         sync   /dev/sdh1
>        -       8      97        0         sync   /dev/sdg1
>
> #2
> /dev/md/r1_journal_big:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x200
>      Array UUID : d5995d76:67d7fabd:05392f87:25a91a97
>            Name : xs22:r5_big  (local to host xs22)
>   Creation Time : Tue Mar 5 19:28:58 2019
>      Raid Level : raid5
>    Raid Devices : 4
>
>  Avail Dev Size : 536344576 (255.75 GiB 274.61 GB)
>      Array Size : 11718355968 (11175.50 GiB 11999.60 GB)
>   Used Dev Size : 7812237312 (3725.17 GiB 3999.87 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 0 sectors
>    Unused Space : before=261872 sectors, after=0 sectors
>           State : clean
>     Device UUID : c3a6f2f6:7dd26b0c:08a31ad7:cc8ed2a9
>
>     Update Time : Sat May 9 15:05:22 2020
>   Bad Block Log : 512 entries available at offset 264 sectors
>        Checksum : c854904f - correct
>          Events : 56289
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>     Device Role : Journal
>     Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
>
>
> Are these captured back to back? I am asking because they show different
> "Events" numbers.
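[Since mismatched event counts are exactly what Song is checking for, the `Events` line can be extracted from each member's superblock for comparison. A sketch, demonstrated here on the superblock excerpt quoted above (running it against the live array is shown only as a commented assumption, since it needs root and the actual devices):]

```shell
# Extract the Events counter from mdadm -E style output; demonstrated on a
# line taken from the journal device's superblock dump quoted in this thread.
mdadm_E_excerpt='    Update Time : Sat May 9 15:05:22 2020
  Bad Block Log : 512 entries available at offset 264 sectors
       Checksum : c854904f - correct
         Events : 56289'
printf '%s\n' "$mdadm_E_excerpt" | awk -F' : ' '/^ *Events/ {print $2}'
# -> 56289
#
# Against the live array (assumption: run as root, device names as in the thread):
#   for d in /dev/md/r1_journal_big /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1; do
#       printf '%s: ' "$d"; mdadm -E "$d" | awk -F' : ' '/^ *Events/ {print $2}'
#   done
```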
Nah, they were captured between reboots. Captured back to back, all the
event fields show an identical value (56291 at the moment).

> Also, when mdadm -A hangs, could you please capture
> /proc/$(pidof mdadm)/stack ?

The output is empty:

xs22:/☠ ps -eF fww | grep mdadm
root     10332  9362 97   740  1884  25 12:47 pts/1   R+   6:59  |  \_ mdadm -A /dev/md/r5_big /dev/md/r1_journal_big /dev/sdi1 /dev/sdg1 /dev/sdj1 /dev/sdh1
xs22:/☠ cd /proc/10332
xs22:/proc/10332☠ cat stack
xs22:/proc/10332☠

> 18446744073709551615 is 0xffffffffffffffffL, so it is not initialized
> by data from the disk.
> I suspect we hang somewhere before this value is initialized.
>
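[Two details from the exchange above can be illustrated from userspace. First, `/proc/<pid>/stack` is only meaningful for a task blocked in the kernel; the `ps` output shows mdadm in state `R+` at ~97% CPU, i.e. spinning, which is consistent with the empty stack file. Second, the bogus size is the all-ones 64-bit pattern Song describes. A minimal sketch (the sysrq step is an assumption about how one might get backtraces here, needs root, and is left as a comment):]

```shell
# 1) Field 3 of /proc/<pid>/stat is the scheduler state. D (uninterruptible
#    sleep) -> /proc/<pid>/stack shows where the task is blocked;
#    R (running) -> the task is spinning and the stack file is typically
#    empty. For an R-state spinner, "echo l > /proc/sysrq-trigger" (root)
#    dumps backtraces of tasks currently on-CPU into dmesg instead.
awk '{print $3}' /proc/self/stat      # the reader is on-CPU, so it prints: R

# 2) "Used Dev Size : 18446744073709551615" is 2^64 - 1, i.e. a field that
#    was never filled in from disk rather than a real size:
printf '0x%x\n' 18446744073709551615  # -> 0xffffffffffffffff
```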