Re: bootsect replicated in p1, RAID enclosure suggestions?

On Wed, Aug 24, 2016 at 11:15:58AM -0600, Chris Murphy wrote:
> OK well you don't tell us what the mdadm create command was, there's
> no information on the metadata version, no mdadm -E or -D output, etc.
> There's really nothing to go on here. So we can't tell what the
> problem is either, or what your question is.

Thanks for the response, I learned some interesting things!

Here is one of the non-nuked drives:

$ sudo mdadm -E /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : <elided>
           Name : <elided>
  Creation Time : Wed Aug 10 11:33:41 2016
     Raid Level : raid0
   Raid Devices : 4

 Avail Dev Size : 7814035071 (3726.02 GiB 4000.79 GB)
    Data Offset : 16 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : <elided>

    Update Time : Wed Aug 10 11:33:41 2016
       Checksum : 490b562f - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
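
For reference, the Magic and Super Offset fields above can be checked
directly against the raw bytes of a member device. This is only a
minimal Python sketch, assuming a v1.2 superblock (8 sectors of 512
bytes, i.e. 4096 bytes, into the member) with the magic stored as a
little-endian 32-bit value at the start of the superblock. The function
names are hypothetical, and the path can be a device node or an image
file:

```python
import struct

MD_MAGIC = 0xA92B4EFC      # "Magic" value printed by mdadm -E above
SECTOR = 512               # logical sector size reported for these drives
SUPER_OFFSET_SECTORS = 8   # "Super Offset" printed by mdadm -E above


def has_md_superblock(path):
    """Return True if the md magic is found 8 sectors into the device or image."""
    with open(path, "rb") as dev:
        dev.seek(SUPER_OFFSET_SECTORS * SECTOR)
        raw = dev.read(4)
    if len(raw) < 4:
        return False
    (magic,) = struct.unpack("<I", raw)
    return magic == MD_MAGIC


def sectors_to_sizes(sectors):
    """Convert a count of 512-byte sectors to (GiB, GB)."""
    nbytes = sectors * SECTOR
    return nbytes / 2**30, nbytes / 10**9
```

Calling sectors_to_sizes(7814035071) gives roughly 3726.02 GiB and
4000.79 GB, which agrees with the Avail Dev Size line above.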

Here is what should be the equivalent output for device 2 in the array
(device 3 is similar or identical):

$ sudo mdadm -E /dev/sdf1
/dev/sdf1:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
$ sudo mdadm -D /dev/sdf1
mdadm: /dev/sdf1 does not appear to be an md device

Sadly, I can't do a mdadm -D because I can't assemble the RAID.
$ sudo mdadm -E /dev/md127
$

The command history is gone, but I would imagine that the RAID was
created with something like this:

mdadm --create /dev/md/bu --level=0 --raid-devices=4 /dev/sd{b,c,d,e}1

Although it could have been level=linear.

To summarize my email:
"Is this a known problem? If not, here is a bug report"

> > Any recommendations on a low power hardware with a well-supported
> > distro, that matches up well with a real backplane and SATA
> > connections instead of USB.  The only caveat is that I want to encrypt
> > raw disks and it has to not be very noisy - so no rackmount gear
> > with 65dB 1" dog whistle fans.  Obviously, whatever backplane must
> > be well-supported by the distro.
> 
> OK so you just want to give up on the existing setup and you want
> advice on a whole new setup? From my perspective you're basically on
> three separate threads at this point.

Depends on the circumstances.  I'm prepared to if there are no obvious
fixes.  My intuition tells me the issue may be in the 4-bay switched
SATA enclosure, or the USB connection, or the driver thereof, and not
mdraid itself.  I'm happy to be wrong on that.

BTW, in case this rings any bells as being buggy, here is the enclosure:
https://www.amazon.com/Mediasonic-ProBox-HF2-SU3S2-SATA-Enclosure/dp/B003X26VV4/

> It's a WDC Red with a physical sector size of 4096B, so it looks like
> the USB enclosure is doing the typical thing of masking the true
> physical sector size from the kernel. This is better than the opposite
> where the enclosure reports the drive as 4096B/4096B logical/physical,
> where the drive itself has 512B logical sectors, as this will cause
> problems if the drive is ever removed from that enclosure, or put into
> one that doesn't report 4096B logical sectors.

Oooh, that's meaty information, thank you.  I hadn't kept up with
things since the great 2TB changeover.  That could explain some crap I
see with larger drives and USB enclosures.  The problems you describe,
I saw back in the great 2GB switchover. Seagate had some boot sector
magic that would make things work by changing the cylinder sizes,
until it didn't....
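
To see what sector sizes the kernel has been told about (as opposed to
what the drive really uses), the values can be read from sysfs. This is
a rough Python sketch, assuming the usual Linux layout of
/sys/block/<disk>/queue/logical_block_size and physical_block_size. The
function name and the sysfs_root parameter are hypothetical and only
there to make it easy to try against a test directory:

```python
from pathlib import Path


def sector_sizes(disk, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes as the kernel sees them."""
    queue = Path(sysfs_root) / disk / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical
```

For a drive behind the enclosure, sector_sizes("sdd") would presumably
return (512, 512), as in the parted output quoted below, whereas the
same drive on a plain SATA port would be expected to report a
4096-byte physical size.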

> > # parted /dev/sdd1
> > GNU Parted 2.3
> > Using /dev/sdd1
> > Welcome to GNU Parted! Type 'help' to view a list of commands.
> > (parted) p
> > Model: Unknown (unknown)
> > Disk /dev/sdd1: 4001GB
> > Sector size (logical/physical): 512B/512B
> > Partition Table: gpt
> >
> > Number  Start   End     Size    File system  Name        Flags
> >  1      1049kB  4001GB  4001GB               Linux RAID  raid
> 
> It's purely speculation, but it sounds like to me in the history of
> one or more drives, the previous signatures weren't removed before the
> drive was retasked for its new purpose. That's the folly of not wiping
> the signatures in the reverse order they were created, and just
> expecting that starting over will wipe those old signatures.

It's possible, but why would you ever end up with a GPT in a partition?

I've certainly encountered this "GPT outside cylinder 0" on these two
drives before, but it goes away with a forcible reassemble or recreate
(which I did last time), because the mdlabel blows it away. Unless
it's something this list knows about, I suspect it is a firmware
glitch in the USB enclosure.

> But I think there is a legitimate gripe that parted probably should
> not operate on partitions like this. It's not valid to have nested
> GPTs like this. And I have no idea if parted is showing you valid or
> bogus information. You'd need to do something like:
> 
> dd if=/dev/sdd1 count=2 2>/dev/null | hexdump -C

## Good disk (for comparison):
$ sudo dd if=/dev/sdd1 count=2 2> /dev/null | file -
/dev/stdin: data
$ sudo dd if=/dev/sdd1 count=2 2> /dev/null | hexdump -C | head -20 
00000000  ff 02 19 2e 03 ee fa d8  6d d7 24 78 e1 d4 04 3d  |........m.$x...=|
00000010  c9 92 33 97 17 7a 10 d3  05 bd 39 36 b4 a9 7c 14  |..3..z....96..|.|
00000020  a7 de 66 b6 cd d9 ff ef  45 27 74 6e 94 0a 03 49  |..f.....E'tn...I|
00000030  d4 43 26 2d 45 39 d1 93  8a 35 91 91 ff c9 a4 8e  |.C&-E9...5......|
00000040  bd 9a 06 6d cc f2 89 65  c0 91 87 1c 1b f0 da 2f  |...m...e......./|
00000050  83 c2 12 eb 80 3c c2 4c  68 cc 65 40 26 13 e0 77  |.....<.Lh.e@&..w|
00000060  38 15 ed 78 27 76 4c 91  71 99 3e 9f 99 f1 3f 51  |8..x'vL.q.>...?Q|
00000070  19 db 12 a3 ac b6 61 12  ff d9 37 87 31 1f 8b dd  |......a...7.1...|
00000080  88 82 de fb db f2 a5 31  10 2a d2 03 be 12 be bd  |.......1.*......|
00000090  19 46 9f c1 3b ea a1 37  81 d2 4d 00 54 e7 b4 55  |.F..;..7..M.T..U|
000000a0  b7 65 6c 3f 95 40 b0 f4  28 ff 90 62 22 cb 22 fd  |.el?.@..(..b".".|
000000b0  6b 4d 90 56 32 4b c6 22  35 b1 62 76 e1 fd 82 d5  |kM.V2K."5.bv....|
000000c0  03 40 c0 85 4b ac 5a 44  9e 6a 25 97 d3 7f bd fe  |.@..K.ZD.j%.....|
000000d0  0c 2d a8 bb 33 f4 00 df  7a 05 ae 6d b3 3e f3 7d  |.-..3...z..m.>.}|
000000e0  34 9e 0e 57 14 de d8 e0  28 63 82 a6 2a 8a 1f fc  |4..W....(c..*...|
000000f0  fe 2f b0 69 67 ac 0a e9  c2 53 a7 d8 36 1a 18 5a  |./.ig....S..6..Z|
00000100  d6 d4 e6 ce df f7 fc 67  13 eb 25 08 45 50 10 7b  |.......g..%.EP.{|
00000110  c6 23 1e 59 dc 2d c2 65  53 90 ca ec 21 e7 28 74  |.#.Y.-.eS...!.(t|
00000120  41 7f 3e 58 72 08 75 c1  d5 ca d0 91 55 5f 43 6a  |A.>Xr.u.....U_Cj|
00000130  4e 84 d5 7f aa f2 b5 27  e4 86 5d 28 ae 6c 29 a1  |N......'..](.l).|

## Bad disk:
$ sudo dd if=/dev/sdf1 count=2 2> /dev/null | file -
/dev/stdin: x86 boot sector; partition 1: ID=0xee, starthead 0, startsector 1, 4294967295 sectors, code offset 0x6f
$ sudo dd if=/dev/sdf1 count=2 2> /dev/null | hexdump -C 
00000000  38 6f 96 52 ea 9c 31 cd  10 a2 84 58 a2 f0 f5 43  |8o.R..1....X...C|
00000010  0f f2 5a 9b c7 ff 82 b2  d8 59 86 60 15 bc 31 65  |..Z......Y.`..1e|
00000020  bc d7 77 f9 31 6a c8 16  3f 13 90 24 b7 57 ff 6b  |..w.1j..?..$.W.k|
00000030  64 7e e2 99 2a 99 f7 32  69 be aa 56 36 31 f7 db  |d~..*..2i..V61..|
00000040  8c 4c 4c 12 68 19 77 0f  f6 3b 92 bf 18 92 c2 45  |.LL.h.w..;.....E|
00000050  73 d5 b7 93 cc ae 6b b9  b0 bd 0c 85 a9 c3 19 f7  |s.....k.........|
00000060  87 34 b8 be 0a 95 cd 03  03 d5 01 49 b5 b0 86 fe  |.4.........I....|
00000070  71 1c d2 f6 42 ed ce b0  eb c3 5f 4c 07 34 30 c7  |q...B....._L.40.|
00000080  8a 1f 91 c4 8b 28 b9 07  8e da ae 7d 7d c5 24 2b  |.....(.....}}.$+|
00000090  6d f9 ea a3 6a 83 9d b8  6a 1f 6d db 3a 01 22 c7  |m...j...j.m.:.".|
000000a0  56 fc 2a 46 f8 b2 84 31  d1 8b 58 55 b6 5a 36 7b  |V.*F...1..XU.Z6{|
000000b0  48 5d 98 2a 3f f0 ae 80  2b f8 6b b2 7f 1e 27 c2  |H].*?...+.k...'.|
000000c0  59 65 d0 bf c7 f0 5b 18  dc 59 8e 68 46 03 b6 ca  |Ye....[..Y.hF...|
000000d0  42 06 7a 52 7a 49 36 03  0d d5 9b 67 a2 03 3b 13  |B.zRzI6....g..;.|
000000e0  40 23 19 f5 1a a6 bd fb  c8 d5 5b 26 f5 6a 86 ab  |@#........[&.j..|
000000f0  89 77 98 d8 09 cb b7 59  80 03 81 48 ba c6 ce 77  |.w.....Y...H...w|
00000100  3c 6c d2 ba a0 71 c3 20  18 fd 77 db ca a8 8a e3  |<l...q. ..w.....|
00000110  8d 6c 1f 17 d5 9f e5 81  bf 50 62 c3 bc f8 6c 5d  |.l.......Pb...l]|
00000120  f7 3f a6 37 6b a9 53 2b  88 15 5d 6e 1e 48 4f b4  |.?.7k.S+..]n.HO.|
00000130  db af b4 f7 f5 7b 4d f3  3f 60 44 60 6e a2 c4 6d  |.....{M.?`D`n..m|
00000140  b9 6c 88 04 e8 66 d1 7c  a0 09 10 66 32 de 70 e1  |.l...f.|...f2.p.|
00000150  98 40 54 5e 1d f2 af b8  2e d1 75 0d 3c 46 1f f8  |.@T^......u.<F..|
00000160  85 72 49 87 ad 92 59 28  fd 9d 22 8e 1b 9f 2c 00  |.rI...Y(.."...,.|
00000170  87 58 74 01 63 a5 94 13  e3 9c ea ec 3f 21 22 41  |.Xt.c.......?!"A|
00000180  05 13 78 f3 a8 46 b3 02  9e 23 cb 9d 21 db a6 ae  |..x..F...#..!...|
00000190  08 a8 70 48 18 6c e2 38  e4 ac 03 6e 06 74 17 7c  |..pH.l.8...n.t.||
000001a0  90 ca 9f 5e 2e 2b 84 ef  52 2c 08 9a 48 98 f9 46  |...^.+..R,..H..F|
000001b0  f4 9f 00 cd ec a0 11 d7  00 00 00 00 00 00 00 00  |................|
000001c0  02 00 ee ff ff ff 01 00  00 00 ff ff ff ff 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
00000210  3a dc 43 c4 00 00 00 00  01 00 00 00 00 00 00 00  |:.C.............|
00000220  8e b6 c0 d1 01 00 00 00  22 00 00 00 00 00 00 00  |........".......|
00000230  6d b6 c0 d1 01 00 00 00  a5 4f bd 75 f6 c8 4f 43  |m........O.u..OC|
00000240  92 31 ab b6 a9 59 aa 04  02 00 00 00 00 00 00 00  |.1...Y..........|
00000250  80 00 00 00 80 00 00 00  59 04 3d 4a 00 00 00 00  |........Y.=J....|
00000260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
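
The 16 bytes starting at offset 0x1be in the dump above are the first
MBR partition entry, and they can be decoded by hand or with a few
lines of Python. A minimal sketch, with the bytes copied from the dump
and a hypothetical function name:

```python
import struct

# Bytes 0x1be..0x1cd of the sdf1 dump above: the first MBR partition entry.
ENTRY = bytes.fromhex("00 00 02 00 ee ff ff ff 01 00 00 00 ff ff ff ff")


def decode_mbr_entry(entry):
    """Decode one 16-byte MBR partition entry into its main fields."""
    status, chs_first, ptype, chs_last, lba_start, num_sectors = struct.unpack(
        "<B3sB3sII", entry
    )
    return {
        "bootable": status == 0x80,
        "type": ptype,
        "lba_start": lba_start,
        "num_sectors": num_sectors,
    }


info = decode_mbr_entry(ENTRY)
print(hex(info["type"]), info["lba_start"], info["num_sectors"])
# prints: 0xee 1 4294967295
```

That matches what mdadm -E and file reported for sdf1: a single type
0xee (protective MBR) entry starting at sector 1 and spanning
4294967295 sectors.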

## is that the same as the boot sector itself?  Interesting q.
# dd if=/dev/sdd count=2 of=/tmp/foo && dd if=/dev/sdd1 count=2 of=/tmp/bar && cmp /tmp/foo /tmp/bar
## Nope, how do they differ?  Well that's a bit unpleasant to do manually but here...
# dd if=/dev/sdd count=2 2> /dev/null | hexdump -C
00000000  10 06 27 48 33 df bb 55  8b 28 fe 60 5e 18 6d 38  |..'H3..U.(.`^.m8|
00000010  fc b3 17 36 55 de fd 83  d0 52 72 19 d0 76 12 f0  |...6U....Rr..v..|
00000020  1e 23 bc 4d c5 4d c2 d6  5a d4 2b cd 16 78 c9 28  |.#.M.M..Z.+..x.(|
00000030  77 21 c4 9f c4 b7 48 ad  e0 7b 08 d6 f5 8e 92 a7  |w!....H..{......|
00000040  bc 88 35 02 e7 f8 b8 3b  05 97 db a3 ad e7 96 4b  |..5....;.......K|
00000050  84 d9 e2 a4 3a 5a 07 ac  fc a2 78 58 d7 c8 5a 19  |....:Z....xX..Z.|
00000060  88 9c f6 f2 c0 ec 99 55  d9 5d 00 87 3a 86 52 01  |.......U.]..:.R.|
00000070  92 58 25 82 99 50 8e 28  0f 42 07 71 9a a3 db 82  |.X%..P.(.B.q....|
00000080  00 d9 b8 28 9d d8 97 85  9d c6 fb 5e 4d 94 3a 6e  |...(.......^M.:n|
00000090  19 3c a6 ce 57 6b a0 52  d6 72 0c 41 2e cd cb a2  |.<..Wk.R.r.A....|
000000a0  15 c8 d4 c8 8c 90 34 5f  15 ab 69 96 af 3d 7e 30  |......4_..i..=~0|
000000b0  25 e1 72 35 d6 c4 b2 5e  78 72 0b 3f 9a 96 40 7e  |%.r5...^xr.?..@~|
000000c0  c6 aa 0e 5a da 99 ae fe  a3 93 8b 5b c4 bf 91 64  |...Z.......[...d|
000000d0  d5 62 12 ea 70 15 a9 05  81 8d e4 fb 36 15 c9 63  |.b..p.......6..c|
000000e0  ba f9 d2 5c f6 df 28 71  d8 d5 82 95 2b 83 40 db  |...\..(q....+.@.|
000000f0  9b fe e2 a7 9b 38 5e 5f  51 a6 6e e6 7b 4e bf 02  |.....8^_Q.n.{N..|
00000100  d2 fb aa f9 2c 7a 5b f5  47 ad ac 7e d1 1c f3 1b  |....,z[.G..~....|
00000110  a3 8e 54 9f a4 8d 1a 02  3f cc 81 f0 ca e9 28 1e  |..T.....?.....(.|
00000120  33 9e d8 71 dd f2 aa b7  d4 06 96 cb 0c 8e f1 6a  |3..q...........j|
00000130  88 1d 2a 8a a3 33 00 8c  ef d4 d8 39 3e 70 18 34  |..*..3.....9>p.4|
00000140  e6 3a cd e7 0b d6 82 a8  a4 aa ff bd b3 69 0a cc  |.:...........i..|
00000150  32 9e e3 26 34 bb cc 0e  b0 69 5f 9a c5 f3 57 7d  |2..&4....i_...W}|
00000160  47 82 bc 66 44 55 c4 de  3c 2c 14 d0 9a 73 6a da  |G..fDU..<,...sj.|
00000170  3c 5e f8 99 26 5b f4 8a  13 a1 f1 c8 a9 20 4c 3a  |<^..&[....... L:|
00000180  bd 03 4e e9 83 25 46 32  3f 80 3e 42 58 e7 18 27  |..N..%F2?.>BX..'|
00000190  8a c8 7c 8c 74 99 96 61  d4 e2 58 c2 27 71 8c 3b  |..|.t..a..X.'q.;|
000001a0  da 33 f8 7f b5 c1 a7 a0  c2 7b 54 29 0d 47 b4 b5  |.3.......{T).G..|
000001b0  4c 62 5b f8 e9 6f bc 29  00 00 00 00 00 00 00 00  |Lb[..o.)........|
000001c0  02 00 ee ff ff ff 01 00  00 00 ff ff ff ff 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
00000210  62 01 85 1f 00 00 00 00  01 00 00 00 00 00 00 00  |b...............|
00000220  af be c0 d1 01 00 00 00  22 00 00 00 00 00 00 00  |........".......|
00000230  8e be c0 d1 01 00 00 00  e2 89 58 78 77 63 52 44  |..........XxwcRD|
00000240  93 9e 4a 93 16 06 86 6b  02 00 00 00 00 00 00 00  |..J....k........|
00000250  80 00 00 00 80 00 00 00  5d ff 7e 02 00 00 00 00  |........].~.....|
00000260  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
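
Rather than comparing the two dumps by eye, the GPT headers at offset
0x200 can be decoded field by field. A Python sketch, assuming the
standard 92-byte GPT header layout; the header bytes are copied from
the sdf1 and sdd dumps above, and the names are hypothetical:

```python
import struct

# Standard GPT header: 92 bytes, little-endian.
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")

SDF1_HDR = bytes.fromhex(
    "45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00 "
    "3a dc 43 c4 00 00 00 00 01 00 00 00 00 00 00 00 "
    "8e b6 c0 d1 01 00 00 00 22 00 00 00 00 00 00 00 "
    "6d b6 c0 d1 01 00 00 00 a5 4f bd 75 f6 c8 4f 43 "
    "92 31 ab b6 a9 59 aa 04 02 00 00 00 00 00 00 00 "
    "80 00 00 00 80 00 00 00 59 04 3d 4a 00 00 00 00"
)
SDD_HDR = bytes.fromhex(
    "45 46 49 20 50 41 52 54 00 00 01 00 5c 00 00 00 "
    "62 01 85 1f 00 00 00 00 01 00 00 00 00 00 00 00 "
    "af be c0 d1 01 00 00 00 22 00 00 00 00 00 00 00 "
    "8e be c0 d1 01 00 00 00 e2 89 58 78 77 63 52 44 "
    "93 9e 4a 93 16 06 86 6b 02 00 00 00 00 00 00 00 "
    "80 00 00 00 80 00 00 00 5d ff 7e 02 00 00 00 00"
)


def parse_gpt_header(raw):
    """Decode the fixed part of a GPT header into a dict."""
    (sig, _rev, hdr_size, _hdr_crc, _reserved, my_lba, alt_lba,
     first_usable, last_usable, _guid, entries_lba, n_entries,
     entry_size, _entries_crc) = GPT_HEADER.unpack_from(raw)
    if sig != b"EFI PART":
        raise ValueError("no GPT signature")
    return {
        "header_size": hdr_size,
        "my_lba": my_lba,
        "alt_lba": alt_lba,          # where the backup header should live
        "first_usable": first_usable,
        "last_usable": last_usable,
        "entries_lba": entries_lba,
        "n_entries": n_entries,
        "entry_size": entry_size,
    }


for name, hdr in (("sdf1", SDF1_HDR), ("sdd", SDD_HDR)):
    fields = parse_gpt_header(hdr)
    # The backup header sits in the last sector, so alt_lba + 1 is the
    # device size (in sectors) that the GPT was written for.
    print(name, fields["alt_lba"] + 1)
# sdf1 -> 7814035087, sdd -> 7814037168
```

The two headers are not identical. The one on sdd describes a device of
7814037168 sectors, while the one inside sdf1 describes 7814035087
sectors. Assuming sdf1 is partitioned like sdd1, that second figure
equals the Avail Dev Size plus the Data Offset from mdadm -E
(7814035071 + 16). If that arithmetic holds, the header inside sdf1
looks like it was written for a partition-sized device rather than
copied from the start of a whole disk, but that is only an inference
from these dumps.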

> And then we can see if there really is a PMBR and GPT in that first
> sector that parted is picking up. But where it could be coming from in
> an mdadm linear layout? No idea.
> 
> The other thing to check is the end of the partition, because GPT has
> a primary and backup. So the 2nd to last sector of sdd1 may have a
> backup GPT on it, and possibly something is wrongly restoring it
> sometimes.
> 
> In any case I would still look to using something much much newer than
> parted 2.3, it's basically Pleistocene old, and the version of mdadm
> is also likewise old. But this is what happens with LTS releases,
> ancient software for which no one except its maintainers remember the
> state and history.

I understand and can probably acquire the most recent stable and
compile from source, if you think that would prove useful enough to
justify the effort.  TBH once GPT came out I lost track of which
partitioning tool was appropriate to use, it seemed like (IIRC)
cfdisk, sfdisk, parted were all vying for my attention... is parted
now the standard?

At the current moment I am backing up the drives so that I can try a
forcible reassemble.  I think that last time this happened, that
effectively relabeled the mdraid partitions and fixed the problem.
The underlying mdraid has an LVM on LUKS, but last time this happened
I managed to fsck and get 99% of the data back, with only a few things
ending up in lost+found.  Presumably there might have been some data
corruption, but since it's a backup server only I consider it
tolerable, modulo the failed Windows system which needs to restore
from it.
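
Before the reassemble, the end of the partition can also be checked
for a backup GPT, as suggested above. GPT normally keeps its backup
header in the last sector of the device it describes, so the sketch
below just looks for the "EFI PART" signature in the last couple of
sectors. It is only a sketch, assuming 512-byte logical sectors; the
function name and the tail_sectors parameter are hypothetical:

```python
import os

SECTOR = 512  # logical sector size reported by parted above


def find_backup_gpt(path, tail_sectors=2):
    """Return the sector numbers, among the last few, that start with a GPT signature."""
    hits = []
    with open(path, "rb") as dev:
        size = dev.seek(0, os.SEEK_END)   # works for image files and block devices
        total = size // SECTOR
        for lba in range(max(total - tail_sectors, 0), total):
            dev.seek(lba * SECTOR)
            if dev.read(8) == b"EFI PART":
                hits.append(lba)
    return hits
```

Run as root against /dev/sdf1 (or against the backup image of it), an
empty list would mean no backup header was found in those sectors.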
-- 
http://www.subspacefield.org/~travis/ | if spammer then john@xxxxxxxxxxxxxxxxx
"Computer crime, the glamor crime of the 1970s, will become in the
1980s one of the greatest sources of preventable business loss."
John M. Carroll, "Computer Security", first edition cover flap, 1977