Re: LVM on top of DRBD

emmanuel segura <emi2fast@gmail.com> · Mon, 9 Jan 2017 11:52:38 +0100

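Use the same OS version (and with it the same DRBD release) on both
nodes. You are pairing DRBD 8.4.9 on CentOS 7 with DRBD 8.3.16 on
CentOS 6.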
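
A quick way to confirm the mismatch is to compare the DRBD series of the
kernel module packages on the two nodes. This is only a minimal sketch,
assuming the ELRepo package names quoted below; drbd_series and
same_drbd_series are hypothetical helper names, not part of DRBD.

```shell
#!/bin/sh
# Extract the DRBD major.minor series from an ELRepo kmod package name,
# e.g. kmod-drbd84-8.4.9-1.el7.elrepo.x86_64 -> 8.4
drbd_series() {
    printf '%s\n' "$1" | sed -n 's/^kmod-drbd[0-9]*-\([0-9]*\.[0-9]*\)\..*/\1/p'
}

# Succeeds (exit 0) only when both package names belong to the same series.
same_drbd_series() {
    [ "$(drbd_series "$1")" = "$(drbd_series "$2")" ]
}

# Package names as reported in the quoted message.
node_a="kmod-drbd84-8.4.9-1.el7.elrepo.x86_64"
node_b="kmod-drbd83-8.3.16-3.el6.elrepo.x86_64"

if same_drbd_series "$node_a" "$node_b"; then
    echo "same DRBD series: $(drbd_series "$node_a")"
else
    echo "mismatch: $(drbd_series "$node_a") vs $(drbd_series "$node_b")"
fi
```

On a live node the package name could be taken from something like
rpm -qa 'kmod-drbd*' instead of being hard-coded.
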

2017-01-08 19:58 GMT+01:00  <knebb@knebb.de>:
> Hi all,
>
>
> I have to cross-post to the LVM as well as the DRBD mailing list, as I
> have no clue where the issue is, if it's not a bug...
>
> I cannot get LVM working on top of DRBD. I am getting I/O errors
> followed by the "diskless" state.
>
> Steps to reproduce:
>
> Two machines.
>
> A: CentOS7 x64; epel-provided packages
> kmod-drbd84-8.4.9-1.el7.elrepo.x86_64
> drbd84-utils-8.9.8-1.el7.elrepo.x86_64
>
> B: CentOS6 x64; epel-provided packages
> kmod-drbd83-8.3.16-3.el6.elrepo.x86_64
> drbd83-utils-8.3.16-1.el6.elrepo.x86_64
>
> drbd1.res:
> resource drbd1 {
>   protocol A;
>   startup {
>         wfc-timeout 240;
>         degr-wfc-timeout     120;
>         become-primary-on backuppc;
>         }
>   net {
>         max-buffers 8000;
>         max-epoch-size 8000;
>         sndbuf-size 128k;
>         shared-secret "13Lue=3";
>         }
>   syncer {
>         rate 500M;
>         }
>   on backuppc {
>     device /dev/drbd1;
>     disk /dev/sdc;
>     address 192.168.0.1:7790;
>     meta-disk internal;
>   }
>   on drbd {
>     device /dev/drbd1;
>     disk /dev/sda;
>     address 192.168.2.16:7790;
>     meta-disk internal;
>   }
> }
>
> I was able to create the DRBD device as expected (see the first lines
> of the syslog below), and it gets in sync.
> I then set up LVM and created filter rules so that LVM ignores the
> underlying physical device:
> /etc/lvm/lvm.conf [node1]:
> filter = ["r|/dev/sdc|"];
> /etc/lvm/lvm.conf [node2]:
> filter = [ "r|/dev/sda|" ]
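
One thing to keep in mind with these filters: the patterns are regular
expressions and are not anchored unless you write ^ and $ yourself, so
"r|/dev/sda|" also rejects /dev/sda2. The sketch below only imitates
that matching with grep -E; lvm_filter_rejects is a hypothetical helper
for illustration, not an LVM command, and LVM's own regex engine may
differ in details.

```shell
#!/bin/sh
# Illustration only: succeed (exit 0) if DEVICE matches the reject REGEX,
# the way an unanchored "r|REGEX|" entry in lvm.conf would.
# usage: lvm_filter_rejects REGEX DEVICE
lvm_filter_rejects() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Compare the unanchored pattern from the mail with an anchored variant.
for pattern in '/dev/sda' '^/dev/sda$'; do
    for dev in /dev/sda /dev/sda2 /dev/sdc /dev/drbd1; do
        if lvm_filter_rejects "$pattern" "$dev"; then
            echo "pattern $pattern: $dev rejected"
        else
            echo "pattern $pattern: $dev accepted"
        fi
    done
done
```

If the intent is to hide only the whole disk that backs DRBD, an
anchored pattern such as "r|^/dev/sda$|" is the narrower choice.
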
>
> LVM ignores sdc as expected:
> #>  pvscan
>   PV /dev/sda2   VG cl              lvm2 [15,00 GiB / 0    free]
>   Total: 1 [15,00 GiB] / in use: 1 [15,00 GiB] / in no VG: 0 [0   ]
>
> Now creating PV, VG, LV:
> [root@backuppc etc]# pvcreate /dev/drbd1
>   Physical volume "/dev/drbd1" successfully created.
> [root@backuppc etc]# vgcreate test /dev/drbd1
>   Volume group "test" successfully created
> [root@backuppc etc]# lvcreate test -n test  -L 3G
>   Volume group "test" has insufficient free space (767 extents): 768
> required.
> [root@backuppc etc]# lvcreate test -n test  -L 2.9G
>   Rounding up size to full physical extent 2,90 GiB
>   Logical volume "test" created.
> [root@backuppc etc]# vgdisplay -v test
>   --- Volume group ---
>   VG Name               test
>   System ID
>   Format                lvm2
>   Metadata Areas        1
>   Metadata Sequence No  2
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               0
>   Max PV                0
>   Cur PV                1
>   Act PV                1
>   VG Size               3,00 GiB
>   PE Size               4,00 MiB
>   Total PE              767
>   Alloc PE / Size       743 / 2,90 GiB
>   Free  PE / Size       24 / 96,00 MiB
>   VG UUID               pUPkxh-oS0f-MEUY-yIeJ-3zPb-Fkg1-TW1fgh
>   --- Logical volume ---
>   LV Path                /dev/test/test
>   LV Name                test
>   VG Name                test
>   LV UUID                X0wpkL-niZ7-XT7u-zjT0-ETzC-hYbI-yyv13F
>   LV Write Access        read/write
>   LV Creation host, time backuppc, 2017-01-07 10:57:29 +0100
>   LV Status              available
>   # open                 0
>   LV Size                2,90 GiB
>   Current LE             743
>   Segments               1
>   Allocation             inherit
>   Read ahead sectors     auto
>   - currently set to     8192
>   Block device           253:2
>   --- Physical volumes ---
>   PV Name               /dev/drbd1
>   PV UUID               3tcvkG-Keqk-vplB-f9zY-1X34-ZxCI-eFYPio
>   PV Status             allocatable
>   Total PE / Free PE    767 / 24
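
The failed "-L 3G" is plain extent arithmetic: with 4 MiB extents, 3 GiB
needs 768 extents, but the VG only offers 767 after LVM metadata. A
small sketch of the rounding, where extents_needed is a hypothetical
helper and 2970 MiB stands in for "2.9G":

```shell
#!/bin/sh
# Number of physical extents needed for a requested size, rounded up.
# usage: extents_needed SIZE_MIB PE_SIZE_MIB
extents_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

extents_needed 3072 4   # 3 GiB: one more extent than the VG has (767)
extents_needed 2970 4   # about 2.9 GiB: fits into the VG
```

To simply take whatever is left in the VG, "lvcreate -l 100%FREE -n test
test" avoids guessing a size.
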
>
> Creating the filesystem (output translated from German):
> [root@backuppc etc]# mkfs.ext4  /dev/test/test
> mke2fs 1.42.9 (28-Dec-2013)
> Filesystem label=
> OS type: Linux
> Block size=4096 (log=2)
> Fragment size=4096 (log=2)
> Stride=0 blocks, Stripe width=0 blocks
> 190464 inodes, 760832 blocks
> 38041 blocks (5.00%) reserved for the super user
> First data block=0
> Maximum filesystem blocks=780140544
> 24 block groups
> 32768 blocks per group, 32768 fragments per group
> 7936 inodes per group
> Superblock backups stored on blocks:
>         32768, 98304, 163840, 229376, 294912
>
> Allocating group tables: done
> Writing inode tables: done
> Creating journal (16384 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> Mounting it and starting to use it:
> [root@backuppc etc]# mount /dev/test/test /mnt
> [root@backuppc etc]# cd /mnt/
> [root@backuppc mnt]# cd ..
>
> I immediately get I/O errors in syslog (and NO, the physical disk is
> not damaged; both nodes are virtual machines (VMware ESXi 5.x) running
> on HW RAID):
>
> Jan  7 10:42:07 backuppc kernel: block drbd1: Resync done (total 166
> sec; paused 0 sec; 18948 K/sec)
> Jan  7 10:42:07 backuppc kernel: block drbd1: updated UUIDs
> 2C441CCF3B27BA41:0000000000000000:C9022D0F617A83BA:0000000000000004
> Jan  7 10:42:07 backuppc kernel: block drbd1: conn( SyncSource ->
> Connected ) pdsk( Inconsistent -> UpToDate )
> Jan  7 10:58:44 backuppc kernel: EXT4-fs (dm-2): mounted filesystem with
> ordered data mode. Opts: (null)
> Jan  7 10:58:48 backuppc kernel: block drbd1: local WRITE IO error
> sector 5296+3960 on sdc
> Jan  7 10:58:48 backuppc kernel: block drbd1: disk( UpToDate -> Failed )
> Jan  7 10:58:48 backuppc kernel: block drbd1: Local IO failed in
> __req_mod. Detaching...
> Jan  7 10:58:48 backuppc kernel: block drbd1: 0 KB (0 bits) marked
> out-of-sync by on disk bit-map.
> Jan  7 10:58:48 backuppc kernel: block drbd1: disk( Failed -> Diskless )
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: sock was shut down by peer
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: peer( Secondary -> Unknown
> ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: short read (expected size 8)
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: meta connection shut down
> by peer.
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: ack_receiver terminated
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: Terminating drbd_a_drbd1
> Jan  7 10:58:48 backuppc kernel: block drbd1: helper command:
> /sbin/drbdadm pri-on-incon-degr minor-1
> Jan  7 10:58:48 backuppc kernel: block drbd1: helper command:
> /sbin/drbdadm pri-on-incon-degr minor-1 exit code 0 (0x0)
> Jan  7 10:58:48 backuppc kernel: block drbd1: Should have called
> drbd_al_complete_io(, 5296, 2027520), but my Disk seems to have failed
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: Connection closed
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: conn( BrokenPipe ->
> Unconnected )
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: receiver terminated
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: Restarting receiver thread
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: receiver (re)started
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: conn( Unconnected ->
> WFConnection )
> Jan  7 10:58:48 backuppc kernel: drbd drbd1: Not fencing peer, I'm not
> even Consistent myself.
> Jan  7 10:58:48 backuppc kernel: block drbd1: IO ERROR: neither local
> nor remote data, sector 29096+3968
> Jan  7 10:58:48 backuppc kernel: dm-2: WRITE SAME failed. Manually zeroing.
> Jan  7 10:58:48 backuppc kernel: block drbd1: IO ERROR: neither local
> nor remote data, sector 29096+256
> Jan  7 10:58:48 backuppc kernel: block drbd1: IO ERROR: neither local
> nor remote data, sector 29352+256
> Jan  7 10:58:48 backuppc kernel: block drbd1: IO ERROR: neither local
> nor remote data, sector 29608+256
> Jan  7 10:58:48 backuppc kernel: block drbd1: IO ERROR: neither local
> nor remote data, sector 29864+256
> Jan  7 10:58:49 backuppc kernel: drbd drbd1: Handshake successful:
> Agreed network protocol version 97
> Jan  7 10:58:49 backuppc kernel: drbd drbd1: Feature flags enabled on
> protocol level: 0x0 none.
> Jan  7 10:58:49 backuppc kernel: drbd drbd1: conn( WFConnection ->
> WFReportParams )
> Jan  7 10:58:49 backuppc kernel: drbd drbd1: Starting ack_recv thread
> (from drbd_r_drbd1 [22367])
> Jan  7 10:58:49 backuppc kernel: block drbd1: receiver updated UUIDs to
> effective data uuid: 2C441CCF3B27BA40
> Jan  7 10:58:49 backuppc kernel: block drbd1: peer( Unknown -> Secondary
> ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
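
In a log this long it helps to pull out just the local disk state
changes, which here show the sequence UpToDate, Failed, Diskless within
the same second as the first write error. A minimal sketch;
drbd_disk_transitions is a hypothetical helper name, and the sample
lines are taken from the log above, re-joined where the mail wrapped
them.

```shell
#!/bin/sh
# Print only the local-disk state transitions from DRBD kernel log lines
# read on stdin. "pdsk(" (the peer's disk) is deliberately not matched,
# because the pattern requires the literal text "disk(".
drbd_disk_transitions() {
    grep -o 'disk( [A-Za-z]* -> [A-Za-z]* )'
}

drbd_disk_transitions <<'EOF'
Jan  7 10:42:07 backuppc kernel: block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jan  7 10:58:48 backuppc kernel: block drbd1: disk( UpToDate -> Failed )
Jan  7 10:58:48 backuppc kernel: block drbd1: disk( Failed -> Diskless )
EOF
```

Against a real system the function would read the syslog file instead,
for example /var/log/messages (an assumption about where your kernel
messages end up).
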
>
> In the end my /proc/drbd looks like this:
>
> version: 8.4.9-1 (api:1/proto:86-101)
> GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by
> akemi@Build64R7, 2016-12-04 01:08:48
>  1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate A r-----
>     ns:3212879 nr:0 dw:67260 dr:3149797 al:27 bm:0 lo:0 pe:0 ua:0 ap:0
> ep:1 wo:f oos:0
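
The "ds:Diskless/UpToDate" field is the important part: the local side
comes first, so this Primary has no backing disk and is served entirely
from the peer. A small sketch for checking that field in a script;
drbd_local_disk_state is a hypothetical helper name.

```shell
#!/bin/sh
# Print the local disk state (the part before the slash in "ds:") for
# every device line of /proc/drbd-style text read on stdin.
drbd_local_disk_state() {
    sed -n 's/.* ds:\([A-Za-z]*\)\/.*/\1/p'
}

# Device line copied from the /proc/drbd output above.
printf '%s\n' \
    ' 1: cs:Connected ro:Primary/Secondary ds:Diskless/UpToDate A r-----' |
    drbd_local_disk_state
```

On the node itself this would be run as
"drbd_local_disk_state < /proc/drbd"; anything other than UpToDate on a
Primary deserves a look.
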
>
> pvscan is still fine:
>
> [root@backuppc log]# pvscan
>   PV /dev/sda2    VG cl              lvm2 [15,00 GiB / 0    free]
>   PV /dev/drbd1   VG test            lvm2 [3,00 GiB / 96,00 MiB free]
>   Total: 2 [17,99 GiB] / in use: 2 [17,99 GiB] / in no VG: 0 [0   ]
>
> Does anyone have an idea what is going wrong here?
>
>
> Greetings
>
>
> Christian
>
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
> _______________________________________________
> linux-lvm mailing list
> linux-lvm@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-lvm
> read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

-- 
  .~.
  /V\
 //  \\
/(   )\
^`~'^
