I have an NFS server still running CentOS 4.0 (kernel 2.6.9-5.0.5.ELsmp,
lvm2-2.00.31-1.0.RHEL4). It has two 3ware 9500S-8 controllers, presenting
two RAID arrays as /dev/sda and /dev/sdb. The OS is on a small regular IDE
disk. Both sda and sdb are 2TB, have a GPT partition table, and have one
LVM-flagged partition taking up the whole 2TB -- /dev/sda1 and /dev/sdb1,
each made into a PV. I created two volume groups, vg1 on sda1 and vg2 on
sdb1. No other PVs are involved.
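For reference, the layout was created roughly like this (a sketch; the
parted steps are abbreviated and version-dependent):

  parted /dev/sda mklabel gpt   # GPT label, then one whole-disk partition
  parted /dev/sda set 1 lvm on  # flag it for LVM
  pvcreate /dev/sda1
  vgcreate vg1 /dev/sda1
  # and likewise /dev/sdb1 -> vg2
  pvcreate /dev/sdb1
  vgcreate vg2 /dev/sdb1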
I started making volumes on vg1 with no problems. Recently I made a couple
of volumes on vg2 and started using them, again with no problems. Then
yesterday I rebooted the box for the first time since creating the volumes
on vg2. Suddenly I got reports from users who have NFS-mounted the volumes
that data is missing or wrong. Looking closely, I see that volumes off
this server are suddenly mounted in the wrong spots, shifted by two:
volume1 is mounted where volume3 is supposed to be, volume2 is mounted
where volume4 is supposed to be, and so on.
NFS file handles depend on the device ID the server reports for the
exported volume; if that changes, existing NFS mounts change out from
under the clients. And every time I add a volume and reboot, the device
IDs change.
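You can see the assignment on the server side from the LV's device node
(volume name here is made up):

  # ls -lL /dev/vg1/volume1
  brw-------  1 root root 253, 1 Nov 10 09:12 /dev/vg1/volume1

That 253:1 pair is what the file handles handed to NFS clients end up
depending on, and it is exactly what moves around across reboots.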
I can see that vgscan always sees vg2 before vg1, but it offers no way to
change the order in which it sees things. I suspect that if I had filled
up vg2 first and then made volumes on vg1, I never would have discovered
this problem unless I went back and deleted a volume on vg2.
After frantically searching the net for what the hell is going on, I first
found a changelog for lvm1 saying a bug just like this had been fixed. But
this is lvm2, where it seems not to be fixed. I tried upgrading to CentOS
4.2, but testing the issue again by creating more volumes on vg2 and then
rebooting showed the underlying device IDs change once again.
I finally came upon this old post to this list:
http://www.redhat.com/archives/linux-lvm/2005-May/msg00029.html
QUESTION 1: Why aren't major/minor assignments persistent by default in
the first place? I cannot see a reason why not, and the current default
behavior is just asking for trouble for anyone NFS-exporting their LVM
volumes. It also screws up many incremental backup programs that depend
on persistent device numbers, so the fsid= option in /etc/exports is not
a full solution.
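For reference, the fsid= workaround looks like this in /etc/exports
(paths and hosts here only illustrative); it pins each export's identity
for NFS but does nothing for the backup programs:

  /export/volume1  *.example.com(rw,fsid=1)
  /export/volume2  *.example.com(rw,fsid=2)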
It seems like this should be screamed as a loud warning in the LVM HOWTO
if things really are supposed to work this way, but I could find no
reference to it at all.
On my server, I shut down the NFS server, unmounted all the volumes, and
desperately tried to think of a way to get my volumes back to their
original device IDs without possibly having to reboot over 500 Linux
clients to clear the issue. My first attempt was to use
lvchange -My --major ### --minor ###
on my volumes. On the first volume in vg1 I did
lvchange -My --major 253 --minor 1
and on the second I did the same with a minor of 2, and so on. Then in
vg2 I decided the easiest thing was to give it a different major, so
on its first volume I did:
lvchange -My --major 254 --minor 1
and so on for the other five volumes in vg2. Then I rebooted, and
discovered that my first five volumes in vg1 refused to mount, complaining
that another volume was already using their major/minor numbers.
Doing an 'lvdisplay -v' on the first volume in vg2, I could see it was
still using major 253 despite my having given it major 254 above. In
fact, lvdisplay showed inconsistent data like this:
  Persistent major       254
  Persistent minor       1
  Block device           253:101
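A way to cross-check what the kernel actually assigned, as opposed to
what the LVM metadata records, is dmsetup from the device-mapper package
(names and numbers here illustrative):

  # dmsetup ls
  vg2-volume1     (253, 101)
  vg1-volume1     (253, 1)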
QUESTION 2: I then saw in the mail post above that kernel 2.6 ignores the
major number given. But the lvchange command insisted that I supply one
anyway, giving me the illusion that it mattered; I think that is a bug in
lvchange that should be fixed. But is the major number itself something
that could still be in flux, so that someday I will find all my volumes
have changed device numbers again because the device mapper gave all LVM
volumes some random new major?
So I ended up going with a scheme where the vg2 volumes still use major
253 but start their minor numbers at 100+<volnum>, so the first volume on
vg2 has minor 101. This finally seemed to work okay after a reboot
(assuming the major number is never going to change).
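Concretely, the commands for that scheme were of the form (volume names
made up):

  lvchange -My --major 253 --minor 1   /dev/vg1/volume1
  lvchange -My --major 253 --minor 2   /dev/vg1/volume2
  ...
  lvchange -My --major 253 --minor 101 /dev/vg2/volume1
  lvchange -My --major 253 --minor 102 /dev/vg2/volume2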
QUESTION 3: So it seems that as standard practice I need to use -My with
lvcreate from now on to prevent these problems. Is this what everyone
else is doing?
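If so, I assume the creation-time equivalent is something like this (size
and name made up), since lvcreate takes the same -M/--major/--minor
options as lvchange:

  lvcreate -L 100G -n volume7 -My --major 253 --minor 7 vg1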
QUESTION 4: I have several other servers configured just like the one
above, and on one I know I have created new volumes on the 2nd volume
group and have not rebooted since. I am now afraid to do so and am trying
to figure out what I can do. Can I do 'lvdisplay' on each live,
NFS-exported volume to get its current major:minor, and then run
'lvchange -My' on it, also live, to set it persistent? Or will it have
to be offline?
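In other words, per volume, something like this (name made up):

  # read the live assignment...
  lvdisplay /dev/vg1/volume1 | grep 'Block device'
  # ...then pin exactly those numbers so a reboot cannot move them
  lvchange -My --major 253 --minor 1 /dev/vg1/volume1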
--
---------------------------------------------------------------
Paul Raines email: raines at nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street Charlestown, MA 02129 USA