Possible bug in RHEL 5 with nested HA-LVM resources?

Hello,
my problem starts from this need:
- I have a RHEL 5.4 cluster with 2 nodes, with HA-LVM in place and some lvm/fs resource pairs composing one service

I want to add a new lvm/fs pair to the cluster without disrupting the running service.
My already configured and running LVs/mount points are:
/dev/mapper/VG_TEST_APPL-LV_TEST_APPL
                      5.0G  139M  4.6G   3% /appl_db1
/dev/mapper/VG_TEST_DATA-LV_TEST_DATA
                      5.0G  139M  4.6G   3% /oradata/TEST

The new mount point has to go under /oradata/TEST/newtemp.

The current extract of cluster.conf is:
                <service domain="MAIN" autostart="1" name="TESTSRV">
                        <ip ref="10.4.5.157"/>
                        <lvm ref="TEST_APPL"/>
                        <fs ref="TEST_APPL"/>
                        <lvm ref="TEST_DATA"/>
                        <fs ref="TEST_DATA"/>
                        <script ref="clusterssh"/>
                </service>

Based on my assumptions about precedence/child resources and so on, I presumed one correct new configuration would be:
                <service domain="MAIN" autostart="1" name="TESTSRV">
                        <ip ref="10.4.5.157"/>
                        <lvm ref="TEST_APPL"/>
                        <fs ref="TEST_APPL"/>
                        <lvm ref="TEST_DATA"/>
                        <fs ref="TEST_DATA">
                                <lvm ref="TEST_TEMP"/>
                                 <fs ref="TEST_TEMP"/>
                        </fs>
                        <script ref="clusterssh"/>
                </service>
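
For reference, the TEST_TEMP refs above point to definitions in the <resources> section roughly like the following (written from memory, so treat the exact attribute set as a sketch rather than my literal config; the APPL and DATA entries are analogous):
                <resources>
                        <lvm name="TEST_TEMP" vg_name="VG_TEST_TEMP" lv_name="LV_TEST_TEMP"/>
                        <fs name="TEST_TEMP" device="/dev/VG_TEST_TEMP/LV_TEST_TEMP"
                            mountpoint="/oradata/TEST/newtemp" fstype="ext3"/>
                </resources>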

And in fact I was able to (see the command sketch after this list):
- temporarily verify the new lvm/fs outside of the cluster by inserting its name in volume_list under lvm.conf (and touching the initrd files in /boot... --> this needs to be fixed some day ;-)
- vgchange -ay the new VG and mount the fs --> ok
- umount the fs, remove the entry from volume_list and touch the initrd files again
- change the config, also increasing the version number
- run ccs_tool update /etc/cluster/cluster.conf
- run cman_tool version -r new_version
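
Spelled out as commands, the preliminary test looked roughly like this (reconstructed from memory, so take the exact paths and the initrd step as a sketch; I am assuming a plain mkinitrd rebuild here):

        # temporarily allow the new VG in volume_list, then refresh the initrd in /boot
        vi /etc/lvm/lvm.conf
        mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
        # activate and mount the new volume by hand to verify it
        vgchange -ay VG_TEST_TEMP
        mount /dev/VG_TEST_TEMP/LV_TEST_TEMP /oradata/TEST/newtemp
        # clean up: unmount, revert volume_list, refresh the initrd again
        umount /oradata/TEST/newtemp
        vgchange -an VG_TEST_TEMP    # <-- the deactivation I actually forgot (see below)
        vi /etc/lvm/lvm.conf
        mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
        # bump config_version in cluster.conf and propagate it
        ccs_tool update /etc/cluster/cluster.conf
        cman_tool version -r new_version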

OK.
The problem is that any subsequent relocate/restart is not able to start the service... and neither is a reboot of the test node.
This went unnoticed at first because during my preliminary steps I had activated the VG and never deactivated it afterwards, so my first change did not exercise the actual start steps...
It seems from the messages that rgmanager tries to start the inner fs before activating its lvm device first:

Mar  3 16:21:56 clutest1 clurgmgrd[2396]: <notice> Starting stopped service service:TESTSRV 
Mar  3 16:21:56 clutest1 clurgmgrd: [2396]: <notice> Activating VG_TEST_APPL/LV_TEST_APPL 
Mar  3 16:21:56 clutest1 clurgmgrd: [2396]: <notice> Making resilient : lvchange -ay VG_TEST_APPL/LV_TEST_APPL 
Mar  3 16:21:56 clutest1 clurgmgrd: [2396]: <notice> Resilient command: lvchange -ay VG_TEST_APPL/LV_TEST_APPL --config devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]} 
Mar  3 16:21:56 clutest1 clurgmgrd: [2396]: <notice> Activating VG_TEST_DATA/LV_TEST_DATA 
Mar  3 16:21:57 clutest1 clurgmgrd: [2396]: <notice> Making resilient : lvchange -ay VG_TEST_DATA/LV_TEST_DATA 
Mar  3 16:21:57 clutest1 clurgmgrd: [2396]: <notice> Resilient command: lvchange -ay VG_TEST_DATA/LV_TEST_DATA --config devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]} 
Mar  3 16:21:57 clutest1 kernel: kjournald starting.  Commit interval 5 seconds
Mar  3 16:21:57 clutest1 kernel: EXT3 FS on dm-0, internal journal
Mar  3 16:21:57 clutest1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar  3 16:21:57 clutest1 kernel: kjournald starting.  Commit interval 5 seconds
Mar  3 16:21:57 clutest1 kernel: EXT3 FS on dm-4, internal journal
Mar  3 16:21:57 clutest1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar  3 16:21:57 clutest1 clurgmgrd: [2396]: <err> startFilesystem: Could not match /dev/VG_TEST_TEMP/LV_TEST_TEMP with a real device 
Mar  3 16:21:57 clutest1 clurgmgrd[2396]: <notice> start on fs "TEST_TEMP" returned 2 (invalid argument(s)) 
Mar  3 16:21:57 clutest1 clurgmgrd[2396]: <warning> #68: Failed to start service:TESTSRV; return value: 1 
Mar  3 16:21:57 clutest1 clurgmgrd[2396]: <notice> Stopping service service:TESTSRV 

In my opinion this is a bug: the last fs resource (TEST_TEMP) should be started after its same-level lvm resource (TEST_TEMP), since both are children of the same parent fs.
Does anyone else share this assumption, before I open a bug?
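
Before opening it, I guess the start order computed by rgmanager can also be checked offline with rg_test; if I remember the syntax correctly, something like this prints the ordering without actually starting anything:

        rg_test noop /etc/cluster/cluster.conf start service TESTSRV
        # "rg_test rules" should also list the ordering rules declared by the resource agents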

A fix for this seems to be the following configuration:
                <service domain="MAIN" autostart="1" name="TESTSRV">
                        <ip ref="10.4.5.157"/>
                        <lvm ref="TEST_APPL"/>
                        <fs ref="TEST_APPL"/>
                        <lvm ref="TEST_DATA"/>
                        <fs ref="TEST_DATA">
                                <lvm ref="TEST_TEMP">
                                        <fs ref="TEST_TEMP"/>
                                </lvm>
                        </fs>
                        <script ref="clusterssh"/>
                </service>

But in my opinion it is a somewhat redundant one...
With this configuration enabled, and the service still in the stopped state because of the problem above, if I now run
clusvcadm -R TESTSRV
I get success, with this in the log:

Mar  3 16:40:41 clutest1 clurgmgrd[2396]: <notice> Starting stopped service service:TESTSRV 
Mar  3 16:40:41 clutest1 clurgmgrd: [2396]: <notice> Activating VG_TEST_APPL/LV_TEST_APPL 
Mar  3 16:40:41 clutest1 clurgmgrd: [2396]: <notice> Making resilient : lvchange -ay VG_TEST_APPL/LV_TEST_APPL 
Mar  3 16:40:41 clutest1 clurgmgrd: [2396]: <notice> Resilient command: lvchange -ay VG_TEST_APPL/LV_TEST_APPL --config devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]} 
Mar  3 16:40:41 clutest1 clurgmgrd: [2396]: <notice> Activating VG_TEST_DATA/LV_TEST_DATA 
Mar  3 16:40:41 clutest1 clurgmgrd: [2396]: <notice> Making resilient : lvchange -ay VG_TEST_DATA/LV_TEST_DATA 
Mar  3 16:40:41 clutest1 clurgmgrd: [2396]: <notice> Resilient command: lvchange -ay VG_TEST_DATA/LV_TEST_DATA --config devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]} 
Mar  3 16:40:41 clutest1 kernel: kjournald starting.  Commit interval 5 seconds
Mar  3 16:40:41 clutest1 kernel: EXT3 FS on dm-0, internal journal
Mar  3 16:40:41 clutest1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar  3 16:40:42 clutest1 kernel: kjournald starting.  Commit interval 5 seconds
Mar  3 16:40:42 clutest1 kernel: EXT3 FS on dm-4, internal journal
Mar  3 16:40:42 clutest1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar  3 16:40:42 clutest1 clurgmgrd: [2396]: <notice> Activating VG_TEST_TEMP/LV_TEST_TEMP 
Mar  3 16:40:42 clutest1 clurgmgrd: [2396]: <notice> Making resilient : lvchange -ay VG_TEST_TEMP/LV_TEST_TEMP 
Mar  3 16:40:42 clutest1 clurgmgrd: [2396]: <notice> Resilient command: lvchange -ay VG_TEST_TEMP/LV_TEST_TEMP --config devices{filter=["a|/dev/vda4|","a|/dev/vdc|","a|/dev/vdd|","a|/dev/vde|","r|.*|"]} 
Mar  3 16:40:42 clutest1 kernel: kjournald starting.  Commit interval 5 seconds
Mar  3 16:40:42 clutest1 kernel: EXT3 FS on dm-5, internal journal
Mar  3 16:40:42 clutest1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar  3 16:40:43 clutest1 clurgmgrd[2396]: <notice> Service service:TESTSRV started 
Mar  3 16:40:51 clutest1 clurgmgrd: [2396]: <notice> Getting status 
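
A quick check from the node would then just be clustat plus a df on the new mount point (output omitted):

        clustat
        df -h /oradata/TEST/newtemp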

Any thoughts are appreciated.
Gianluca
