Yeah, thanks. I checked your thread; if you meant "clvmd hangs", it looks unfinished. I see only three entries in that thread and unfortunately no solution at the end. Did I miss something? My scenario is a bit different, though: I don't need GFS, only clvmd with a failover LVM, as this is an active/passive configuration. And my clvmd hangs only rarely; my main problem is that all the volumes remain inactive.

On 08/10/2012 07:00 PM, Chip Burke wrote:
> See my thread earlier, as I am having similar issues. I am testing this
> soon, but I "think" the issue in my case is setting up SCSI fencing before
> GFS2. So essentially it has nothing to fence off of, sees it as a fault,
> and never recovers. I "think" my fix will be to establish the LVMs, GFS2,
> etc., and then put in the SCSI fence so that it can actually create the
> persistent reservations. Then the fun begins in pulling the plug randomly
> to see how it behaves.
> ________________________________________
> Chip Burke
>
> On 8/10/12 12:46 PM, "Digimer" <lists@xxxxxxxxxx> wrote:
>
>> Not sure if it relates, but I can say that without fencing, things will
>> break in strange ways. The reason is that if anything triggers a fault,
>> the cluster blocks by design and stays blocked until a fence call
>> succeeds (which is impossible without fencing configured in the first
>> place).
>>
>> Can you please set up fencing and test to make sure it works (using
>> 'fence_node rhel2.local' from rhel1.local, then in reverse)? Once this
>> is done, test again for your problem. If it still exists, please paste
>> the updated cluster.conf. Also please include syslog from both
>> nodes around the time of your LVM tests.
>>
>> digimer
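For reference, the fencing test described above would look something like this (node names as in the cluster.conf quoted further down; the exact output is illustrative):

[root@rhel1 ~]# fence_node rhel2.local
fence rhel2.local success
[root@rhel1 ~]# clustat    # wait for rhel2.local to reboot and rejoin
...and then the same from the other direction:
[root@rhel2 ~]# fence_node rhel1.local

If fence_node reports a failure, any later rgmanager test is moot, because the first fault will leave the cluster blocked, exactly as described above.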
>> On 08/10/2012 12:38 PM, Poós Krisztián wrote:
>>> This is the cluster.conf, which is a clone of the problematic system in
>>> a test environment (without the Oracle and SAP instances, focusing only
>>> on this LVM issue, with an LVM resource):
>>>
>>> [root@rhel2 ~]# cat /etc/cluster/cluster.conf
>>> <?xml version="1.0"?>
>>> <cluster config_version="7" name="teszt">
>>>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>   <clusternodes>
>>>     <clusternode name="rhel1.local" nodeid="1" votes="1">
>>>       <fence/>
>>>     </clusternode>
>>>     <clusternode name="rhel2.local" nodeid="2" votes="1">
>>>       <fence/>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <cman expected_votes="3"/>
>>>   <fencedevices/>
>>>   <rm>
>>>     <failoverdomains>
>>>       <failoverdomain name="all" nofailback="1" ordered="1" restricted="0">
>>>         <failoverdomainnode name="rhel1.local" priority="1"/>
>>>         <failoverdomainnode name="rhel2.local" priority="2"/>
>>>       </failoverdomain>
>>>     </failoverdomains>
>>>     <resources>
>>>       <lvm lv_name="teszt-lv" name="teszt-lv" vg_name="teszt"/>
>>>       <fs device="/dev/teszt/teszt-lv" fsid="43679" fstype="ext4"
>>>           mountpoint="/lvm" name="teszt-fs"/>
>>>     </resources>
>>>     <service autostart="1" domain="all" exclusive="0" name="teszt"
>>>              recovery="disable">
>>>       <lvm ref="teszt-lv"/>
>>>       <fs ref="teszt-fs"/>
>>>     </service>
>>>   </rm>
>>>   <quorumd label="qdisk"/>
>>> </cluster>
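Note that the <fencedevices/> element above is empty and both per-node <fence/> elements are bare, which is exactly what the fencing advice above is pointing at. As a minimal sketch only, assuming IPMI-based fencing (the agent choice, addresses, and credentials here are hypothetical placeholders that would have to match the real hardware):

<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi1" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
  <fencedevice agent="fence_ipmilan" name="ipmi2" ipaddr="10.0.0.2" login="admin" passwd="secret"/>
</fencedevices>

and, inside each <clusternode>, a reference to that node's device, e.g. for rhel1.local:

<fence>
  <method name="1">
    <device name="ipmi1"/>
  </method>
</fence>

After such an edit, config_version has to be incremented and the configuration propagated to both nodes (ccs_tool update /etc/cluster/cluster.conf or cman_tool version -r, depending on the cluster stack version).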
>>>
>>> Here are the log parts:
>>> Aug 10 17:21:21 rgmanager I am node #2
>>> Aug 10 17:21:22 rgmanager Resource Group Manager Starting
>>> Aug 10 17:21:22 rgmanager Loading Service Data
>>> Aug 10 17:21:29 rgmanager Initializing Services
>>> Aug 10 17:21:31 rgmanager /dev/dm-2 is not mounted
>>> Aug 10 17:21:31 rgmanager Services Initialized
>>> Aug 10 17:21:31 rgmanager State change: Local UP
>>> Aug 10 17:21:31 rgmanager State change: rhel1.local UP
>>> Aug 10 17:23:23 rgmanager Starting stopped service service:teszt
>>> Aug 10 17:23:25 rgmanager Failed to activate logical volume, teszt/teszt-lv
>>> Aug 10 17:23:25 rgmanager Attempting cleanup of teszt/teszt-lv
>>> Aug 10 17:23:29 rgmanager Failed second attempt to activate teszt/teszt-lv
>>> Aug 10 17:23:29 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
>>> Aug 10 17:23:29 rgmanager #68: Failed to start service:teszt; return value: 1
>>> Aug 10 17:23:29 rgmanager Stopping service service:teszt
>>> Aug 10 17:23:30 rgmanager stop: Could not match /dev/teszt/teszt-lv with a real device
>>> Aug 10 17:23:30 rgmanager stop on fs "teszt-fs" returned 2 (invalid argument(s))
>>> Aug 10 17:23:31 rgmanager #12: RG service:teszt failed to stop; intervention required
>>> Aug 10 17:23:31 rgmanager Service service:teszt is failed
>>> Aug 10 17:24:09 rgmanager #43: Service service:teszt has failed; can not start.
>>> Aug 10 17:24:09 rgmanager #13: Service service:teszt failed to stop cleanly
>>> Aug 10 17:25:12 rgmanager Starting stopped service service:teszt
>>> Aug 10 17:25:14 rgmanager Failed to activate logical volume, teszt/teszt-lv
>>> Aug 10 17:25:15 rgmanager Attempting cleanup of teszt/teszt-lv
>>> Aug 10 17:25:17 rgmanager Failed second attempt to activate teszt/teszt-lv
>>> Aug 10 17:25:18 rgmanager start on lvm "teszt-lv" returned 1 (generic error)
>>> Aug 10 17:25:18 rgmanager #68: Failed to start service:teszt; return value: 1
>>> Aug 10 17:25:18 rgmanager Stopping service service:teszt
>>> Aug 10 17:25:19 rgmanager stop: Could not match /dev/teszt/teszt-lv with a real device
>>> Aug 10 17:25:19 rgmanager stop on fs "teszt-fs" returned 2 (invalid argument(s))
>>>
>>> After I manually started the LVM on node1 and then tried to switch it to
>>> node2, it was not able to start it there.
>>>
>>> Regards,
>>> Krisztian
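For what it's worth, the "Could not match /dev/teszt/teszt-lv with a real device" line in those logs normally just means the LV is inactive at that moment, so its device node does not exist. A quick way to confirm, using the names from the logs above (output will vary):

[root@rhel2 ~]# lvs -o lv_name,vg_name,lv_attr teszt   # 5th lv_attr character: 'a' = active, '-' = inactive
[root@rhel2 ~]# ls -l /dev/teszt/                      # the teszt-lv symlink exists only while the LV is active
[root@rhel2 ~]# dmsetup ls | grep teszt                # device-mapper's view of the same thing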
>>>
>>> On 08/10/2012 05:15 PM, Digimer wrote:
>>>> On 08/10/2012 11:07 AM, Poós Krisztián wrote:
>>>>> Dear all,
>>>>>
>>>>> I hope someone has run into this problem in the past and can help
>>>>> me resolve this issue.
>>>>>
>>>>> There is a 2-node RHEL cluster, with a quorum disk as well.
>>>>> There are clustered LVMs, where the -c- flag is on.
>>>>> If I start clvmd, all the clustered LVs come online.
>>>>>
>>>>> After this, if I start rgmanager, it deactivates all the volumes and
>>>>> is not able to activate them anymore, as there are no such devices
>>>>> during the startup of the service, so the service fails.
>>>>> All LVs remain without the active flag.
>>>>>
>>>>> I can bring it up manually, but only if, after clvmd is started, I
>>>>> first set the LVs offline by hand with lvchange -an <lv>.
>>>>> After this, when I start rgmanager, it can take the service online
>>>>> without problems. However, I think this step should be done by
>>>>> rgmanager itself. The logs are full of the following:
>>>>> rgmanager Making resilient: lvchange -an ....
>>>>> rgmanager lv_exec_resilient failed
>>>>> rgmanager lv_activate_resilient stop failed on ....
>>>>>
>>>>> Also, the lvs/clvmd commands themselves sometimes hang; I have to
>>>>> restart clvmd (sometimes kill it) to make them work again.
>>>>>
>>>>> Does anyone have any idea what to check?
>>>>>
>>>>> Thanks and regards,
>>>>> Krisztian
>>>>
>>>> Please paste your cluster.conf file with minimal edits.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.com
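The manual workaround quoted above, spelled out with the names from the test cluster (clusvcadm is rgmanager's service-control tool; this is a sketch of the described sequence, not a fix):

[root@rhel1 ~]# vgs -o vg_name,vg_attr teszt        # 6th vg_attr character 'c' = clustered VG
[root@rhel1 ~]# lvchange -an teszt/teszt-lv         # deactivate what clvmd auto-activated
[root@rhel1 ~]# clusvcadm -e teszt                  # now rgmanager can activate the LV itself
[root@rhel1 ~]# clusvcadm -r teszt -m rhel2.local   # relocate, to test the failover path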
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster