On Fri, 2007-11-02 at 12:32 -0500, Ryan O'Hara wrote:
> Christopher Barry wrote:
> > On Wed, 2007-10-31 at 10:44 -0500, Ryan O'Hara wrote:
> >> Christopher Barry wrote:
> >>> Greetings all,
> >>>
> >>> I have 2 vmware esx servers, each hitting a NetApp over FS, and each
> >>> with 3 RHCS cluster nodes trying to mount a gfs volume.
> >>>
> >>> All of the nodes (1, 2, & 3) on esx-01 can mount the volume fine, but none
> >>> of the nodes in the second esx box can mount the gfs volume at all, and
> >>> I get the following error in dmesg:
> >>
> >> Are you intentionally trying to use scsi reservations as a fence method?
> >
> > No. In fact I thought the scsi_reservation service may be *causing* the
> > issue, and disabled the service from starting on all nodes. Does this
> > have to be on?
>
> No. You only need to run this service if you plan on using scsi
> reservations as a fence method. A scsi reservation will restrict access
> to a device such that only registered nodes can access it. If a
> reservation exists and an unregistered node tries to access the device,
> you'll see what you are seeing.
>
> It may be that some reservations were created and never got cleaned up,
> which might cause the problem to continue even after the scsi_reserve
> script was disabled. You can manually run '/etc/init.d/scsi_reserve
> stop' to attempt to clean up any reservations. Note that I am assuming
> that any reservations that might still exist on a device were created by
> the scsi_reserve script. If that is the case, you can see what devices a
> node is registered for by doing a '/etc/init.d/scsi_reserve status'.
> Also note that the scsi_reserve script does *not* have to be started or
> enabled to do these things (i.e. you can safely run 'status' or 'stop'
> without first running 'start').
>
> One caveat... 'scsi_reserve stop' will not unregister a node if it is the
> reservation holder and other nodes are still registered with a device.
> You can also use the sg_persist command directly to clear all registrations
> and reservations. Use the -C option. See the sg_persist man page for a
> better description.
>

Okay. I had some other issues to deal with, but now I'm back to this, so
let me get you all up to speed on what I have done, and what I do not
understand about all of this.

status:
esx-01: contains nodes 1 thru 3
esx-02: contains nodes 4 thru 6

esx-01: all 3 cluster nodes can mount gfs.
esx-02: none can mount gfs.
esx-02: scsi reservation errors in dmesg
esx-02: mount fails w/ "can't read superblock"

Oddly, with the gfs filesystem unmounted on all nodes, I can format the
gfs filesystem from the esx-02 box (from node4), and then mount it from
a node on esx-01, but cannot mount it on the node I just formatted it
from!

fdisk -l shows /dev/sdc1 on nodes 4 thru 6 just fine.

# sg_persist -C --out /dev/sdc1
fails to clear out the reservations.

I do not understand these reservations; maybe someone can summarize?

I'm not at the box this sec (vpn-ing in will hork my evolution), but I
will provide any amount of data if either you, Ryan, or anyone else has
stuff for me to try.
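In the meantime, here's roughly what I'm planning to try when I'm back
on node4, pieced together from your notes and the sg_persist man page --
so treat it as a sketch, not gospel. I'm assuming the reservation lives
on the whole LUN (/dev/sdc) rather than the partition, and 0xKEY below
is just a placeholder for whatever key the read-keys output shows:

  # what the init script thinks this node is registered for
  /etc/init.d/scsi_reserve status

  # list registered keys and the current reservation on the LUN
  sg_persist --in --read-keys /dev/sdc
  sg_persist --in --read-reservation /dev/sdc

  # from a node that *is* registered, clear all registrations and the
  # reservation (CLEAR wants that node's own key in --param-rk, I believe)
  sg_persist --out --clear --param-rk=0xKEY /dev/sdc

If I'm reading the man page right, issuing the clear from a node that
isn't registered (which is probably what nodes 4 thru 6 are) just earns
another reservation conflict, which may be why my earlier attempt went
nowhere. Please correct me if I have that wrong.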
Thanks all,
-C

> >> It sounds like the nodes on esx-01 are creating reservations, but the
> >> nodes on the second esx box are not registering with the device and
> >> therefore are unable to mount the filesystem. Creation of reservations
> >> and registrations is handled by the scsi_reserve init script, which
> >> should be run at startup on all nodes in the cluster. You can check to
> >> see what devices a node is registered for before you mount the
> >> filesystem by doing /etc/init.d/scsi_reserve status. If your nodes are
> >> not registered with the device and a reservation exists, then you won't
> >> be able to mount.
> >>
> >>> Lock_Harness 2.6.9-72.2 (built Apr 24 2007 12:45:38) installed
> >>> GFS 2.6.9-72.2 (built Apr 24 2007 12:45:54) installed
> >>> GFS: Trying to join cluster "lock_dlm", "kop-sds:gfs_home"
> >>> Lock_DLM (built Apr 24 2007 12:45:40) installed
> >>> GFS: fsid=kop-sds:gfs_home.2: Joined cluster. Now mounting FS...
> >>> GFS: fsid=kop-sds:gfs_home.2: jid=2: Trying to acquire journal lock...
> >>> GFS: fsid=kop-sds:gfs_home.2: jid=2: Looking at journal...
> >>> GFS: fsid=kop-sds:gfs_home.2: jid=2: Done
> >>> scsi2 (0,0,0) : reservation conflict
> >>> SCSI error : <2 0 0 0> return code = 0x18
> >>> end_request: I/O error, dev sdc, sector 523720263
> >>> scsi2 (0,0,0) : reservation conflict
> >>> SCSI error : <2 0 0 0> return code = 0x18
> >>> end_request: I/O error, dev sdc, sector 523720271
> >>> scsi2 (0,0,0) : reservation conflict
> >>> SCSI error : <2 0 0 0> return code = 0x18
> >>> end_request: I/O error, dev sdc, sector 523720279
> >>> GFS: fsid=kop-sds:gfs_home.2: fatal: I/O error
> >>> GFS: fsid=kop-sds:gfs_home.2: block = 65464979
> >>> GFS: fsid=kop-sds:gfs_home.2: function = gfs_logbh_wait
> >>> GFS: fsid=kop-sds:gfs_home.2: file =
> >>>   /builddir/build/BUILD/gfs-kernel-2.6.9-72/smp/src/gfs/dio.c, line = 923
> >>> GFS: fsid=kop-sds:gfs_home.2: time = 1193838678
> >>> GFS: fsid=kop-sds:gfs_home.2: about to withdraw from the cluster
> >>> GFS: fsid=kop-sds:gfs_home.2: waiting for outstanding I/O
> >>> GFS: fsid=kop-sds:gfs_home.2: telling LM to withdraw
> >>> lock_dlm: withdraw abandoned memory
> >>> GFS: fsid=kop-sds:gfs_home.2: withdrawn
> >>> GFS: fsid=kop-sds:gfs_home.2: can't get resource index inode: -5
> >>>
> >>>
> >>> Does anyone have a clue as to where I should start looking?
> >>>
> >>>
> >>> Thanks,
> >>> -C

-- 
Regards,
-C

Christopher Barry
Systems Engineer, Principal
QLogic Corporation
780 Fifth Avenue, Suite 140
King of Prussia, PA 19406
o/f: 610-233-4870 / 4777
m:   267-242-9306

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster