I think we're making progress. I put a sleep 30 in my rc.local, rebooted, and this time my Gluster filesystem is mounted by the time I first log on. There is still some stuff in /var/log/messages I don't understand, but my before and after mounts look much better.

Notice how messages from all kinds of things get mixed in together - systemd apparently starts a bunch of units in parallel rather than one after another. That's why F19 boots so fast, but the tradeoff is you can't count on things happening in sequence. I wonder if I can set up one of those systemd service unit files, where glusterd gets started first and then a script runs to mount my stuff? That would be a more deterministic way to do it than sleeping 30 seconds in rc.local. I have to go out for a couple of hours; I'll see what I can put together and report results here.
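Roughly what I have in mind is below - just a sketch at this point, nothing tested. The unit name (gluster-mounts.service) and the mount-gluster-stuff.sh script are names I made up, the script would hold the same mount commands rc.local runs today, and I'm assuming the gluster daemon's unit really is called glusterd.service:

# /etc/systemd/system/gluster-mounts.service - untested sketch
# (glusterd.service name and the script path are guesses on my part)
[Unit]
Description=Mount the Gluster volumes once glusterd is running
Requires=glusterd.service
After=glusterd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/mount-gluster-stuff.sh

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable gluster-mounts.service. The part I'm not sure about is whether glusterd reporting "started" means the volume is actually ready to hand out mounts - in the log below the fstab mount fails only two seconds after glusterd comes up - so the script probably still needs a short retry loop.

Anyway, here is the chunk of /var/log/messages from the reboot with the sleep 30 still in place: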
Jul 15 17:13:18 chicago-fw1 audispd: queue is full - dropping event
Jul 15 17:13:18 chicago-fw1 audispd: queue is full - dropping event
Jul 15 17:13:18 chicago-fw1 audispd: queue is full - dropping event
Jul 15 17:13:20 chicago-fw1 systemd[1]: Started GlusterFS an clustered file-system server.
Jul 15 17:13:22 chicago-fw1 mount[1001]: Mount failed. Please check the log file for more details.
Jul 15 17:13:22 chicago-fw1 systemd[1]: firewall\x2dscripts.mount mount process exited, code=exited status=1
Jul 15 17:13:22 chicago-fw1 systemd[1]: Unit firewall\x2dscripts.mount entered failed state.
. . . a bazillion meaningless selinux warnings (because selinux=permissive here) . . .
Jul 15 17:13:40 chicago-fw1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from name_bind access on the tcp_socket . For complete SELinux messages. run sealert -l 221b72d0-d5d8-4a70-bedd-697a6b9e0f03
Jul 15 17:13:40 chicago-fw1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from name_bind access on the tcp_socket . For complete SELinux messages. run sealert -l 22b9b899-3fe2-47fc-8c5d-7bd5ed0e1f17
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Mon Jul 15 17:13:40 CDT 2013
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Making sure the Gluster stuff is mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Mounted before mount -av
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Filesystem                       Size  Used Avail Use% Mounted on
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-root           14G  3.9G  8.7G  31% /
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: devtmpfs                         990M     0  990M   0% /dev
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M     0  996M   0% /dev/shm
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M  884K  996M   1% /run
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M     0  996M   0% /sys/fs/cgroup
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M     0  996M   0% /tmp
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda2                        477M   87M  365M  20% /boot
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda1                        200M  9.4M  191M   5% /boot/efi
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-gluster--fw1  7.9G   33M  7.8G   1% /gluster-fw1
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: extra arguments at end (ignored)
Jul 15 17:13:40 chicago-fw1 setroubleshoot: SELinux is preventing /usr/sbin/glusterfsd from name_bind access on the tcp_socket . For complete SELinux messages. run sealert -l 225efbe9-0ea3-4f5b-8791-c325d2f0eed6
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /                 : ignored
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /boot             : already mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /boot/efi         : already mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /gluster-fw1      : already mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: swap              : ignored
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /firewall-scripts : successfully mounted
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Mounted after mount -av
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Filesystem                       Size  Used Avail Use% Mounted on
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-root           14G  3.9G  8.7G  31% /
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: devtmpfs                         990M     0  990M   0% /dev
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M     0  996M   0% /dev/shm
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M  884K  996M   1% /run
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M     0  996M   0% /sys/fs/cgroup
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: tmpfs                            996M     0  996M   0% /tmp
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda2                        477M   87M  365M  20% /boot
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/sda1                        200M  9.4M  191M   5% /boot/efi
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: /dev/mapper/fedora-gluster--fw1  7.9G   33M  7.8G   1% /gluster-fw1
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: 192.168.253.1:/firewall-scripts  7.6G   19M  7.2G   1% /firewall-scripts
Jul 15 17:13:40 chicago-fw1 rc.local[1005]: Starting up firewall common items
Jul 15 17:13:40 chicago-fw1 systemd[1]: Started /etc/rc.d/rc.local Compatibility.
Jul 15 17:13:40 chicago-fw1 systemd[1]: Starting Terminate Plymouth Boot Screen...

Greg Scott
Infrasupport Corporation
GregScott at Infrasupport.com
Direct 1-651-260-1051

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Greg Scott
Sent: Monday, July 15, 2013 5:12 PM
To: 'Marcus Bointon'; gluster-users at gluster.org List
Subject: Re: One node goes offline, the other node can't see the replicated volume anymore

And for what it's worth, I just now looked and noticed rc.local does not really run last in the startup sequence anymore. According to below, it only depends on the network being started. So I could easily be trying my mounts before gluster ever gets fired up.

[root at chicago-fw1 system]# pwd
/usr/lib/systemd/system
[root at chicago-fw1 system]# more rc-local.service
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

# This unit gets pulled automatically into multi-user.target by
# systemd-rc-local-generator if /etc/rc.d/rc.local is executable.

[Unit]
Description=/etc/rc.d/rc.local Compatibility
ConditionFileIsExecutable=/etc/rc.d/rc.local
After=network.target

[Service]
Type=forking
ExecStart=/etc/rc.d/rc.local start
TimeoutSec=0
RemainAfterExit=yes
SysVStartPriority=99
[root at chicago-fw1 system]#
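Side note: since unit files in /etc/systemd/system take precedence over the ones in /usr/lib/systemd/system, one possible fix is to copy rc-local.service into /etc/systemd/system and add the gluster daemon to its After= line. Untested, and the glusterd.service unit name is a guess on my part:

# /etc/systemd/system/rc-local.service - local copy, ordered after glusterd (untested)
# (glusterd.service is my guess at the gluster daemon's unit name)
[Unit]
Description=/etc/rc.d/rc.local Compatibility
ConditionFileIsExecutable=/etc/rc.d/rc.local
After=network.target glusterd.service

[Service]
Type=forking
ExecStart=/etc/rc.d/rc.local start
TimeoutSec=0
RemainAfterExit=yes
SysVStartPriority=99

A systemctl daemon-reload should make systemd pick up the copy in /etc. Even then, glusterd being started may not mean the volume is ready to mount, so the delay problem might only shrink rather than disappear.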
- Greg

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Marcus Bointon
Sent: Monday, July 15, 2013 5:05 PM
To: gluster-users at gluster.org List
Subject: Re: One node goes offline, the other node can't see the replicated volume anymore

On 15 Jul 2013, at 21:22, Greg Scott <GregScott at infrasupport.com> wrote:

> # The fstab mounts happen early in startup, then Gluster starts up later.
> # By now, Gluster should be up and running and the mounts should work.
> # That _netdev option is supposed to account for the delay but doesn't seem
> # to work right.

It's interesting to see that script - that's what happens to me with 3.3.0. If I set gluster mounts to mount from fstab with _netdev, it hangs the boot completely and I have to go into single user mode (and edit it out of fstab) to recover, though gluster logs nothing at all. Autofs fails too (though I think that's autofs not understanding mounting an NFS volume from localhost), yet it all works on a manual mount. Sad to see you're having trouble with 3.4. I hope you can make it work!

Marcus

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users