Bug filed: https://bugzilla.redhat.com/show_bug.cgi?id=1456265 (as mentioned in the previous e-mail)

Thank you for your support!

Kind regards,
Adam

On Sun, May 28, 2017 at 2:37 PM, Adam Ru <ad.ruckel@xxxxxxxxx> wrote:
> Hi Soumya,
>
> Again, I apologize for the delay in responding. I'll try to file a bug.
> Meanwhile I'm sending the AVCs and version numbers. The AVCs were collected
> between two reboots; in both cases I manually started nfs-ganesha.service
> and nfs-ganesha-lock.service failed to start.
>
> uname -r
> 3.10.0-514.21.1.el7.x86_64
>
> sestatus -v
> SELinux status:                 enabled
> SELinuxfs mount:                /sys/fs/selinux
> SELinux root directory:         /etc/selinux
> Loaded policy name:             targeted
> Current mode:                   enforcing
> Mode from config file:          enforcing
> Policy MLS status:              enabled
> Policy deny_unknown status:     allowed
> Max kernel policy version:      28
>
> Process contexts:
> Current context:                unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> Init context:                   system_u:system_r:init_t:s0
>
> File contexts:
> Controlling terminal:           unconfined_u:object_r:user_tty_device_t:s0
> /etc/passwd                     system_u:object_r:passwd_file_t:s0
> /etc/shadow                     system_u:object_r:shadow_t:s0
> /bin/bash                       system_u:object_r:shell_exec_t:s0
> /bin/login                      system_u:object_r:login_exec_t:s0
> /bin/sh                         system_u:object_r:bin_t:s0 -> system_u:object_r:shell_exec_t:s0
> /sbin/agetty                    system_u:object_r:getty_exec_t:s0
> /sbin/init                      system_u:object_r:bin_t:s0 -> system_u:object_r:init_exec_t:s0
> /usr/sbin/sshd                  system_u:object_r:sshd_exec_t:s0
>
> sudo systemctl start nfs-ganesha.service
> systemctl status -l nfs-ganesha-lock.service
>
> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>    Active: failed (Result: exit-code) since Sun 2017-05-28 14:12:48 UTC; 9s ago
>   Process: 1991 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>
> mynode0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
> mynode0.localdomain rpc.statd[1992]: Version 1.3.0 starting
> mynode0.localdomain rpc.statd[1992]: Flags: TI-RPC
> mynode0.localdomain rpc.statd[1992]: Failed to open directory sm: Permission denied
> mynode0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
> mynode0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
> mynode0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
> mynode0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
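Side note: while the bug above is open, a temporary workaround might be to load a local SELinux policy module generated from the denials below. This is only a sketch (the "ganesha-local" module name is arbitrary), and it merely permits the recorded accesses instead of fixing the policy itself:

# Build and install a local policy module from the AVC denials recorded since boot
sudo ausearch -m AVC,USER_AVC -ts boot --raw | audit2allow -M ganesha-local
sudo semodule -i ganesha-local.pp
# Remove it again once a fixed selinux-policy package ships:
# sudo semodule -r ganesha-local

Keeping SELinux in permissive mode, as in the steps further down, is the other stopgap until the policy is fixed.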
>
> sudo ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -i
>
> ----
> type=SYSCALL msg=audit(05/28/2017 14:04:32.160:25) : arch=x86_64 syscall=bind success=yes exit=0 a0=0xf a1=0x7ffc757feb60 a2=0x10 a3=0x22 items=0 ppid=1149 pid=1157 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:04:32.160:25) : avc: denied { name_bind } for pid=1157 comm=glusterd src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
> ----
> type=SYSCALL msg=audit(05/28/2017 14:11:16.141:26) : arch=x86_64 syscall=bind success=no exit=EACCES(Permission denied) a0=0xf a1=0x7ffffbf92620 a2=0x10 a3=0x22 items=0 ppid=1139 pid=1146 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:11:16.141:26) : avc: denied { name_bind } for pid=1146 comm=glusterd src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
> ----
> type=SYSCALL msg=audit(05/28/2017 14:12:48.068:75) : arch=x86_64 syscall=openat success=no exit=EACCES(Permission denied) a0=0xffffffffffffff9c a1=0x7efdc1ec3e10 a2=O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC a3=0x0 items=0 ppid=1991 pid=1992 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:12:48.068:75) : avc: denied { read } for pid=1992 comm=rpc.statd name=sm dev="fuse" ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
> ----
> type=SYSCALL msg=audit(05/28/2017 14:12:48.080:76) : arch=x86_64 syscall=open success=no exit=EACCES(Permission denied) a0=0x7efdc1ec3dd0 a1=O_RDONLY a2=0x7efdc1ec3de8 a3=0x5 items=0 ppid=1991 pid=1992 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:12:48.080:76) : avc: denied { read } for pid=1992 comm=rpc.statd name=state dev="fuse" ino=12362789396445498341 scontext=system_u:system_r:rpcd_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=file
> ----
> type=SYSCALL msg=audit(05/28/2017 14:17:37.177:26) : arch=x86_64 syscall=bind success=no exit=EACCES(Permission denied) a0=0xf a1=0x7ffdfa768c70 a2=0x10 a3=0x22 items=0 ppid=1155 pid=1162 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:17:37.177:26) : avc: denied { name_bind } for pid=1162 comm=glusterd src=61000 scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:object_r:ephemeral_port_t:s0 tclass=tcp_socket
> ----
> type=SYSCALL msg=audit(05/28/2017 14:17:46.401:56) : arch=x86_64 syscall=kill success=no exit=EACCES(Permission denied) a0=0x560 a1=SIGKILL a2=0x7fd684000078 a3=0x0 items=0 ppid=1 pid=1167 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:17:46.401:56) : avc: denied { sigkill } for pid=1167 comm=glusterd scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=process
> ----
> type=SYSCALL msg=audit(05/28/2017 14:17:45.400:55) : arch=x86_64 syscall=kill success=no exit=EACCES(Permission denied) a0=0x560 a1=SIGTERM a2=0x7fd684000038 a3=0x99 items=0 ppid=1 pid=1167 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=glusterd exe=/usr/sbin/glusterfsd subj=system_u:system_r:glusterd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:17:45.400:55) : avc: denied { signal } for pid=1167 comm=glusterd scontext=system_u:system_r:glusterd_t:s0 tcontext=system_u:system_r:cluster_t:s0 tclass=process
> ----
> type=SYSCALL msg=audit(05/28/2017 14:18:56.024:67) : arch=x86_64 syscall=openat success=no exit=EACCES(Permission denied) a0=0xffffffffffffff9c a1=0x7ff662e9be10 a2=O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC a3=0x0 items=0 ppid=1949 pid=1950 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:18:56.024:67) : avc: denied { read } for pid=1950 comm=rpc.statd name=sm dev="fuse" ino=12866274077597183313 scontext=system_u:system_r:rpcd_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=dir
> ----
> type=SYSCALL msg=audit(05/28/2017 14:18:56.034:68) : arch=x86_64 syscall=open success=no exit=EACCES(Permission denied) a0=0x7ff662e9bdd0 a1=O_RDONLY a2=0x7ff662e9bde8 a3=0x5 items=0 ppid=1949 pid=1950 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=rpc.statd exe=/usr/sbin/rpc.statd subj=system_u:system_r:rpcd_t:s0 key=(null)
> type=AVC msg=audit(05/28/2017 14:18:56.034:68) : avc: denied { read } for pid=1950 comm=rpc.statd name=state dev="fuse" ino=12362789396445498341 scontext=system_u:system_r:rpcd_t:s0 tcontext=system_u:object_r:fusefs_t:s0 tclass=file
>
>
> On Mon, May 15, 2017 at 11:56 AM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>>
>> On 05/12/2017 06:27 PM, Adam Ru wrote:
>>>
>>> Hi Soumya,
>>>
>>> Thank you very much for the last response – very useful.
>>>
>>> I apologize for the delay; I had to find time for more testing.
>>>
>>> I updated the instructions that I provided in the previous e-mail. *** means
>>> that the step was added.
>>>
>>> Instructions:
>>> - Clean installation of CentOS 7.3 with all updates, 3x nodes, resolvable IPs and VIPs
>>> - Stopped firewalld (just for testing)
>>> - *** SELinux in permissive mode (I had to, will explain below)
>>> - Install "centos-release-gluster" to get the "centos-gluster310" repo and install the following (nothing else):
>>> --- glusterfs-server
>>> --- glusterfs-ganesha
>>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>>> - systemctl enable and start glusterd
>>> - gluster peer probe <other nodes>
>>> - gluster volume set all cluster.enable-shared-storage enable
>>> - systemctl enable and start pcsd.service
>>> - systemctl enable pacemaker.service (cannot be started at this moment)
>>> - Set a password for the hacluster user on all nodes
>>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
>>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert the configuration
>>> - Try to list files on the other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
>>> - gluster nfs-ganesha enable
>>> - *** systemctl enable pacemaker.service (again, since pacemaker was disabled at this point)
>>> - *** Check the owner of "state", "statd", "sm" and "sm.bak" in /var/lib/nfs/ (I had to: chown rpcuser:rpcuser /var/lib/nfs/statd/state)
>>> - Check on the other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
>>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
>>> - gluster volume start mynewshare
>>> - gluster vol set mynewshare ganesha.enable on
>>>
>>> At this moment, this is the status of the important (I think) services:
>>>
>>> -- corosync.service            disabled
>>> -- corosync-notifyd.service    disabled
>>> -- glusterd.service            enabled
>>> -- glusterfsd.service          disabled
>>> -- pacemaker.service           enabled
>>> -- pcsd.service                enabled
>>> -- nfs-ganesha.service         disabled
>>> -- nfs-ganesha-config.service  static
>>> -- nfs-ganesha-lock.service    static
>>>
>>> -- corosync.service            active (running)
>>> -- corosync-notifyd.service    inactive (dead)
>>> -- glusterd.service            active (running)
>>> -- glusterfsd.service          inactive (dead)
>>> -- pacemaker.service           active (running)
>>> -- pcsd.service                active (running)
>>> -- nfs-ganesha.service         active (running)
>>> -- nfs-ganesha-config.service  inactive (dead)
>>> -- nfs-ganesha-lock.service    active (running)
>>>
>>> May I ask you a few questions, please?
>>>
>>> 1. Could you please confirm that the services above have the correct status/state?
>>
>> Looks good to the best of my knowledge.
>>
>>> 2. When I restart a node, nfs-ganesha is not running. Of course I cannot
>>> enable it, since it needs to be started only after the shared storage is
>>> mounted. What is the best practice to start it automatically so I don't
>>> have to worry about restarting a node? Should I create a script that
>>> checks whether the shared storage is mounted and then starts
>>> nfs-ganesha? How do you do this in production?
>>
>> That's right. We have plans to address this in the near future (probably by
>> having a new .service which mounts shared_storage before starting
>> nfs-ganesha). But until then, yes, having a custom-defined script to do so
>> is the only way to automate it.

(A sketch of such a script is included right after question 3 below.)

>>> 3. SELinux is an issue, is that a known bug?
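Regarding question 2, a minimal mount-check wrapper might look like the sketch below. The script name and the cron usage are my own assumptions; only the shared-storage path and the service name come from this thread.

#!/bin/bash
# Sketch: start nfs-ganesha only once the Gluster shared storage is mounted.
# Meant to be run periodically (e.g. from cron, every minute) after boot.
SHARED_STORAGE=/var/run/gluster/shared_storage

# Do nothing until the shared storage mount point is actually mounted.
mountpoint -q "$SHARED_STORAGE" || exit 0

# Start nfs-ganesha if it is not already running.
systemctl is-active --quiet nfs-ganesha.service || systemctl start nfs-ganesha.service

A cron entry such as "* * * * * root /usr/local/sbin/start-ganesha.sh" (the path is made up) would keep retrying until the mount is up, which also matches the cron-job suggestion quoted further down in this thread.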
>>>
>>> When I restart a node and start nfs-ganesha.service with SELinux in permissive mode:
>>>
>>> sudo grep 'statd' /var/log/messages
>>> May 12 12:05:46 mynode1 rpc.statd[2415]: Version 1.3.0 starting
>>> May 12 12:05:46 mynode1 rpc.statd[2415]: Flags: TI-RPC
>>> May 12 12:05:46 mynode1 rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
>>> May 12 12:05:46 mynode1 rpc.statd[2415]: Initializing NSM state
>>> May 12 12:05:52 mynode1 rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts
>>>
>>> systemctl status nfs-ganesha-lock.service --full
>>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>>>    Active: active (running) since Fri 2017-05-12 12:05:46 UTC; 1min 43s ago
>>>   Process: 2414 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=0/SUCCESS)
>>>  Main PID: 2415 (rpc.statd)
>>>    CGroup: /system.slice/nfs-ganesha-lock.service
>>>            └─2415 /usr/sbin/rpc.statd --no-notify
>>>
>>> May 12 12:05:46 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
>>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Version 1.3.0 starting
>>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Flags: TI-RPC
>>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Failed to read /var/lib/nfs/statd/state: Success
>>> May 12 12:05:46 mynode1.localdomain rpc.statd[2415]: Initializing NSM state
>>> May 12 12:05:46 mynode1.localdomain systemd[1]: Started NFS status monitor for NFSv2/3 locking..
>>> May 12 12:05:52 mynode1.localdomain rpc.statd[2415]: Received SM_UNMON_ALL request from mynode1.localdomain while not monitoring any hosts
>>>
>>>
>>> When I restart a node and start nfs-ganesha.service with SELinux in enforcing mode:
>>>
>>> sudo grep 'statd' /var/log/messages
>>> May 12 12:14:01 mynode1 rpc.statd[1743]: Version 1.3.0 starting
>>> May 12 12:14:01 mynode1 rpc.statd[1743]: Flags: TI-RPC
>>> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open directory sm: Permission denied
>>> May 12 12:14:01 mynode1 rpc.statd[1743]: Failed to open /var/lib/nfs/statd/state: Permission denied
>>>
>>> systemctl status nfs-ganesha-lock.service --full
>>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>>>    Active: failed (Result: exit-code) since Fri 2017-05-12 12:14:01 UTC; 1min 21s ago
>>>   Process: 1742 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>>>
>>> May 12 12:14:01 mynode1.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
>>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Version 1.3.0 starting
>>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Flags: TI-RPC
>>> May 12 12:14:01 mynode1.localdomain rpc.statd[1743]: Failed to open directory sm: Permission denied
>>> May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
>>> May 12 12:14:01 mynode1.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
>>> May 12 12:14:01 mynode1.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
>>> May 12 12:14:01 mynode1.localdomain systemd[1]: nfs-ganesha-lock.service failed.
>>
>> Can't remember right now. Could you please paste the AVCs you get, and the
>> selinux package versions? Or, preferably, please file a bug. We can get the
>> details verified by the selinux members.
>>
>> Thanks,
>> Soumya
>>
>>> On Fri, May 5, 2017 at 8:10 PM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>>>>
>>>> On 05/05/2017 08:04 PM, Adam Ru wrote:
>>>>>
>>>>> Hi Soumya,
>>>>>
>>>>> Thank you for the answer.
>>>>>
>>>>> Enabling Pacemaker? Yes, you're completely right, I didn't do it. Thank you.
>>>>>
>>>>> I spent some time testing and I have some results. This is what I did:
>>>>>
>>>>> - Clean installation of CentOS 7.3 with all updates, 3x nodes, resolvable IPs and VIPs
>>>>> - Stopped firewalld (just for testing)
>>>>> - Install "centos-release-gluster" to get the "centos-gluster310" repo and install the following (nothing else):
>>>>> --- glusterfs-server
>>>>> --- glusterfs-ganesha
>>>>> - Passwordless SSH between all nodes (/var/lib/glusterd/nfs/secret.pem and secret.pem.pub on all nodes)
>>>>> - systemctl enable and start glusterd
>>>>> - gluster peer probe <other nodes>
>>>>> - gluster volume set all cluster.enable-shared-storage enable
>>>>> - systemctl enable and start pcsd.service
>>>>> - systemctl enable pacemaker.service (cannot be started at this moment)
>>>>> - Set a password for the hacluster user on all nodes
>>>>> - pcs cluster auth <node 1> <node 2> <node 3> -u hacluster -p blabla
>>>>> - mkdir /var/run/gluster/shared_storage/nfs-ganesha/
>>>>> - touch /var/run/gluster/shared_storage/nfs-ganesha/ganesha.conf (not sure if needed)
>>>>> - vi /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf and insert the configuration
>>>>> - Try to list files on the other nodes: ls /var/run/gluster/shared_storage/nfs-ganesha/
>>>>> - gluster nfs-ganesha enable
>>>>> - Check on the other nodes that nfs-ganesha.service is running and "pcs status" shows started resources
>>>>> - gluster volume create mynewshare replica 3 transport tcp node1:/<dir> node2:/<dir> node3:/<dir>
>>>>> - gluster volume start mynewshare
>>>>> - gluster vol set mynewshare ganesha.enable on
>>>>>
>>>>> After these steps, all VIPs are pingable and I can mount node1:/mynewshare
>>>>>
>>>>> Funny thing is that pacemaker.service is disabled again (something
>>>>> disabled it). This is the status of the important (I think) services:
>>>>
>>>> Yeah, we too observed this recently. We guess the pcs cluster setup
>>>> command probably first destroys the existing cluster (if any), which may be
>>>> disabling pacemaker too.
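If pcs cluster setup is indeed what disables pacemaker, then re-enabling the cluster services once "gluster nfs-ganesha enable" has finished might be enough for the cluster itself to survive reboots. A sketch (assuming pcs already manages all three nodes):

# Enable corosync and pacemaker at boot on every node in the cluster
sudo pcs cluster enable --all
# or, equivalently, on each node:
# sudo systemctl enable corosync pacemaker

This is in addition to pcsd, which is already enabled in the steps above.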
>>>>
>>>>> systemctl list-units --all
>>>>> # corosync.service              loaded  active    running
>>>>> # glusterd.service              loaded  active    running
>>>>> # nfs-config.service            loaded  inactive  dead
>>>>> # nfs-ganesha-config.service    loaded  inactive  dead
>>>>> # nfs-ganesha-lock.service      loaded  active    running
>>>>> # nfs-ganesha.service           loaded  active    running
>>>>> # nfs-idmapd.service            loaded  inactive  dead
>>>>> # nfs-mountd.service            loaded  inactive  dead
>>>>> # nfs-server.service            loaded  inactive  dead
>>>>> # nfs-utils.service             loaded  inactive  dead
>>>>> # pacemaker.service             loaded  active    running
>>>>> # pcsd.service                  loaded  active    running
>>>>>
>>>>> systemctl list-unit-files --all
>>>>> # corosync-notifyd.service      disabled
>>>>> # corosync.service              disabled
>>>>> # glusterd.service              enabled
>>>>> # glusterfsd.service            disabled
>>>>> # nfs-blkmap.service            disabled
>>>>> # nfs-config.service            static
>>>>> # nfs-ganesha-config.service    static
>>>>> # nfs-ganesha-lock.service      static
>>>>> # nfs-ganesha.service           disabled
>>>>> # nfs-idmap.service             static
>>>>> # nfs-idmapd.service            static
>>>>> # nfs-lock.service              static
>>>>> # nfs-mountd.service            static
>>>>> # nfs-rquotad.service           disabled
>>>>> # nfs-secure-server.service     static
>>>>> # nfs-secure.service            static
>>>>> # nfs-server.service            disabled
>>>>> # nfs-utils.service             static
>>>>> # nfs.service                   disabled
>>>>> # nfslock.service               static
>>>>> # pacemaker.service             disabled
>>>>> # pcsd.service                  enabled
>>>>>
>>>>> I enabled pacemaker again on all nodes and restarted all nodes one by one.
>>>>>
>>>>> After the reboot all VIPs are gone and I can see that nfs-ganesha.service
>>>>> isn't running. When I start it on at least two nodes, the VIPs are
>>>>> pingable again and I can mount NFS again. But there is still some issue
>>>>> in the setup, because when I check nfs-ganesha-lock.service I get:
>>>>>
>>>>> systemctl -l status nfs-ganesha-lock.service
>>>>> ● nfs-ganesha-lock.service - NFS status monitor for NFSv2/3 locking.
>>>>>    Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha-lock.service; static; vendor preset: disabled)
>>>>>    Active: failed (Result: exit-code) since Fri 2017-05-05 13:43:37 UTC; 31min ago
>>>>>   Process: 6203 ExecStart=/usr/sbin/rpc.statd --no-notify $STATDARGS (code=exited, status=1/FAILURE)
>>>>>
>>>>> May 05 13:43:37 node0.localdomain systemd[1]: Starting NFS status monitor for NFSv2/3 locking....
>>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Version 1.3.0 starting
>>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Flags: TI-RPC
>>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open directory sm: Permission denied
>>>>
>>>> Okay, this issue was fixed and the fix should be present in 3.10 too:
>>>> https://review.gluster.org/#/c/16433/
>>>>
>>>> Please check '/var/log/messages' for statd-related errors and cross-check
>>>> the permissions of that directory. For now you could manually chown the
>>>> owner:group of the /var/lib/nfs/statd/sm directory and then restart the
>>>> nfs-ganesha* services.
>>>>
>>>> Thanks,
>>>> Soumya
>>>>
>>>>> May 05 13:43:37 node0.localdomain rpc.statd[6205]: Failed to open /var/lib/nfs/statd/state: Permission denied
>>>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service: control process exited, code=exited status=1
>>>>> May 05 13:43:37 node0.localdomain systemd[1]: Failed to start NFS status monitor for NFSv2/3 locking..
>>>>> May 05 13:43:37 node0.localdomain systemd[1]: Unit nfs-ganesha-lock.service entered failed state.
>>>>> May 05 13:43:37 node0.localdomain systemd[1]: nfs-ganesha-lock.service failed.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Adam
>>>>>
>>>>> On Wed, May 3, 2017 at 10:32 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Same here: when I reboot the node I have to manually execute "pcs
>>>>> cluster start gluster01", and pcsd is already enabled and started.
>>>>>
>>>>> Gluster 3.8.11
>>>>> CentOS 7.3 latest
>>>>> Installed using the CentOS Storage SIG repository
>>>>>
>>>>> --
>>>>> Respectfully,
>>>>> Mahdi A. Mahdi
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> From: gluster-users-bounces@xxxxxxxxxxx on behalf of Adam Ru <ad.ruckel@xxxxxxxxx>
>>>>> Sent: Wednesday, May 3, 2017 12:09:58 PM
>>>>> To: Soumya Koduri
>>>>> Cc: gluster-users@xxxxxxxxxxx
>>>>> Subject: Re: Gluster and NFS-Ganesha - cluster is down after reboot
>>>>>
>>>>> Hi Soumya,
>>>>>
>>>>> thank you very much for your reply.
>>>>>
>>>>> I enabled pcsd during setup, and after the reboot, during troubleshooting,
>>>>> I manually started it and checked the resources (pcs status). They were
>>>>> not running. I didn't find what was wrong, but I'm going to try it again.
>>>>>
>>>>> I've thoroughly checked
>>>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>>> and I can confirm that I followed all steps with one exception. I
>>>>> installed the following RPMs:
>>>>> glusterfs-server
>>>>> glusterfs-fuse
>>>>> glusterfs-cli
>>>>> glusterfs-ganesha
>>>>> nfs-ganesha-xfs
>>>>>
>>>>> and the guide referenced above specifies:
>>>>> glusterfs-server
>>>>> glusterfs-api
>>>>> glusterfs-ganesha
>>>>>
>>>>> glusterfs-api is a dependency of one of the RPMs that I installed, so
>>>>> this is not a problem. But I cannot find any mention of installing
>>>>> nfs-ganesha-xfs.
>>>>>
>>>>> I'll try to set up the whole environment again without installing
>>>>> nfs-ganesha-xfs (I assume glusterfs-ganesha has all the required binaries).
>>>>>
>>>>> Again, thank you for your time answering my previous message.
>>>>>
>>>>> Kind regards,
>>>>> Adam
>>>>>
>>>>> On Tue, May 2, 2017 at 8:49 AM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 05/02/2017 01:34 AM, Rudolf wrote:
>>>>>
>>>>> Hi Gluster users,
>>>>>
>>>>> First, I'd like to thank you all for this amazing open-source! Thank you!
>>>>>
>>>>> I'm working on a home project – three servers with Gluster and
>>>>> NFS-Ganesha. My goal is to create an HA NFS share with three copies of
>>>>> each file on each server.
>>>>>
>>>>> My systems are CentOS 7.3 Minimal install with the latest updates and
>>>>> the most current RPMs from the "centos-gluster310" repository.
>>>>>
>>>>> I followed this tutorial:
>>>>> http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
>>>>> (the second half, which describes the multi-node HA setup)
>>>>>
>>>>> with a few exceptions:
>>>>>
>>>>> 1. All RPMs are from the "centos-gluster310" repo that is installed by
>>>>> "yum -y install centos-release-gluster".
>>>>> 2. I have three nodes (not four) with a "replica 3" volume.
>>>>> 3. I created an empty ganesha.conf and a non-empty ganesha-ha.conf in
>>>>> "/var/run/gluster/shared_storage/nfs-ganesha/" (the referenced blog post
>>>>> is outdated; this is now a requirement).
>>>>> 4. ganesha-ha.conf doesn't have "HA_VOL_SERVER", since this isn't needed
>>>>> anymore.
>>>>>
>>>>> Please refer to
>>>>> http://gluster.readthedocs.io/en/latest/Administrator%20Guide/NFS-Ganesha%20GlusterFS%20Integration/
>>>>> It is being updated with the latest changes to the setup.
>>>>>
>>>>> When I finish the configuration, all is good. nfs-ganesha.service is
>>>>> active and running, and from a client I can ping all three VIPs and I
>>>>> can mount NFS. Copied files are replicated to all nodes.
>>>>>
>>>>> But when I restart the nodes (one by one, with a 5 min. delay between
>>>>> them), then I cannot ping or mount (I assume that all VIPs are down). So
>>>>> my setup definitely isn't HA.
>>>>>
>>>>> I found that:
>>>>> # pcs status
>>>>> Error: cluster is not currently running on this node
>>>>>
>>>>> This means the pcsd service is not up. Did you enable the pcsd service
>>>>> (systemctl enable pcsd) so that it comes up automatically post reboot?
>>>>> If not, please start it manually.
>>>>>
>>>>> and nfs-ganesha.service is in inactive state. Btw. I didn't enable
>>>>> "systemctl enable nfs-ganesha" since I assume that this is something
>>>>> that Gluster does.
>>>>>
>>>>> Please check /var/log/ganesha.log for any errors/warnings.
>>>>>
>>>>> We recommend not enabling nfs-ganesha.service (by default), as the
>>>>> shared storage (where the ganesha.conf file resides now) should be up
>>>>> and running before nfs-ganesha gets started. So if it were enabled by
>>>>> default, it could happen that the shared_storage mount point is not yet
>>>>> up, which would result in nfs-ganesha service failure. If you would like
>>>>> to address this, you could have a cron job which keeps checking the
>>>>> mount point health and then starts the nfs-ganesha service.
>>>>>
>>>>> Thanks,
>>>>> Soumya
>>>>>
>>>>> I assume that my issue is that I followed instructions in a blog post
>>>>> from 2015/10 that are outdated. Unfortunately I cannot find anything
>>>>> better – I spent a whole day googling.
>>>>>
>>>>> Would you be so kind as to check the instructions in the blog post and
>>>>> let me know which steps are wrong/outdated? Or do you have more current
>>>>> instructions for a Gluster+Ganesha setup?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Kind regards,
>>>>> Adam
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users@xxxxxxxxxxx
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>> Adam
>>>>>
>>>>> --
>>>>> Adam
>
> --
> Adam

--
Adam
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users