Sure, here is what the setup was:

[root@ovirt1 ~]# systemctl cat var-run-gluster-shared_storage.mount --no-pager
# /run/systemd/generator/var-run-gluster-shared_storage.mount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)

[Mount]
What=gluster1:/gluster_shared_storage
Where=/var/run/gluster/shared_storage
Type=glusterfs
Options=defaults,x-systemd.requires=glusterd.service,x-systemd.automount

[root@ovirt1 ~]# systemctl cat var-run-gluster-shared_storage.automount --no-pager
# /run/systemd/generator/var-run-gluster-shared_storage.automount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
Before=remote-fs.target
After=glusterd.service
Requires=glusterd.service

[Automount]
Where=/var/run/gluster/shared_storage

[root@ovirt1 ~]# systemctl cat glusterd --no-pager
# /etc/systemd/system/glusterd.service
[Unit]
Description=GlusterFS, a clustered file-system server
Requires=rpcbind.service gluster_bricks-engine.mount gluster_bricks-data.mount gluster_bricks-isos.mount
After=network.target rpcbind.service gluster_bricks-engine.mount gluster_bricks-data.mount gluster_bricks-isos.mount
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
Environment="LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/glusterd
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
KillMode=process
SuccessExitStatus=15

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/glusterd.service.d/99-cpu.conf
[Service]
CPUAccounting=yes
Slice=glusterfs.slice

[root@ovirt1 ~]# systemctl cat ctdb --no-pager
# /etc/systemd/system/ctdb.service
[Unit]
Description=CTDB
Documentation=man:ctdbd(1) man:ctdb(7)
After=network-online.target time-sync.target glusterd.service var-run-gluster-shared_storage.automount
Conflicts=var-lib-nfs-rpc_pipefs.mount

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
Type=forking
LimitCORE=infinity
PIDFile=/run/ctdb/ctdbd.pid
ExecStartPre=/bin/bash -c "sleep 2; if [ -f /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us ]; then echo 10000 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us; fi"
ExecStartPre=/bin/bash -c 'if [[ $(find /var/log/log.ctdb -type f -size +20971520c 2>/dev/null) ]]; then truncate -s 0 /var/log/log.ctdb; fi'
ExecStartPre=/bin/bash -c 'if [ -d "/var/run/gluster/shared_storage/lock" ]; then exit 4; fi'
ExecStart=/usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
ExecStop=/usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid stop
KillMode=control-group
Restart=no

[Install]
WantedBy=multi-user.target
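One remark on the ctdb unit above: if you also want an explicit check that the shared storage is really the gluster fuse mount (and not just an empty local directory) before ctdb starts, an extra ExecStartPre along these lines should do it. I have not tested this exact form, so treat it as a sketch and adjust the path to your lock volume:

# Untested sketch: refuse to start ctdb unless the path is a fuse.glusterfs mount.
# The 'ls' only pokes the path first, so the systemd automount gets triggered
# before findmnt looks at it.
ExecStartPre=/bin/bash -c 'ls /var/run/gluster/shared_storage/ > /dev/null 2>&1; [ "$(findmnt -n -o FSTYPE /var/run/gluster/shared_storage)" = "fuse.glusterfs" ] || exit 4'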
[root@ovirt1 ~]# systemctl cat nfs-ganesha --no-pager
# /usr/lib/systemd/system/nfs-ganesha.service
# This file is part of nfs-ganesha.
#
# There can only be one NFS-server active on a system. When NFS-Ganesha is
# started, the kernel NFS-server should have been stopped. This is achieved by
# the 'Conflicts' directive in this unit.
#
# The Network Locking Manager (rpc.statd) is provided by the nfs-utils package.
# NFS-Ganesha comes with its own nfs-ganesha-lock.service to resolve potential
# conflicts in starting multiple rpc.statd processes. See the comments in the
# nfs-ganesha-lock.service for more details.
#
[Unit]
Description=NFS-Ganesha file server
Documentation=http://github.com/nfs-ganesha/nfs-ganesha/wiki
After=rpcbind.service nfs-ganesha-lock.service
Wants=rpcbind.service nfs-ganesha-lock.service
Conflicts=nfs.target
After=nfs-ganesha-config.service
Wants=nfs-ganesha-config.service

[Service]
Type=forking
Environment="NOFILE=1048576"
EnvironmentFile=-/run/sysconfig/ganesha
ExecStart=/bin/bash -c "${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH}"
ExecStartPost=-/bin/bash -c "prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE"
ExecStartPost=-/bin/bash -c "/usr/bin/sleep 2 && /bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.init_fds_limit"
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown

[Install]
WantedBy=multi-user.target
Also=nfs-ganesha-lock.service

I can't guarantee that it will work 100% in your setup, but I remember having only a few hiccups after powering all nodes down and back up.
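For the bind mounts you mentioned: I don't use bind mounts from the shared storage myself, so treat this as an untested sketch, but an fstab entry along these lines should keep a bind mount from firing before the shared storage is really mounted (the 'tftpboot' source and /srv/tftpboot target are just placeholder paths, and this assumes your systemd knows the x-systemd.requires-mounts-for option - mine does):

# Untested sketch - placeholder paths. x-systemd.requires-mounts-for= makes systemd add
# Requires=/After= on whatever mount unit provides the source path, so the bind mount
# waits for the gluster shared storage instead of grabbing an empty local directory.
/var/run/gluster/shared_storage/tftpboot  /srv/tftpboot  none  bind,x-systemd.requires-mounts-for=/var/run/gluster/shared_storage  0 0

In theory, resolving the source path will also trigger the automount, but I have not verified that on all the distros you listed.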
P.S.: I still prefer corosync/pacemaker, but in my setup I cannot have fencing, and in a hyperconverged setup it gets even more complex. If your cluster is Gluster-only, consider Pacemaker for that task.

Best Regards,
Strahil Nikolov

On Nov 4, 2019 15:57, Erik Jacobson <erik.jacobson@xxxxxxx> wrote:
>
> Thank you! I am very interested. I hadn't considered the automounter
> idea.
>
> Also, your fstab has a different dependency approach than mine otherwise
> as well.
>
> If you happen to have the examples handy, I'll give them a shot here.
>
> I'm looking forward to emerging from this dark place of dependencies not
> working!!
>
> Thank you so much for writing back,
>
> Erik
>
> On Mon, Nov 04, 2019 at 06:59:10AM +0200, Strahil wrote:
> > Hi Erik,
> >
> > I took another approach.
> >
> > 1. I got a systemd mount unit for my ctdb lock volume's brick:
> > [root@ovirt1 system]# grep var /etc/fstab
> > gluster1:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults,x-systemd.requires=glusterd.service,x-systemd.automount 0 0
> >
> > As you can see, it is an automount, because sometimes it fails to mount on time.
> >
> > 2. I got custom systemd services for glusterd, ctdb and vdo, as I need to put dependencies on each of those.
> >
> > Now, I'm no longer using ctdb & NFS Ganesha (as my version of ctdb cannot use hostnames and my environment is a little bit crazy), but I can still provide hints on how I did it.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Nov 3, 2019 22:46, Erik Jacobson <erik.jacobson@xxxxxxx> wrote:
> > >
> > > So, I have a solution I have written about in the past that is based on
> > > gluster with CTDB for IP failover and a level of redundancy.
> > >
> > > It's been working fine except for a few quirks I need to work out on
> > > giant clusters when I get access.
> > >
> > > I have a 3x9 gluster volume; each server is also an NFS server, using gluster
> > > NFS (ganesha isn't reliable for my workload yet). There are 9 IP
> > > aliases spread across the 9 servers.
> > >
> > > I also have many bind mounts that point to the shared storage as a
> > > source, and the /gluster/lock volume ("ctdb") of course.
> > >
> > > glusterfs 4.1.6 (rhel8 today, but I use rhel7, rhel8, sles12, and
> > > sles15)
> > >
> > > Things work well when everything is up and running. IP failover works
> > > well when one of the servers goes down. My issue is when that server
> > > comes back up. Despite my best efforts with systemd fstab dependencies,
> > > the shared storage areas, including the gluster lock for CTDB, do not
> > > always get mounted before CTDB starts. This causes trouble for CTDB
> > > correctly joining the collective. I also have problems where my
> > > bind mounts can happen before the shared storage is mounted, despite my
> > > attempts at preventing this with dependencies in fstab.
> > >
> > > I decided a better approach would be to use a gluster hook and just
> > > mount everything I need as I need it, and start up ctdb when I know and
> > > can verify that /gluster/lock is really gluster and not a local disk.
> > >
> > > I started down the road of doing this with a start host hook, and after
> > > spending a while at it, I realized my logic error. This will only fire
> > > when the volume is *started*, not when a server that was down re-joins.
> > >
> > > I took a look at the code, glusterd-hooks.c, and found that support
> > > for "brick start" is not in place for a hook script, but it's nearly
> > > there:
> > >
> > > [GD_OP_START_BRICK] = EMPTY,
> > > ...
> > >
> > > and there is no entry in glusterd_hooks_add_op_args() yet.
> > >
> > > Before I make a patch for my own use, I wanted to do a sanity check and
> > > find out if others have solved this better than the road I'm heading
> > > down.
> > >
> > > What I was thinking of doing is enabling a brick start hook and
> > > doing my processing for volumes being mounted from there. However, I
> > > suppose brick start is a bad choice for the case of simply stopping and
> > > starting the volume, because my processing would try to complete before
> > > the gluster volume was fully started. It would probably work for a brick
> > > "coming back and joining" but not for "stop volume/start volume".
> > >
> > > Any suggestions?
> > >
> > > My end goal is:
> > > - mount shared storage every boot
> > > - only attempt to mount when gluster is available (_netdev doesn't seem
> > >   to be enough)
> > > - never start ctdb unless /gluster/lock is shared storage and not a
> > >   local directory
> > > - only do my bind mounts from shared storage into the rest of the
> > >   layout when we are sure the shared storage is mounted (don't
> > >   bind-mount using an empty directory as a source by accident!)
> > >
> > > Thanks so much for reading my question,
> > >
> > > Erik
>
> Erik Jacobson
> Software Engineer
> erik.jacobson@xxxxxxx
> +1 612 851 0550 Office
> Eagan, MN
> hpe.com

________

Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users