David:
"Yes, all the timeouts are derived from the io_timeout and are dictated by the recovery requirements and the algorithm the host_id leases are based on: "Light-Weight Leases for Storage-Centric Coordination" by Gregory Chockler and Dahlia Malkhi. Here are the actual equations copied from sanlock_internal.h.

"delta" refers to host_id leases that take a long time to acquire at startup
"free" corresponds to starting up after a clean shutdown
"held" corresponds to starting up after an unclean shutdown

You should find that with 30 sec io timeout these come out to 1 min / 4 min which you see when starting after a clean / unclean shutdown."
Since I configured an io_timeout of 30s in sanlock (SANLOCKOPTS="-R 1 -o 30"), the delay at sanlock startup is defined by the delta_acquire_held_min variable, which is calculated as:

int max = host_dead_seconds;
if (delta_large_delay > max)
        max = delta_large_delay;
int delta_acquire_held_min = max;
So max is host_dead_seconds, which is calculated as:

int host_dead_seconds = id_renewal_fail_seconds + WATCHDOG_FIRE_TIMEOUT;

And id_renewal_fail_seconds is:

int id_renewal_fail_seconds = 8 * io_timeout_seconds;

WATCHDOG_FIRE_TIMEOUT = 60

So that makes 8 * 30 + 60, or a total of 300s before the lock can be acquired. And that's exactly the time that is shown in libvirtd.log before the lockspace is registered.
When a proper reboot is done, or when sanlock/libvirtd is just restarted, delta_acquire_free_min defines the delay:

int delta_short_delay = 2 * io_timeout_seconds;
int delta_acquire_free_min = delta_short_delay;

Which is confirmed by the 60s delay in the libvirtd.log file:
10:33:54.097: 7983: debug : virLockManagerSanlockInit:267 : version=1000000 configFile=/etc/libvirt/qemu-sanlock.conf flags=0
10:34:55.111: 7983: debug : virLockManagerSanlockSetupLockspace:247 : Lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__ has been registered
So the whole delay is caused by the io_timeout, which is set to 30s because the lockspace is on GFS2, and the GFS2 volume can be locked for some time while a node gets fenced and its journal is replayed. Depending on clean/unclean restarts, the delay differs: delta_acquire_held_min is based on host_dead_seconds because sanlock wants to be sure that the lock won't be reacquired before the host is actually dead.
David told me that sanlock is supposed to use a block device. I wonder if this makes sense when used on top of GFS2, which already has its own locking and safety mechanisms... It looks to me like it just adds the delay without any reason in this case. Perhaps delta_acquire_held_min should be the same as delta_acquire_free_min, because we don't need this safety delay on top of GFS2. For NFS, it may be a whole other story...
Best regards,
Frido