> One last reply to myself. One of the failures my test scripts triggered turned out to actually be due to my NFS RW mount options.
>
> OLD RW NFS mount options:
>     "rw,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3"
>
> NEW options that work better:
>     "rw,noatime,nolock,tcp,vers=3"
>
> I had copied the RO NFS options we use, which try to be aggressive about caching: the RO root image doesn't change much and we want it as fast as possible. Those options are not appropriate for RW areas that do change (even though it's a single image file we care about).
>
> So now my test scripts run clean. But since what we see on the larger systems happens right after reboot, the caching shouldn't matter; in the real problem case, the RW work is done once, right after reboot.
>
> FWIW, I attached my current test scripts; my last batch had some errors.
>
> The search continues for the actual problem, which I'm struggling to reproduce at 366 NFS clients. I believe yesterday's post about actual HANGS describes the real problem we're tracking. I hit that once in my test scripts - only once. Otherwise my script was hitting a "file doesn't really exist even though it's cached" issue, which was tricking my scripts.
>
> In any case, I'm changing the RW NFS options we use regardless.
>
> Erik
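> For reference, the corresponding fstab entries would look roughly like this (server name and paths are placeholders, not our real layout):
>
>     # RO root image: aggressive caching is fine here, it rarely changes
>     srv:/export/ro  /ro  nfs  ro,noatime,nocto,actimeo=3600,lookupcache=all,nolock,tcp,vers=3  0 0
>
>     # RW area: no nocto and no hour-long actimeo, so close-to-open
>     # consistency applies and attribute caches revalidate normally
>     srv:/export/rw  /rw  nfs  rw,noatime,nolock,tcp,vers=3  0 0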
Attachment:
nfs-issues.tar.xz
Description: application/xz
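Roughly, the failure mode that was tricking the scripts looks like this (a sketch only, not the attached scripts; the mount point, file name, and node name are invented for illustration):

    f=/mnt/rw/flagfile
    stat "$f"                  # this client caches the lookup and attributes
    ssh node2 "rm -f $f"       # file removed via another client
    # With lookupcache=all and actimeo=3600, the stale cached entry can
    # keep this succeeding for up to an hour after the file is gone:
    stat "$f" && echo "cache says it exists, but it doesn't"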