On Friday 05 April 2019 at 16:55 +0530, Nithya Balachandran wrote:
> On Fri, 5 Apr 2019 at 12:16, Michael Scherer <mscherer@xxxxxxxxxx> wrote:
>
> > On Thursday 04 April 2019 at 18:24 +0200, Michael Scherer wrote:
> > > On Thursday 04 April 2019 at 19:10 +0300, Yaniv Kaul wrote:
> > > > I'm not convinced this is solved. Just had what I believe is a
> > > > similar failure:
> > > >
> > > > *00:12:02.532* A dependency job for rpc-statd.service failed. See 'journalctl -xe' for details.
> > > > *00:12:02.532* mount.nfs: rpc.statd is not running but is required for remote locking.
> > > > *00:12:02.532* mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> > > > *00:12:02.532* mount.nfs: an incorrect mount option was specified
> > > >
> > > > (of course, it can always be my patch!)
> > > >
> > > > https://build.gluster.org/job/centos7-regression/5384/console
> > >
> > > same issue, different builder (206). I will check them all, as the
> > > issue is more widespread than I expected (or it did pop up since
> > > last time I checked).
> >
> > Deepshika did notice that the issue came back on one server
> > (builder202) after a reboot, so the rpcbind issue is not related to
> > the network initscript one, so the RCA continues.
> >
> > We are looking for another workaround involving fiddling with the
> > socket (until we find why it does use ipv6 at boot, but not after,
> > when ipv6 is disabled).
>
> Could this be relevant?
> https://access.redhat.com/solutions/2798411

Good catch.

So, we already do that; Nigel took care of it (after 2 days of
research). But I didn't know the exact symptoms, and decided to double
check just in case.

And... there is no sysctl.conf in the initrd. Running dracut -v -f does
not change anything.
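For anyone who wants to repeat the check for sysctl.conf in the initrd, here is a minimal sketch. The function name is hypothetical, and it assumes the initrd content is listed one entry per line with the path as the last field, which is how lsinitrd prints it as far as I know. It only looks at the listing text, so it says nothing about what the file contains.

```sh
#!/bin/sh
# Sketch only: initrd_listing_has_sysctl is a hypothetical helper name.
# Assumption: stdin is a file listing of the initrd, one entry per line,
# with the path at the end of the line (or followed by " -> target"
# for symlinks).

# Exit 0 if the listing contains etc/sysctl.conf or at least one file
# under etc/sysctl.d/, exit 1 otherwise.
initrd_listing_has_sysctl() {
    grep -Eq '(^|[[:space:]])etc/sysctl\.(conf([[:space:]]|$)|d/.)'
}
```

On a builder, this could be used as "lsinitrd | initrd_listing_has_sysctl && echo present", since lsinitrd without arguments should inspect the image of the running kernel. Treat that invocation as an untested suggestion.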
Running "dracut -v -f -H" does add the missing sysctl.conf (and this
fixes the problem), but:
- our ansible script already runs that
- -H is hostonly, which is already the default on EL7 according to the
  doc.

However, if dracut-config-generic is installed, dracut doesn't build a
hostonly initrd, and so does not include the sysctl.conf file (which
breaks rpcbind, which breaks the test suite).

And for some reason, it is installed in the ec2 image (likely by
default), but not by default on the builders.

So what happens is that after a kernel upgrade, dracut rebuilds a
generic initrd instead of a hostonly one, which breaks things. And the
kernel was likely upgraded recently (upgrades happen nightly, for some
value of "night"), so we didn't see that earlier, nor with a fresh
system.

So now, we have several solutions:
- be explicit about using hostonly in dracut, so this doesn't happen
  again (or not for this reason)
- disable ipv6 in rpcbind in a cleaner way (to be tested)
- get the test suite to work with ipv6

In the long term, I also want to monitor the processes, but for that, I
need a VPN between the nagios server and ec2, and that project got
blocked by several issues (like EC2 not supporting ecdsa keys, which we
use for ansible, so we have to go back to RSA for fully automated
deployment; and openvpn requires certificates, so I need a newer python
openssl for doing what I want, and RHEL 7 is too old, etc, etc).

As the weekend approaches for me, I just rebuilt the initrd for the
time being. I guess forcing hostonly is the safest fix for now, but
this will be for Monday.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel