On Tue, 2019-12-10 at 12:41 +0800, Ian Kent wrote:
> On Thu, 2019-12-05 at 04:26 -0500, Doug Nazar wrote:
> > On autofs 5.1.6, after an unsuccessful mount attempt (stopped server)
> > using a program map for /net, it'll never recover once the server is
> > started again.
> > 
> > Here's the initial debug log for the failure:
> > 
> > handle_packet: type = 3
> > handle_packet_missing_indirect: token 6631, name wraith, request pid 32245
> > attempting to mount entry /net/wraith
> > lookup_mount: lookup(program): looking up wraith
> > lookup_mount: lookup(program): wraith -> -fstype=nfs,hard,intr,nodev,nosuid,sec=krb5 / wraith:/
> > parse_mount: parse(sun): expanded entry: -fstype=nfs,hard,intr,nodev,nosuid,sec=krb5 / wraith:/
> > parse_mount: parse(sun): gathered options: fstype=nfs,hard,intr,nodev,nosuid,sec=krb5
> > parse_mount: parse(sun): dequote("/") -> /
> > parse_mapent: parse(sun): gathered options: fstype=nfs,hard,intr,nodev,nosuid,sec=krb5
> > parse_mapent: parse(sun): dequote("wraith:/") -> wraith:/
> > update_offset_entry: parse(sun): updated multi-mount offset / -> -fstype=nfs,hard,intr,nodev,nosuid,sec=krb5 wraith:/
> > parse_mapent: parse(sun): gathered options: fstype=nfs,hard,intr,nodev,nosuid,sec=krb5
> > parse_mapent: parse(sun): dequote("wraith:/") -> wraith:/
> > sun_mount: parse(sun): mounting root /net/wraith/, mountpoint wraith, what wraith:/, fstype nfs, options hard,intr,nodev,nosuid,sec=krb5
> > mount(nfs): root=/net/wraith/ name=wraith what=wraith:/, fstype=nfs, options=hard,intr,nodev,nosuid,sec=krb5
> > mount(nfs): nfs options="hard,intr,nodev,nosuid,sec=krb5", nobind=0, nosymlink=0, ro=0
> > get_nfs_info: called with host wraith(192.168.21.90) proto 6 version 0x20
> > get_nfs_info: called with host wraith(192.168.21.90) proto 17 version 0x20
> > get_nfs_info: called with host wraith(fde2:2b6c:2d24:21::5a) proto 6 version 0x20
> > get_nfs_info: called with host wraith(fde2:2b6c:2d24:21::5a) proto 17 version 0x20
> > mount(nfs): no hosts available
> > dev_ioctl_send_fail: token = 6631
> > failed to mount /net/wraith
> > 
> > After a few minutes, another attempt after I've re-started the server
> > on the target:
> > 
> > handle_packet: type = 3
> > handle_packet_missing_indirect: token 6635, name wraith, request pid 32309
> > attempting to mount entry /net/wraith
> > lookup_mount: lookup(program): wraith -> -fstype=nfs,hard,intr,nodev,nosuid,sec=krb5 / wraith:/
> > lookup(program): unexpected lookup for active multi-mount key wraith, returning fail
> > dev_ioctl_send_fail: token = 6635
> > failed to mount /net/wraith
> > 
> > I'm currently running this patch but don't have much confidence in it.
> > I'm unsure of the lifetime rules for me->multi, maybe it should have
> > been cleared after the failed mount?
> 
> I've returned to look at this a few times now but don't have a
> proper answer for you just yet, thought I'd let you know I am
> thinking about it.
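For anyone following along who hasn't used program maps, this is roughly
what lookup(program) is doing for the key above: run the map executable
with the key as its argument and take the map entry it prints on stdout.
The sketch below is only an illustration, using popen() rather than
autofs's real fork/exec plumbing, and "/etc/auto.net" is just an example
map path, not one taken from Doug's report:

/* Illustrative only: run the program map with the key and read the
 * single map entry line it prints.  Not the actual autofs code.
 */
#include <stdio.h>
#include <string.h>

static int program_lookup(const char *map, const char *key,
			  char *entry, size_t len)
{
	char cmd[1024];
	FILE *f;

	snprintf(cmd, sizeof(cmd), "%s %s", map, key);
	f = popen(cmd, "r");
	if (!f)
		return -1;
	if (!fgets(entry, len, f)) {
		pclose(f);
		return -1;
	}
	pclose(f);
	entry[strcspn(entry, "\n")] = '\0';
	return 0;
}

/* program_lookup("/etc/auto.net", "wraith", buf, sizeof(buf)) would
 * yield an entry like the one in the log:
 *   -fstype=nfs,hard,intr,nodev,nosuid,sec=krb5 / wraith:/
 */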
> > diff --git a/modules/lookup_program.c b/modules/lookup_program.c
> > index fcb1af7..b6f854b 100644
> > --- a/modules/lookup_program.c
> > +++ b/modules/lookup_program.c
> > @@ -646,7 +646,7 @@ int lookup_mount(struct autofs_point *ap, const char *name, int name_len, void *
> >  				name_len, ent, ctxt->parse->context);
> >  			goto out_free;
> >  		} else {
> > -			if (me->multi) {
> > +			if (me->multi && me->multi != me) {
> >  				cache_unlock(mc);
> >  				warn(ap->logopt, MODPREFIX
> >  				     "unexpected lookup for active multi-mount"
> 
> Yes, the problem occurs because it's a top level singleton multi-mount,
> otherwise you wouldn't get a lookup taking this code path.

I also need to work out why you don't get caught by the negative map
entry check that's meant to prevent multiple retries for a failing map
entry for a configured time.

> And even the entry delete below it should be ok because it will
> just lookup (aka. run the program map again to get the map entry)
> and then update the multi-mount during the entry parse.
> 
> So while the change above isn't strictly the way this should be
> handled it probably should be ok.
> 
> I haven't worked out how to handle it immediately after the fail
> just yet but the change above probably should be kept as part of
> that as well, not sure yet.
> 
> Ian
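To make the effect of the patched test concrete, here is a simplified
model of the relationship it relies on. struct entry and the function
names below are made up for illustration; this is not the real
struct mapent from the autofs cache code:

/* Simplified model, not the real cache entry type.  "multi" points
 * at the entry that owns the multi-mount; for a top level singleton
 * multi-mount like the "/ wraith:/" entry above, that owner is the
 * entry itself.
 */
struct entry {
	const char *key;
	struct entry *multi;	/* owning multi-mount root, or NULL */
};

/* Original test: reject any entry that is part of a multi-mount. */
static int is_active_multi(const struct entry *me)
{
	return me->multi != NULL;
}

/* Patched test: only reject entries owned by a *different* root, so
 * a singleton root such as the stale /net/wraith entry can be looked
 * up again after the failed mount instead of failing forever.
 */
static int is_active_multi_patched(const struct entry *me)
{
	return me->multi != NULL && me->multi != me;
}

The idea being that only genuine offset entries of some other root are
refused, which matches the observation above that the failure only
happens for a top level singleton multi-mount.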