Hi Sachin,

Thanks for the report.

From:   Sachin Sant <sachinp@xxxxxxxxxxxxx>
Date:   Mon, 27 Jun 2022 10:28:27 +0530
> With the latest -next I have observed a peculiar issue on IBM Power
> server running -next(5.19.0-rc3-next-20220624) .
>
> Fingerprint authentication systemd service (fprintd) fails to start while
> attempting OS login after kernel boot. There is a visible delay of 18-20
> seconds before being prompted for OS login password.
>
> Kernel 5.19.0-rc3-next-20220624 on an ppc64le
>
> ltcden8-lp6 login: root
> <<=======. delay of 18-20 seconds
> Password:
>
> Following messages(fprintd service) are seen in /var/log/messages:
>
> systemd[1]: Startup finished in 1.842s (kernel) + 1.466s (initrd) + 29.230s (userspace) = 32.540s.

It seems the kernel finishes its job immediately, but userspace takes
much longer, perhaps because it is retrying something.  The
service_start_timeout reported by dbus-daemon seems to be the timeout
period.

> NetworkManager[1100]: <info> [1656304146.6686] manager: startup complete
> dbus-daemon[1027]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.21' (uid=0 pid=1502 comm="/bin/login -p -- ")
> systemd[1]: Starting Fingerprint Authentication Daemon...
> fprintd[2521]: (fprintd:2521): fprintd-WARNING **: 00:29:08.568: Failed to open connection to bus: Could not connect: Connection refused

I think this message comes from here.
https://github.com/freedesktop/libfprint-fprintd/blob/master/src/main.c#L183-L189

I'm not sure what the program does, but I guess it failed to find a peer
socket in the hash table while calling connect() or sendmsg() and got
-ECONNREFUSED from unix_find_bsd() or unix_find_abstract().

> systemd[1]: fprintd.service: Main process exited, code=exited, status=1/FAILURE
> systemd[1]: fprintd.service: Failed with result 'exit-code'.
> systemd[1]: Failed to start Fingerprint Authentication Daemon.
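
For reference, here is a minimal sketch of how connect() on an AF_UNIX
socket ends up with ECONNREFUSED when no peer can be found.  It is not
taken from fprintd or dbus; the function names and the socket names are
hypothetical, and it assumes Linux (abstract addresses are Linux-only).
It might help to check whether a trivial case already behaves
differently with the commit applied.

```python
import errno
import os
import socket
import tempfile


def try_connect(address):
    """Return the errno name raised by connect(), or None on success."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        try:
            s.connect(address)
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))
    return None


def abstract_without_peer():
    # Abstract address: leading NUL byte.  Nothing is bound to this
    # (hypothetical) name, so the lookup by name finds no socket.
    name = b"\0hypothetical-af-unix-test-%d" % os.getpid()
    return try_connect(name)


def bsd_stale_path():
    # Pathname socket: bind, then close, leaving the socket file behind.
    # The path still resolves, but no live socket is attached to it.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "stale.sock")
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(path)
        srv.close()
        return try_connect(path)


def bsd_with_listener():
    # Control case: a listening peer exists, so connect() succeeds.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "live.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
            srv.bind(path)
            srv.listen(1)
            return try_connect(path)


if __name__ == "__main__":
    print("abstract, no peer:", abstract_without_peer())
    print("BSD, stale path:  ", bsd_stale_path())
    print("BSD, listener:    ", bsd_with_listener())
```

If the control case also fails on the affected machine, that would point
at the socket lookup rather than at the service itself.
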
> dbus-daemon[1027]: [system] Failed to activate service 'net.reactivated.Fprint': timed out (service_start_timeout=25000ms)
>
> Mainline (5.19.0-rc3) or older -next does not have this problem.
>
> Git bisect between mainline & -next points to the following patch:
>
> # git bisect bad
> cf2f225e2653734e66e91c09e1cbe004bfd3d4a7 is the first bad commit
> commit cf2f225e2653734e66e91c09e1cbe004bfd3d4a7
>
> Date:   Tue Jun 21 10:19:12 2022 -0700
>
>     af_unix: Put a socket into a per-netns hash table.
>
> I don’t know how the above identified patch is related to the failure,
> but given that I can consistently recreate the issue assume the bisect
> result can be trusted.

Before the commit, all sockets on the host were linked in a single global
hash table; after the commit, they are linked in the hash table of their
own network namespace.  So, I believe there is no change visible to
userspace.

> I have attached dmesg log for reference. Let me know if any additional
> Information is required.

* Could you provide
  * dmesg and /var/log/messages for a successful case? (without the commit)
  * the unit file
  * repro steps

* Is it reproducible after login? (e.g. systemctl restart)
  * If so, please provide
    * the result of strace -t -ff

* Does it happen only on powerpc?  How about x86 or arm64?

* What does the service do?
  * connect() or sendmsg()
  * protocol family
  * abstract or BSD socket

Best regards,
Kuniyuki