On Thu, Oct 11, 2018 at 05:37:42PM +0200, Christian Ehrhardt wrote:
> On Thu, Oct 11, 2018 at 4:04 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> > On Tue, Oct 09, 2018 at 04:43:06PM +0200, Christian Ehrhardt wrote:
> > > The default modules config that is processed is
> > > kernel-boot/modules/rdma.conf
> > > which does not contain ib_umad (infiniband.conf and opa.conf
> > > would). But no matter what the default configs are - they could
> > > be modified by an admin, due to that today there are cases the
> > > service would start ibacm with the module not loaded.
> > >
> > > That will trigger the service to immediately fail with:
> > > ibacm[1796]: ibwarn: [1796] umad_init: can't read ABI version from
> > > /sys/class/infiniband_mad/abi_version (No such file or directory): is
> > > ib_umad module loaded?
> > > systemd[1]: ibacm.service: Main process exited, code=exited,
> > > status=255/n/a
> >
> > Why is ibacm.service even starting?
>
> Hi and thanks for your Feedback Jason.
> This question was just right to lead me to the problem.
> It is still handled in Debian packaging with old dh_installinit scripts
> for ibacm which will try to start the service no matter what.
> Obviously that will fail and needs to be fixed in the packaging, I'll
> do some experiments and come back with a fix for that.

Ah, yes, that seems like a problem for sure

> > > if ib_umad is not loaded or not working
> > >
> > > +ConditionPathExists=/sys/class/infiniband_mad/abi_version
> >
> > This is not a good solution, it will not support later hotplug of
> > devices that need ibacm.
>
> I actually think this would still be the right thing to do (in addition
> to fix the Deb packaging discussed above).
> I did not at the condition to the socket, but the service.
> If the service is started either manually or by the socket and this
> path does not exist, then it will fail fatally.
> It seems safer to check than to fail to me.

What happens to the message on the socket at this point? Does the
requestor hang?
Seems sketchy, would be better if ACM started and then NAK'd the message.

> For Hotplug, at the time something uses the socket late in the
> lifecycle it would try to start the service.
> At that time it would evaluate if that is reasonable (path exists) and
> do so or not.

Does systemd try to start after every socket message or does it latch
into some kind of failure?

Would be good to confirm this before relying on it.. And a comment..

Jason