Stephan,

The patch "Move qb_loop creation after daemonization" should fix startup in daemon mode.

The other questions are no longer really corosync related, so it is probably a good idea to try the LibQB and/or Pacemaker mailing lists.

Actually, QB_IPC_SHM is NOT supported by LibQB on NetBSD (only the socket transport is). It would also be a good idea to use QB_IPC_NATIVE (CC'ing Andrew). The situation is a little harder than that, though, because QB_IPC_SOCKET deadlocks (often and very reproducibly, not only on NetBSD but also on Linux). I've created a ticket on GitHub.

Regards,
  Honza

Stephan wrote:
> lrmd fails here:
>
> mainloop_add_ipc_server(CRM_SYSTEM_LRMD, QB_IPC_SHM, &lrmd_ipc_callbacks);
>
>
> Calling the following function from /lib/common/mainloop.c
> -------8<--------
> qb_ipcs_service_t *mainloop_add_ipc_server(
>     const char *name, enum qb_ipc_type type,
>     struct qb_ipcs_service_handlers *callbacks)
> {
>     int rc = 0;
>     qb_ipcs_service_t* server = NULL;
>
>     if(gio_map == NULL) {
>         gio_map = qb_array_create_2(64, sizeof(struct gio_to_qb_poll), 1);
>     }
>
>     server = qb_ipcs_create(name, 0, pick_ipc_type(type), callbacks);
>     qb_ipcs_poll_handlers_set(server, &gio_poll_funcs);
>
>     rc = qb_ipcs_run(server);
>     if (rc < 0) {
>         crm_err("Could not start %s IPC server: %s (%d)", name, strerror(rc), rc);
>         return NULL;
>     }
>
>     return server;
> }
>
> --------------------------
>
> I think a shared memory region should be created using libqb. Is this
> known to work on BSD systems?
>
>
> 2012/12/11 Stephan <stephanwib@xxxxxxxxxxxxxx>:
>> Yes, kqueues are not inherited. I recompiled and installed pacemaker
>> 1.1 for corosync 2.x. It doesn't work yet (I just started pacemakerd..
>> I hope this is okay) ...
>> it seems that lrmd is facing the first issue:
>>
>> lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>
>>
>> All messages:
>>
>> -----8<----------
>> Dec 11 14:22:31 ctx4980gate2 pacemakerd[13003]: info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 20.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: main: Starting Pacemaker 1.1.8 (Build: 1f8858c): ncurses libqb-logging libqb-ipc lha-fencing corosync-native
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 18.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: update_node_processes: 0x7f7ff7b09150 Node 3232235777 now known as ctx4980gate2, was:
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]: info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: notice: main: Using new config location: /var/lib/pacemaker/cib
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: warning: readCibXmlFile: Continuing with an empty configuration.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]: error: mainloop_add_ipc_server: Could not start pengine IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]: error: main: Couldn't start IPC server
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: main: Failed to allocate lrmd server.
>> shutting down
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error: pcmk_child_exit: Child process lrmd exited (pid=15312, rc=255)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error: qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/attrd): Address already in use (48)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: pcmk_child_exit: Respawning failed child process: lrmd
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error: mainloop_add_ipc_server: Could not start attrd IPC server: Unknown error: 4294967248 (-48)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error: main: Could not start IPC server
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error: main: Aborting startup
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 26.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error: pcmk_child_exit: Child process pengine exited (pid=17349, rc=1)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: pcmk_child_exit: Respawning failed child process: pengine
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error: pcmk_child_exit: Child process attrd exited (pid=9542, rc=100)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: warning: pcmk_child_exit: Pacemaker child process attrd no longer wishes to be respawned. Shutting ourselves down.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: pcmk_shutdown_worker: Shuting down Pacemaker
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child: Stopping crmd: Sent -15 to process 13681
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: pcmk_child_exit: Child process crmd terminated with signal 15 (pid=13681, core=0)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child: Stopping pengine: Sent -15 to process 10446
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: mainloop_add_ipc_server: Could not start cib_ro IPC server: Unknown error: 4294967283 (-13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: mainloop_add_ipc_server: Could not start cib_rw IPC server: Unknown error: 4294967283 (-13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: mainloop_add_ipc_server: Could not start cib_shm IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: cib_init: Couldnt start all IPC channels, exiting.
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 26.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error: pcmk_child_exit: Child process cib exited (pid=13836, rc=255)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]: error: mainloop_add_ipc_server: Could not start pengine IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]: error: main: Couldn't start IPC server
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error: pcmk_child_exit: Child process pengine exited (pid=10446, rc=1)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: main: Failed to allocate lrmd server. shutting down
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child: Stopping lrmd: Sent -15 to process 22677
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: pcmk_child_exit: Child process lrmd terminated with signal 15 (pid=22677, core=0)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child: Stopping stonith-ng: Sent -15 to process 7834
>> -----------------------------
>>
>> 2012/12/11 Jan Friesse <jfriesse@xxxxxxxxxx>:
>>> Actually, the main problem is that the kqueue is created BEFORE the
>>> fork, and according to the man page, a kqueue is NOT inherited by the
>>> child process. The patch seems to be pretty easy and I will send it.
>>>
>>> Honza
>>>
>>> Stephan wrote:
>>>> Right, it works for me too when starting in foreground mode. I don't
>>>> know if you have an idea what could cause this. But when running in
>>>> daemon mode, it apparently closes its file descriptor to the
>>>> kevent queue somewhere.
>>>> That does not happen when running in foreground mode:
>>>>
>>>> corosync 2371 root 3u KQUEUE 0xfffffe84b41d7980
>>>>
>>>> Regards,
>>>>
>>>> Stephan
>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> discuss@xxxxxxxxxxxx
>>>> http://lists.corosync.org/mailman/listinfo/discuss
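
A note on the "Unknown error: 4294967210 (-86)" lines in the quoted log: the quoted mainloop_add_ipc_server() passes rc straight to strerror(). The log suggests that libqb reports failures as negative errno values (qb_ipcs_us_publish logs 48 and 13, and the caller then prints -48 and -13), while strerror() expects a positive errno, so no message text is found. The following is only a minimal sketch of one possible fix, assuming rc is always 0 or -errno; ipc_strerror() and format_ipc_error() are hypothetical helper names and are not part of Pacemaker or libqb.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a Pacemaker or libqb function.
 * Assumption: rc is 0 on success or -errno on failure, as the
 * "(-86)", "(-48)" and "(-13)" values in the log suggest.
 */
const char *ipc_strerror(int rc)
{
    /* strerror() wants a positive errno value, so flip the sign. */
    return strerror(rc < 0 ? -rc : rc);
}

/*
 * Build the same message text that the quoted crm_err() call produces,
 * but with a readable error string. Returns what snprintf() returns.
 */
int format_ipc_error(char *buf, size_t len, const char *name, int rc)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Could not start %s IPC server: %s (%d)",
                    name, ipc_strerror(rc), rc);
}
```

With such a helper, format_ipc_error(buf, sizeof(buf), "cib_ro", -EACCES) would carry the platform's text for EACCES (the "Permission denied" already printed by qb_ipcs_us_publish) instead of "Unknown error". In the quoted function itself the equivalent change would be to pass the negated rc to strerror().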