Now this is a breakthrough:

==========
Last updated: Wed Dec 12 12:54:33 2012
Last change: Wed Dec 12 12:52:58 2012 via crmd on ctx4980gate2
Stack: corosync
Current DC: ctx4980gate2 (3232235777) - partition WITHOUT quorum
Version: 1.1.8-1f8858c
1 Nodes configured, unknown expected votes
0 Resources configured.
Online: [ ctx4980gate2 ]
============

Pacemaker 1.1 is finally working (at least basically :) on top of
corosync 2.1 on NetBSD. Thank you so much!!

Regards,

Stephan

2012/12/12 Andrew Beekhof <abeekhof@xxxxxxxxxx>:
>
> On 12/12/2012, at 7:53 PM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
>
>> Stephan,
>> patch "Move qb_loop creation after daemonization" should fix start in
>> daemon mode. Other questions are really no longer corosync related, so
>> it is probably a good idea to try the LibQB and/or Pacemaker mailing list.
>>
>> Actually, QB_IPC_SHM is NOT supported by LibQB on NetBSD (only socket
>> is). It's also a good idea to use QB_IPC_NATIVE (CC'ing Andrew).
>
> That's easily changed at runtime with an environment variable.
> Look for mcp/pacemaker.sysconfig in the source tree.
>
>>
>> But actually, the situation is a little harder, because QB_IPC_SOCKET
>> deadlocks (often and very reproducibly, not only on NetBSD, but also
>> on Linux). I've created a ticket on GitHub.
>>
>> Regards,
>> Honza
>>
>> Stephan wrote:
>>> lrmd fails here:
>>>
>>> mainloop_add_ipc_server(CRM_SYSTEM_LRMD, QB_IPC_SHM, &lrmd_ipc_callbacks);
>>>
>>>
>>> Calling the following function from /lib/common/mainloop.c
>>> -------8<--------
>>> qb_ipcs_service_t *mainloop_add_ipc_server(
>>>     const char *name, enum qb_ipc_type type,
>>>     struct qb_ipcs_service_handlers *callbacks)
>>> {
>>>     int rc = 0;
>>>     qb_ipcs_service_t* server = NULL;
>>>
>>>     if(gio_map == NULL) {
>>>         gio_map = qb_array_create_2(64, sizeof(struct gio_to_qb_poll), 1);
>>>     }
>>>
>>>     server = qb_ipcs_create(name, 0, pick_ipc_type(type), callbacks);
>>>     qb_ipcs_poll_handlers_set(server, &gio_poll_funcs);
>>>
>>>     rc = qb_ipcs_run(server);
>>>     if (rc < 0) {
>>>         crm_err("Could not start %s IPC server: %s (%d)", name,
>>>                 strerror(rc), rc);
>>>         return NULL;
>>>     }
>>>
>>>     return server;
>>> }
>>>
>>> --------------------------
>>>
>>> I think a shared memory region should be created using libqb. Is this
>>> known to work on BSD systems?
>>>
>>>
>>> 2012/12/11 Stephan <stephanwib@xxxxxxxxxxxxxx>:
>>>> Yes, kqueues are not inherited. I recompiled and installed pacemaker
>>>> 1.1 for corosync 2.x. It doesn't work yet (I just started pacemakerd..
>>>> I hope this is okay) ... it seems that lrmd is facing the first issue:
>>>>
>>>> lrmd[15312]: error: mainloop_add_ipc_server: Could not start lrmd
>>>> IPC server: Unknown error: 4294967210 (-86)
>>>>
>>>>
>>>> All messages:
>>>>
>>>> -----8<----------
>>>> Dec 11 14:22:31 ctx4980gate2 pacemakerd[13003]: info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 20.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: main:
>>>> Starting Pacemaker 1.1.8 (Build: 1f8858c): ncurses libqb-logging
>>>> libqb-ipc lha-fencing corosync-native
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 18.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> update_node_processes: 0x7f7ff7b09150 Node 3232235777 now known as
>>>> ctx4980gate2, was:
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]: notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]: info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]: notice:
>>>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: notice: main: Using new
>>>> config location: /var/lib/pacemaker/cib
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: warning: retrieveCib:
>>>> Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: warning: readCibXmlFile:
>>>> Primary configuration corrupt or unusable, trying backup...
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: warning: readCibXmlFile:
>>>> Continuing with an empty configuration.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]: notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: notice:
>>>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]: error:
>>>> mainloop_add_ipc_server: Could not start pengine IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]: error: main: Couldn't
>>>> start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]: error: main: Failed to
>>>> allocate lrmd server. shutting down
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error:
>>>> pcmk_child_exit: Child process lrmd exited (pid=15312, rc=255)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error:
>>>> qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/attrd): Address
>>>> already in use (48)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> pcmk_child_exit: Respawning failed child process: lrmd
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error:
>>>> mainloop_add_ipc_server: Could not start attrd IPC server: Unknown
>>>> error: 4294967248 (-48)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error: main: Could not
>>>> start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]: error: main: Aborting startup
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 26.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error:
>>>> pcmk_child_exit: Child process pengine exited (pid=17349, rc=1)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> pcmk_child_exit: Respawning failed child process: pengine
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error:
>>>> pcmk_child_exit: Child process attrd exited (pid=9542, rc=100)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: warning:
>>>> pcmk_child_exit: Pacemaker child process attrd no longer wishes to be
>>>> respawned. Shutting ourselves down.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> pcmk_shutdown_worker: Shuting down Pacemaker
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child:
>>>> Stopping crmd: Sent -15 to process 13681
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> pcmk_child_exit: Child process crmd terminated with signal 15
>>>> (pid=13681, core=0)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child:
>>>> Stopping pengine: Sent -15 to process 10446
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: notice:
>>>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: qb_ipcs_us_publish:
>>>> Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error:
>>>> mainloop_add_ipc_server: Could not start cib_ro IPC server: Unknown
>>>> error: 4294967283 (-13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: qb_ipcs_us_publish:
>>>> Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error:
>>>> mainloop_add_ipc_server: Could not start cib_rw IPC server: Unknown
>>>> error: 4294967283 (-13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error:
>>>> mainloop_add_ipc_server: Could not start cib_shm IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]: error: cib_init: Couldnt
>>>> start all IPC channels, exiting.
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]: [QB ] got EV_EOF on fd 26.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error:
>>>> pcmk_child_exit: Child process cib exited (pid=13836, rc=255)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]: notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]: error:
>>>> mainloop_add_ipc_server: Could not start pengine IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]: error: main: Couldn't
>>>> start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: error:
>>>> pcmk_child_exit: Child process pengine exited (pid=10446, rc=1)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]: error: main: Failed to
>>>> allocate lrmd server. shutting down
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child:
>>>> Stopping lrmd: Sent -15 to process 22677
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice:
>>>> pcmk_child_exit: Child process lrmd terminated with signal 15
>>>> (pid=22677, core=0)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]: notice: stop_child:
>>>> Stopping stonith-ng: Sent -15 to process 7834
>>>> -----------------------------
>>>>
>>>> 2012/12/11 Jan Friesse <jfriesse@xxxxxxxxxx>:
>>>>> Actually, the main problem is that the kqueue is created BEFORE fork,
>>>>> and according to the man page, a kqueue is NOT shared between a
>>>>> process and its child. The patch seems to be pretty easy and I will
>>>>> send it.
>>>>>
>>>>> Honza
>>>>>
>>>>> Stephan wrote:
>>>>>> Right, it works for me too when starting in foreground mode. I don't
>>>>>> know if you have an idea what could cause this. But when running it in
>>>>>> daemon mode, it does apparently close its file descriptor to the
>>>>>> kevent queue somewhere. That does not happen when running in
>>>>>> foreground mode:
>>>>>>
>>>>>> corosync 2371 root 3u KQUEUE 0xfffffe84b41d7980
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> discuss@xxxxxxxxxxxx
>>>>>> http://lists.corosync.org/mailman/listinfo/discuss
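
A note on the "Unknown error: 4294967210 (-86)" lines in the log above: the quoted mainloop_add_ipc_server() passes the negative rc straight to strerror(), and -86 read as an unsigned 32-bit value is 4294967210 (likewise -48 gives 4294967248 and -13 gives 4294967283). The sketch below shows one way the message could be made readable by negating the code first. It assumes the IPC layer reports failures as negative errno values, which is how the rc values in the log read. The helper names ipc_rc_str and format_ipc_error are hypothetical and are not part of Pacemaker or libqb. On NetBSD, errno 86 appears to be ENOTSUP, which would match the remark that QB_IPC_SHM is not supported there, but that is worth checking against the system's errno.h.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Pacemaker or libqb.
 * Assumption: failures arrive as negative errno values (e.g. -EACCES).
 */
static const char *
ipc_rc_str(int rc)
{
    /* strerror() expects a positive errno, so flip the sign first. */
    return strerror(rc < 0 ? -rc : rc);
}

/*
 * Build a message in the same shape as the crm_err() call quoted in the
 * thread, but with a decoded error string. Returns what snprintf() returns.
 */
static int
format_ipc_error(char *buf, size_t len, const char *name, int rc)
{
    return snprintf(buf, len, "Could not start %s IPC server: %s (%d)",
                    name, ipc_rc_str(rc), rc);
}
```

Called as format_ipc_error(buf, sizeof buf, "attrd", -EADDRINUSE), this would put the platform's "address in use" text into the message instead of "Unknown error", while still printing the raw code in parentheses.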