Re: Corosync/Pacemaker on NetBSD

Stephan,
The patch "Move qb_loop creation after daemonization" should fix starting
in daemon mode. The other questions are no longer really corosync-related,
so it is probably a good idea to try the LibQB and/or Pacemaker mailing lists.
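
The kqueue(2) behavior described further down in this thread matters here: a kqueue descriptor is not inherited by a child created with fork(). A daemon therefore has to create its event queue after it has forked into the background. The C sketch below shows only that ordering and is not the actual corosync patch. `create_event_queue()` and `fork_then_create_queue()` are hypothetical helpers. On Linux, epoll is swapped in only so that the sketch also builds there; unlike kqueues, epoll descriptors do survive fork().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__) || \
    defined(__DragonFly__) || defined(__APPLE__)
#include <sys/event.h>
/* BSD: kqueue descriptors are NOT inherited across fork(). */
static int create_event_queue(void) { return kqueue(); }
#elif defined(__linux__)
#include <sys/epoll.h>
/* Linux stand-in so the sketch compiles; epoll fds ARE inherited. */
static int create_event_queue(void) { return epoll_create1(0); }
#else
#error "this sketch needs kqueue or epoll"
#endif

/*
 * Hypothetical helper: fork first, then create the event queue in the
 * child (the process that keeps running as the daemon).
 * Returns the child's exit status: 0 if the queue was created there,
 * 1 if creation failed, or -1 if fork()/waitpid() failed.
 */
static int fork_then_create_queue(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the queue is created after fork(), so it belongs here. */
        int q = create_event_queue();
        if (q < 0)
            _exit(1);
        close(q);
        _exit(0);
    }

    /* Parent: a real daemon would just exit; the sketch waits instead. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_then_create_queue() should return 0. On the BSDs, moving the create_event_queue() call above fork() would leave the child without a usable queue. That matches the missing KQUEUE descriptor Stephan saw in daemon mode.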

Actually, QB_IPC_SHM is NOT supported by LibQB on NetBSD (only QB_IPC_SOCKET
is). It is also a good idea to use QB_IPC_NATIVE (CC'ing Andrew).
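
A small model helps explain why an explicit shared-memory request fails while a "native" request would not. The sketch below is hypothetical: the enum `ipc_type` and the function `resolve_ipc_type()` are illustrative stand-ins, not libqb's API. It assumes that an explicit shared-memory request is rejected with a negative errno, and that the native type falls back to sockets when shared memory is unavailable. The `-86` in the logs below is NetBSD's ENOTSUP, which fits that assumption.

```c
#include <errno.h>

/* Hypothetical stand-ins for the transport types discussed above. */
enum ipc_type {
    IPC_NATIVE,  /* "let the library pick what the platform supports" */
    IPC_SHM,     /* shared-memory ring buffers */
    IPC_SOCKET   /* AF_UNIX sockets */
};

/*
 * Resolve a requested transport to a concrete one.
 * Returns the concrete ipc_type (>= 0) on success, or a negative errno:
 * -ENOTSUP when shared memory is explicitly requested but unavailable.
 */
static int resolve_ipc_type(enum ipc_type requested, int shm_supported)
{
    switch (requested) {
    case IPC_NATIVE:
        /* Native: prefer shared memory, otherwise fall back to sockets. */
        return shm_supported ? IPC_SHM : IPC_SOCKET;
    case IPC_SHM:
        /* Explicit request: no silent fallback, report the error. */
        return shm_supported ? IPC_SHM : -ENOTSUP;
    case IPC_SOCKET:
        return IPC_SOCKET;
    }
    return -EINVAL;
}
```

Under this model, a daemon that asks for IPC_NATIVE still starts on a platform without shared memory by falling back to sockets. A daemon that hard-codes shared memory fails the way lrmd, pengine and cib_shm do in the logs below.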

But the situation is actually a little harder, because QB_IPC_SOCKET
deadlocks (often and very reproducibly, not only on NetBSD but also on
Linux). I've created a ticket on GitHub.

Regards,
  Honza

Stephan wrote:
> lrmd fails here:
> 
> mainloop_add_ipc_server(CRM_SYSTEM_LRMD, QB_IPC_SHM, &lrmd_ipc_callbacks);
> 
> 
> It calls the following function from /lib/common/mainloop.c:
> -------8<--------
> qb_ipcs_service_t *mainloop_add_ipc_server(
>     const char *name, enum qb_ipc_type type,
>     struct qb_ipcs_service_handlers *callbacks)
> {
>     int rc = 0;
>     qb_ipcs_service_t* server = NULL;
> 
>     if(gio_map == NULL) {
>         gio_map = qb_array_create_2(64, sizeof(struct gio_to_qb_poll), 1);
>     }
> 
>     server = qb_ipcs_create(name, 0, pick_ipc_type(type), callbacks);
>     qb_ipcs_poll_handlers_set(server, &gio_poll_funcs);
> 
>     rc = qb_ipcs_run(server);
>     if (rc < 0) {
>         /* qb_ipcs_run() returns a negative errno value on failure */
>         crm_err("Could not start %s IPC server: %s (%d)", name,
>                 strerror(-rc), rc);
>         return NULL;
>     }
> 
>     return server;
> }
> 
> --------------------------
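
libqb calls such as qb_ipcs_run() report failure as a negative errno value, so a readable message has to come from strerror(-rc). Passing rc itself to strerror() produces text like "Unknown error: 4294967210 (-86)" in the logs below. As those logs show, NetBSD prints the unrecognized code as an unsigned number, and 4294967210 is 2^32 - 86. The helpers below are a hypothetical sketch of the correct decoding; `ipc_rc_to_string()` and `unknown_error_as_printed()` are illustrative names, not Pacemaker or libqb functions.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: turn a libqb-style return code (0 or -errno)
 * into a readable message. Non-negative codes mean success.
 */
static const char *ipc_rc_to_string(int rc)
{
    if (rc >= 0)
        return "Success";
    return strerror(-rc);   /* negate: strerror() expects a positive errno */
}

/* The number an "Unknown error: %u" message shows for a negative code. */
static uint32_t unknown_error_as_printed(int rc)
{
    return (uint32_t)rc;    /* modulo-2^32 conversion: -86 -> 4294967210 */
}
```

With this decoding, the -13 and -48 failures in the logs would read as the plain "Permission denied" and "Address already in use" messages instead of "Unknown error".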
> 
> I think libqb is supposed to create a shared memory region here. Is this
> known to work on BSD systems?
> 
> 
> 2012/12/11 Stephan <stephanwib@xxxxxxxxxxxxxx>:
>> Yes, kqueues are not inherited. I recompiled and installed Pacemaker
>> 1.1 for corosync 2.x. It doesn't work yet (I only started pacemakerd;
>> I hope this is okay). It seems that lrmd is hitting the first issue:
>>
>> lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd
>> IPC server: Unknown error: 4294967210 (-86)
>>
>>
>> All messages:
>>
>> -----8<----------
>> Dec 11 14:22:31 ctx4980gate2 pacemakerd[13003]:     info:
>> crm_update_callsites: Enabling callsites based on priority=6,
>> files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 20.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> crm_add_logfile: Additional logging available in
>> /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: main:
>> Starting Pacemaker 1.1.8 (Build: 1f8858c):  ncurses libqb-logging
>> libqb-ipc lha-fencing  corosync-native
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 18.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> update_node_processes: 0x7f7ff7b09150 Node 3232235777 now known as
>> ctx4980gate2, was:
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: crm_add_logfile:
>> Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:     info:
>> crm_update_callsites: Enabling callsites based on priority=6,
>> files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:   notice:
>> crm_add_logfile: Additional logging available in
>> /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:     info:
>> crm_update_callsites: Enabling callsites based on priority=6,
>> files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:   notice:
>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: main: Using new
>> config location: /var/lib/pacemaker/cib
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: retrieveCib:
>> Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: readCibXmlFile:
>> Primary configuration corrupt or unusable, trying backup...
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: readCibXmlFile:
>> Continuing with an empty configuration.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:   notice: crm_add_logfile:
>> Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:     info:
>> crm_update_callsites: Enabling callsites based on priority=6,
>> files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:   notice: crm_add_logfile:
>> Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:   notice:
>> crm_add_logfile: Additional logging available in
>> /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:   notice:
>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:    error:
>> mainloop_add_ipc_server: Could not start pengine IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:    error: main: Couldn't
>> start IPC server
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: main: Failed to
>> allocate lrmd server.  shutting down
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>> pcmk_child_exit: Child process lrmd exited (pid=15312, rc=255)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error:
>> qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/attrd): Address
>> already in use (48)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> pcmk_child_exit: Respawning failed child process: lrmd
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error:
>> mainloop_add_ipc_server: Could not start attrd IPC server: Unknown
>> error: 4294967248 (-48)
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: main: Could not
>> start IPC server
>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: main: Aborting startup
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 26.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>> pcmk_child_exit: Child process pengine exited (pid=17349, rc=1)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> pcmk_child_exit: Respawning failed child process: pengine
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>> pcmk_child_exit: Child process attrd exited (pid=9542, rc=100)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:  warning:
>> pcmk_child_exit: Pacemaker child process attrd no longer wishes to be
>> respawned. Shutting ourselves down.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> pcmk_shutdown_worker: Shuting down Pacemaker
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>> Stopping crmd: Sent -15 to process 13681
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> pcmk_child_exit: Child process crmd terminated with signal 15
>> (pid=13681, core=0)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>> Stopping pengine: Sent -15 to process 10446
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice:
>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error:
>> mainloop_add_ipc_server: Could not start cib_ro IPC server: Unknown
>> error: 4294967283 (-13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: qb_ipcs_us_publish:
>> Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error:
>> mainloop_add_ipc_server: Could not start cib_rw IPC server: Unknown
>> error: 4294967283 (-13)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error:
>> mainloop_add_ipc_server: Could not start cib_shm IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: cib_init: Couldnt
>> start all IPC channels, exiting.
>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 26.
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>> pcmk_child_exit: Child process cib exited (pid=13836, rc=255)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:   notice: crm_add_logfile:
>> Additional logging available in /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:     info:
>> crm_update_callsites: Enabling callsites based on priority=6,
>> files=(null), functions=(null), formats=(null), tags=(null)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:   notice:
>> crm_add_logfile: Additional logging available in
>> /var/log/cluster/corosync.log
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:    error:
>> mainloop_add_ipc_server: Could not start pengine IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:    error: main: Couldn't
>> start IPC server
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>> error: 4294967210 (-86)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>> pcmk_child_exit: Child process pengine exited (pid=10446, rc=1)
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>> New IPC server could not be created because another lrmd process
>> exists, sending shutdown command to old lrmd process.
>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: main: Failed to
>> allocate lrmd server.  shutting down
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>> Stopping lrmd: Sent -15 to process 22677
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>> pcmk_child_exit: Child process lrmd terminated with signal 15
>> (pid=22677, core=0)
>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>> Stopping stonith-ng: Sent -15 to process 7834
>> -----------------------------
>>
>> 2012/12/11 Jan Friesse <jfriesse@xxxxxxxxxx>:
>>> Actually, the main problem is that the kqueue is created BEFORE fork(),
>>> and according to the man page, a kqueue is NOT inherited by the child
>>> process. The patch seems to be pretty easy and I will send it.
>>>
>>> Honza
>>>
>>> Stephan wrote:
>>>> Right, it works for me too when starting in foreground mode. I don't
>>>> know if you have an idea what could cause this, but when running in
>>>> daemon mode, it apparently closes its file descriptor to the kevent
>>>> queue somewhere. That does not happen when running in foreground
>>>> mode:
>>>>
>>>> corosync 2371 root    3u  KQUEUE 0xfffffe84b41d7980
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Stephan
>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> discuss@xxxxxxxxxxxx
>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>
>>>
> 