Re: Corosync/Pacemaker on NetBSD

Now this is a breakthrough:

==========
Last updated: Wed Dec 12 12:54:33 2012
Last change: Wed Dec 12 12:52:58 2012 via crmd on ctx4980gate2
Stack: corosync
Current DC: ctx4980gate2 (3232235777) - partition WITHOUT quorum
Version: 1.1.8-1f8858c
1 Nodes configured, unknown expected votes
0 Resources configured.


Online: [ ctx4980gate2 ]

============


Pacemaker 1.1 is finally working (at least basically :) on top of
corosync 2.1 on NetBSD. Thank you so much!!

Regards,

Stephan

2012/12/12 Andrew Beekhof <abeekhof@xxxxxxxxxx>:
>
> On 12/12/2012, at 7:53 PM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
>
>> Stephan,
>> the patch "Move qb_loop creation after daemonization" should fix startup in
>> daemon mode. Your other questions are no longer really corosync-related, so
>> it is probably a good idea to try the LibQB and/or Pacemaker mailing lists.
>>
>> Note that QB_IPC_SHM is NOT supported by LibQB on NetBSD (only sockets
>> are). It would also be a good idea to use QB_IPC_NATIVE (CC'ing Andrew).
>
> That's easily changed at runtime with an environment variable.
> Look for mcp/pacemaker.sysconfig in the source tree.
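Andrew's runtime override can be pictured with a small sketch. This is a hedged, self-contained approximation of what pick_ipc_type() (called in the quoted mainloop_add_ipc_server()) might do, not Pacemaker's actual code: enum ipc_kind is a hypothetical stand-in for libqb's enum qb_ipc_type, and the variable name PCMK_ipc_type with the values shared-mem, socket, posix and sysv is an assumption to check against mcp/pacemaker.sysconfig.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libqb's enum qb_ipc_type, so that this
 * sketch compiles without libqb installed. */
enum ipc_kind { IPC_NATIVE, IPC_SHM, IPC_SOCKET, IPC_POSIX_MQ, IPC_SYSV_MQ };

/* Maps an override string to a transport. Unknown or missing values
 * fall back to the caller's compiled-in request. */
static enum ipc_kind ipc_kind_from_string(const char *value,
                                          enum ipc_kind requested)
{
    if (value == NULL)                    return requested;
    if (strcmp(value, "shared-mem") == 0) return IPC_SHM;
    if (strcmp(value, "socket") == 0)     return IPC_SOCKET;
    if (strcmp(value, "posix") == 0)      return IPC_POSIX_MQ;
    if (strcmp(value, "sysv") == 0)       return IPC_SYSV_MQ;
    return requested;
}

/* ASSUMPTION: the override is read from PCMK_ipc_type; verify the name
 * and accepted values in your source tree. Passing IPC_NATIVE through
 * leaves the final transport choice to libqb. */
static enum ipc_kind pick_ipc_kind(enum ipc_kind requested)
{
    return ipc_kind_from_string(getenv("PCMK_ipc_type"), requested);
}
```

Under these assumptions, exporting PCMK_ipc_type=socket before starting pacemakerd (its children inherit the environment) would turn lrmd's hardcoded shared-memory request into the socket transport, the only one Jan says libqb supports on NetBSD.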
>
>>
>> But the situation is actually a little harder, because QB_IPC_SOCKET
>> deadlocks (often and very reproducibly, not only on NetBSD but also on
>> Linux). I've created a ticket on GitHub.
>>
>> Regards,
>>  Honza
>>
>> Stephan wrote:
>>> lrmd fails here:
>>>
>>> mainloop_add_ipc_server(CRM_SYSTEM_LRMD, QB_IPC_SHM, &lrmd_ipc_callbacks);
>>>
>>>
>>> This calls the following function from /lib/common/mainloop.c:
>>> -------8<--------
>>> qb_ipcs_service_t *mainloop_add_ipc_server(
>>>    const char *name, enum qb_ipc_type type,
>>>    struct qb_ipcs_service_handlers *callbacks)
>>> {
>>>    int rc = 0;
>>>    qb_ipcs_service_t* server = NULL;
>>>
>>>    if(gio_map == NULL) {
>>>        gio_map = qb_array_create_2(64, sizeof(struct gio_to_qb_poll), 1);
>>>    }
>>>
>>>    server = qb_ipcs_create(name, 0, pick_ipc_type(type), callbacks);
>>>    if (server == NULL) {
>>>        crm_err("Could not create %s IPC server", name);
>>>        return NULL;
>>>    }
>>>    qb_ipcs_poll_handlers_set(server, &gio_poll_funcs);
>>>
>>>    rc = qb_ipcs_run(server);
>>>    if (rc < 0) {
>>>        /* qb_ipcs_run() returns -errno; strerror() expects a positive value */
>>>        crm_err("Could not start %s IPC server: %s (%d)", name,
>>>                strerror(-rc), rc);
>>>        return NULL;
>>>    }
>>>
>>>    return server;
>>> }
>>>
>>> --------------------------
>>>
>>> As I understand it, libqb should create a shared memory region here.
>>> Is that known to work on BSD systems?
>>>
>>>
>>> 2012/12/11 Stephan <stephanwib@xxxxxxxxxxxxxx>:
>>>> Yes, kqueues are not inherited. I recompiled and installed Pacemaker
>>>> 1.1 for corosync 2.x. It doesn't work yet (I just started pacemakerd;
>>>> I hope that's okay), and it seems that lrmd is hitting the first issue:
>>>>
>>>> lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>>
>>>>
>>>> All messages:
>>>>
>>>> -----8<----------
>>>> Dec 11 14:22:31 ctx4980gate2 pacemakerd[13003]:     info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 20.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: main: Starting Pacemaker 1.1.8 (Build: 1f8858c):  ncurses libqb-logging libqb-ipc lha-fencing  corosync-native
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 18.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: update_node_processes: 0x7f7ff7b09150 Node 3232235777 now known as ctx4980gate2, was:
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:     info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:     info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: main: Using new config location: /var/lib/pacemaker/cib
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: readCibXmlFile: Continuing with an empty configuration.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:     info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:    error: mainloop_add_ipc_server: Could not start pengine IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:    error: main: Couldn't start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: main: Failed to allocate lrmd server.  shutting down
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error: pcmk_child_exit: Child process lrmd exited (pid=15312, rc=255)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/attrd): Address already in use (48)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: pcmk_child_exit: Respawning failed child process: lrmd
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: mainloop_add_ipc_server: Could not start attrd IPC server: Unknown error: 4294967248 (-48)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: main: Could not start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: main: Aborting startup
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 26.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error: pcmk_child_exit: Child process pengine exited (pid=17349, rc=1)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: pcmk_child_exit: Respawning failed child process: pengine
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error: pcmk_child_exit: Child process attrd exited (pid=9542, rc=100)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:  warning: pcmk_child_exit: Pacemaker child process attrd no longer wishes to be respawned. Shutting ourselves down.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: pcmk_shutdown_worker: Shuting down Pacemaker
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child: Stopping crmd: Sent -15 to process 13681
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: pcmk_child_exit: Child process crmd terminated with signal 15 (pid=13681, core=0)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child: Stopping pengine: Sent -15 to process 10446
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: mainloop_add_ipc_server: Could not start cib_ro IPC server: Unknown error: 4294967283 (-13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: mainloop_add_ipc_server: Could not start cib_rw IPC server: Unknown error: 4294967283 (-13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: mainloop_add_ipc_server: Could not start cib_shm IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: cib_init: Couldnt start all IPC channels, exiting.
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 26.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error: pcmk_child_exit: Child process cib exited (pid=13836, rc=255)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:     info: crm_update_callsites: Enabling callsites based on priority=6, files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:   notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:    error: mainloop_add_ipc_server: Could not start pengine IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:    error: main: Couldn't start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error: pcmk_child_exit: Child process pengine exited (pid=10446, rc=1)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create: New IPC server could not be created because another lrmd process exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: main: Failed to allocate lrmd server.  shutting down
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child: Stopping lrmd: Sent -15 to process 22677
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: pcmk_child_exit: Child process lrmd terminated with signal 15 (pid=22677, core=0)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child: Stopping stonith-ng: Sent -15 to process 7834
>>>> -----------------------------
>>>>
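The odd numbers in these messages come from a sign mix-up. libqb calls such as qb_ipcs_run() return a negative errno, and the log shows that value reaching strerror() unchanged; NetBSD then apparently prints the out-of-range value as an unsigned 32-bit number, so -86 becomes 2^32 - 86 = 4294967210. The sketch below negates the code first; rc_to_str(), rc_as_u32() and format_ipc_error() are hypothetical helpers, not Pacemaker functions. Decoded this way, 48 and 13 are "Address already in use" and "Permission denied", as the neighboring qb_ipcs_us_publish lines confirm, and 86 should be ENOTSUP on NetBSD, which fits Jan's remark that QB_IPC_SHM is unsupported there.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* libqb-style calls return -errno on failure, but strerror() expects a
 * positive errno value, so flip the sign first. */
static const char *rc_to_str(int rc)
{
    return strerror(rc < 0 ? -rc : rc);
}

/* Shows where the large number in the log comes from: converting a
 * negative int to uint32_t wraps it to 2^32 + rc. */
static uint32_t rc_as_u32(int rc)
{
    return (uint32_t)rc;
}

/* Builds the same message text as the crm_err() call in
 * mainloop_add_ipc_server(), with the sign corrected before lookup. */
static void format_ipc_error(char *buf, size_t len, const char *name, int rc)
{
    snprintf(buf, len, "Could not start %s IPC server: %s (%d)",
             name, rc_to_str(rc), rc);
}
```

With this correction the cib failures above would read "Could not start cib_ro IPC server: Permission denied (-13)" instead of an "Unknown error".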
>>>> 2012/12/11 Jan Friesse <jfriesse@xxxxxxxxxx>:
>>>>> Actually, the main problem is that the kqueue is created BEFORE fork(),
>>>>> and according to the man page, a kqueue is NOT inherited by the child
>>>>> process. The patch seems pretty easy and I will send it.
>>>>>
>>>>> Honza
>>>>>
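Jan's diagnosis comes down to ordering: whichever process runs the main loop must create the kqueue itself, after fork(). The following is a minimal sketch of that ordering, not the actual corosync patch; create_event_queue() and run_child_with_queue() are hypothetical names, and the epoll_create1(2) branch exists only so the sketch also builds on Linux (where, unlike a kqueue, an epoll descriptor is inherited across fork()).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/epoll.h>
#else
#include <sys/event.h>   /* kqueue(2) on NetBSD and the other BSDs */
#endif

/* Creates the event queue in the calling process. */
static int create_event_queue(void)
{
#if defined(__linux__)
    return epoll_create1(0);   /* Linux fallback, only for building the sketch */
#else
    return kqueue();
#endif
}

/* Forks first and creates the queue in the child, which is the process
 * that would run the main loop. A kqueue created before fork() would not
 * be available in the child on BSD. Returns the child's exit status
 * (0 on success) or -1 if fork() or waitpid() fails. */
static int run_child_with_queue(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        int qfd = create_event_queue();   /* created AFTER fork() */
        if (qfd < 0) {
            _exit(1);
        }
        /* ... the main loop would register events and run here ... */
        close(qfd);
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real daemon applies the same rule after daemon(3) or a double fork: only the final process should create the queue. That is consistent with Stephan's lsof output, where the KQUEUE descriptor survives only in foreground mode.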
>>>>> Stephan wrote:
>>>>>> Right, it works for me too when starting in foreground mode. I don't
>>>>>> know whether you have an idea what could cause this, but when running
>>>>>> in daemon mode, it apparently closes its file descriptor to the kqueue
>>>>>> somewhere. That does not happen when running in foreground mode:
>>>>>>
>>>>>> corosync 2371 root    3u  KQUEUE 0xfffffe84b41d7980
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> discuss@xxxxxxxxxxxx
>>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>>>
>>>>>
>>>
>>>
>>
>
