Re: Corosync/Pacemaker on NetBSD

Stephan wrote:
> Okay, thank you so far. Just for your information, it seems that many
> components of pacemaker (cib, crmd, ...) also rely on SHM via Libqb,
> so it would probably be least effort to implement it there. Also I

Actually, no. SHM uses process-shared POSIX semaphores (I believe; it
may have been a different feature), so it's impossible to implement as
long as the OS doesn't support that. The correct solution is to use
QB_IPC_NATIVE in all components.

> wonder if this would work on FreeBSD or on Linux only.

As far as I can remember, that feature has been implemented on FreeBSD
since 7.x or 8.x, so SHM worked there (in corosync 1.x), and I believe
LibQB behaves the same (so FreeBSD 9.x should be no problem).

Regards,
  Honza

> 
> 
> Regards,
> 
> Stephan
> 
> 2012/12/12 Jan Friesse <jfriesse@xxxxxxxxxx>:
>> Stephan,
>> the patch "Move qb_loop creation after daemonization" should fix startup
>> in daemon mode. The other questions are no longer really corosync
>> related, so it is probably a good idea to try the LibQB and/or
>> Pacemaker mailing lists.
>>
>> Actually, QB_IPC_SHM is NOT supported by LibQB on NetBSD (only socket
>> is). It's also a good idea to use QB_IPC_NATIVE (CC'ing Andrew).
>>
>> The situation is a little harder, though, because QB_IPC_SOCKET
>> deadlocks (often and very reproducibly, not only on NetBSD but also on
>> Linux). I've created a ticket on GitHub.
>>
>> Regards,
>>   Honza
>>
>> Stephan wrote:
>>> lrmd fails here:
>>>
>>> mainloop_add_ipc_server(CRM_SYSTEM_LRMD, QB_IPC_SHM, &lrmd_ipc_callbacks);
>>>
>>>
>>> Calling the following function from /lib/common/mainloop.c
>>> -------8<--------
>>> qb_ipcs_service_t *mainloop_add_ipc_server(
>>>     const char *name, enum qb_ipc_type type,
>>>     struct qb_ipcs_service_handlers *callbacks)
>>> {
>>>     int rc = 0;
>>>     qb_ipcs_service_t* server = NULL;
>>>
>>>     if(gio_map == NULL) {
>>>         gio_map = qb_array_create_2(64, sizeof(struct gio_to_qb_poll), 1);
>>>     }
>>>
>>>     server = qb_ipcs_create(name, 0, pick_ipc_type(type), callbacks);
>>>     qb_ipcs_poll_handlers_set(server, &gio_poll_funcs);
>>>
>>>     rc = qb_ipcs_run(server);
>>>     if (rc < 0) {
>>>         crm_err("Could not start %s IPC server: %s (%d)", name,
>>>                 strerror(rc), rc);
>>>         return NULL;
>>>     }
>>>
>>>     return server;
>>> }
>>>
>>> --------------------------
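
A side note on the "Unknown error: 4294967210 (-86)" text in the log
further down: it looks like what strerror() prints when it is handed the
negative return code directly. The quoted function passes rc to strerror()
unchanged, and the attrd lines in the log show a bind failure with errno 48
being reported as rc -48, so the return code appears to be a negated errno
value. A minimal sketch of decoding it, assuming that convention;
ipc_rc_to_str() and format_ipc_error() are hypothetical helpers, not
Pacemaker or LibQB functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a "negative errno" return code into a
 * readable message. strerror() expects a positive errno value, so the
 * sign is flipped first.
 */
static const char *ipc_rc_to_str(int rc)
{
    return strerror(rc < 0 ? -rc : rc);
}

/*
 * Hypothetical formatter using the same format string as the crm_err()
 * call in the quoted function, but with the decoded message.
 * Returns the snprintf() result.
 */
static int format_ipc_error(char *buf, size_t len, const char *name, int rc)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, len, "Could not start %s IPC server: %s (%d)",
                    name, ipc_rc_to_str(rc), rc);
}
```

Calling format_ipc_error(buf, sizeof(buf), "attrd", -EADDRINUSE) would then
give the system's "address in use" text instead of "Unknown error". On
NetBSD, errno 86 should be ENOTSUP, which would fit SHM not being supported
there, but that is worth checking against the local errno.h.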
>>>
>>> I think a shared memory region should be created using libqb. Is this
>>> known to work on BSD systems?
>>>
>>>
>>> 2012/12/11 Stephan <stephanwib@xxxxxxxxxxxxxx>:
>>>> Yes, kqueues are not inherited. I recompiled and installed pacemaker
>>>> 1.1 for corosync 2.x. It doesn't work yet (I just started pacemakerd;
>>>> I hope this is okay) ... it seems that lrmd is hitting the first issue:
>>>>
>>>> lrmd[15312]:    error: mainloop_add_ipc_server: Could not start lrmd
>>>> IPC server: Unknown error: 4294967210 (-86)
>>>>
>>>>
>>>> All messages:
>>>>
>>>> -----8<----------
>>>> Dec 11 14:22:31 ctx4980gate2 pacemakerd[13003]:     info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 20.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: main:
>>>> Starting Pacemaker 1.1.8 (Build: 1f8858c):  ncurses libqb-logging
>>>> libqb-ipc lha-fencing  corosync-native
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 18.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> update_node_processes: 0x7f7ff7b09150 Node 3232235777 now known as
>>>> ctx4980gate2, was:
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:     info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:   notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:     info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 stonith-ng[7834]:   notice:
>>>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice: main: Using new
>>>> config location: /var/lib/pacemaker/cib
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: retrieveCib:
>>>> Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: readCibXmlFile:
>>>> Primary configuration corrupt or unusable, trying backup...
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:  warning: readCibXmlFile:
>>>> Continuing with an empty configuration.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:   notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:     info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:   notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:   notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:   notice:
>>>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:    error:
>>>> mainloop_add_ipc_server: Could not start pengine IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[17349]:    error: main: Couldn't
>>>> start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[15312]:    error: main: Failed to
>>>> allocate lrmd server.  shutting down
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>>>> pcmk_child_exit: Child process lrmd exited (pid=15312, rc=255)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error:
>>>> qb_ipcs_us_publish: Could not bind AF_UNIX (/var/run/attrd): Address
>>>> already in use (48)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> pcmk_child_exit: Respawning failed child process: lrmd
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error:
>>>> mainloop_add_ipc_server: Could not start attrd IPC server: Unknown
>>>> error: 4294967248 (-48)
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: main: Could not
>>>> start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 attrd[9542]:    error: main: Aborting startup
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 26.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>>>> pcmk_child_exit: Child process pengine exited (pid=17349, rc=1)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> pcmk_child_exit: Respawning failed child process: pengine
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>>>> pcmk_child_exit: Child process attrd exited (pid=9542, rc=100)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:  warning:
>>>> pcmk_child_exit: Pacemaker child process attrd no longer wishes to be
>>>> respawned. Shutting ourselves down.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> pcmk_shutdown_worker: Shuting down Pacemaker
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>>>> Stopping crmd: Sent -15 to process 13681
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> pcmk_child_exit: Child process crmd terminated with signal 15
>>>> (pid=13681, core=0)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>>>> Stopping pengine: Sent -15 to process 10446
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:   notice:
>>>> crm_cluster_connect: Connecting to cluster infrastructure: corosync
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: qb_ipcs_us_publish:
>>>> Could not bind AF_UNIX (/var/run/cib_ro): Permission denied (13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error:
>>>> mainloop_add_ipc_server: Could not start cib_ro IPC server: Unknown
>>>> error: 4294967283 (-13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: qb_ipcs_us_publish:
>>>> Could not bind AF_UNIX (/var/run/cib_rw): Permission denied (13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error:
>>>> mainloop_add_ipc_server: Could not start cib_rw IPC server: Unknown
>>>> error: 4294967283 (-13)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error:
>>>> mainloop_add_ipc_server: Could not start cib_shm IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 cib[13836]:    error: cib_init: Couldnt
>>>> start all IPC channels, exiting.
>>>> Dec 11 14:22:32 ctx4980gate2 corosync[18423]:   [QB    ] got EV_EOF on fd 26.
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>>>> pcmk_child_exit: Child process cib exited (pid=13836, rc=255)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:   notice: crm_add_logfile:
>>>> Additional logging available in /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:     info:
>>>> crm_update_callsites: Enabling callsites based on priority=6,
>>>> files=(null), functions=(null), formats=(null), tags=(null)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:   notice:
>>>> crm_add_logfile: Additional logging available in
>>>> /var/log/cluster/corosync.log
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:    error:
>>>> mainloop_add_ipc_server: Could not start pengine IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 pengine[10446]:    error: main: Couldn't
>>>> start IPC server
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error:
>>>> mainloop_add_ipc_server: Could not start lrmd IPC server: Unknown
>>>> error: 4294967210 (-86)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:    error:
>>>> pcmk_child_exit: Child process pengine exited (pid=10446, rc=1)
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: try_server_create:
>>>> New IPC server could not be created because another lrmd process
>>>> exists, sending shutdown command to old lrmd process.
>>>> Dec 11 14:22:32 ctx4980gate2 lrmd[22677]:    error: main: Failed to
>>>> allocate lrmd server.  shutting down
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>>>> Stopping lrmd: Sent -15 to process 22677
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice:
>>>> pcmk_child_exit: Child process lrmd terminated with signal 15
>>>> (pid=22677, core=0)
>>>> Dec 11 14:22:32 ctx4980gate2 pacemakerd[13003]:   notice: stop_child:
>>>> Stopping stonith-ng: Sent -15 to process 7834
>>>> -----------------------------
>>>>
>>>> 2012/12/11 Jan Friesse <jfriesse@xxxxxxxxxx>:
>>>>> Actually, the main problem is that the kqueue is created BEFORE fork,
>>>>> and according to the man page, a kqueue is NOT inherited by a child
>>>>> process. The patch seems to be pretty easy and I will send it.
>>>>>
>>>>> Honza
>>>>>
>>>>> Stephan wrote:
>>>>>> Right, it works for me too when starting in foreground mode. I don't
>>>>>> know if you have an idea what could cause this. But when running it in
>>>>>> daemon mode, it apparently closes its file descriptor to the
>>>>>> kevent queue somewhere. That does not happen when running in
>>>>>> foreground mode:
>>>>>>
>>>>>> corosync 2371 root    3u  KQUEUE 0xfffffe84b41d7980
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> discuss@xxxxxxxxxxxx
>>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>>>
>>>>>
>>>
>>
> 



