Re: Problems adding a new host via orchestration.

Just FYI, I've seen this on CentOS systems as well, and I'm not even
sure it was limited to Ceph; possibly other tooling like Ansible too.

I THINK you can safely ignore that message, or alternatively the fix
is so easy that senility has already driven it from my mind.
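One more observation, for what it's worth: in the traceback quoted further
down, the OSError is raised by subprocess.Popen inside the active mgr
itself, i.e. the fork of the ssh connection fails on the mgr host before
anything reaches the new host, so the "python3 is not installed there"
hint can be a red herring. A rough checklist to run on the host carrying
the active mgr (generic suggestions, not a confirmed fix for this report;
the hostname below is a placeholder, and note that `ceph mgr fail` forces
a failover to a standby mgr):

```shell
# 1. Free memory and swap on the host running the active mgr;
#    an ENOMEM from fork() points at this host, not the remote one.
free -m

# 2. Kernel overcommit policy: a value of 2 ("never overcommit") can
#    make a large process like ceph-mgr fail to fork even with RAM free.
cat /proc/sys/vm/overcommit_memory

# 3. On an admin node, check which mgr is active, optionally fail it
#    over to a standby, and retry the add (guarded so this part is a
#    no-op on machines without the ceph CLI):
NEW_HOST=newhost.example.com   # placeholder; substitute your real hostname
NEW_IP=172.31.102.41
if command -v ceph >/dev/null 2>&1; then
    ceph mgr stat
    ceph mgr fail
    ceph orch host add "$NEW_HOST" "$NEW_IP" --labels _admin
fi
```

If the add succeeds right after a mgr failover, memory pressure in the
mgr process was the likely culprit rather than anything on the new host.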

    Tim

On Tue, 2024-02-06 at 14:44 -0500, Gary Molenkamp wrote:
> I confirmed selinux is disabled on all existing and new hosts.
> Likewise, python3 is installed on all as well (3.9.16 on RL8, 3.9.18
> on RL9).
> 
> I am running 16.2.12 on all containers, so it may be worth updating
> to 16.2.14 to ensure I'm on the latest Pacific release.
> 
> Gary
> 
> 
> On 2024-02-05 08:17, Curt wrote:
> > 
> > I don't use Rocky, so this is a stab in the dark and probably not
> > the issue, but could selinux be blocking the process?  Also a long
> > shot: is python3 in the standard location, and what does
> > `python3 --version` return when run as your ceph user?
> > 
> > Probably not much help, but figured I'd throw it out there.
> > 
> > On Mon, 5 Feb 2024, 16:54 Gary Molenkamp, <molenkam@xxxxxx> wrote:
> > 
> >     I have verified the server's expected hostname (with `hostname`)
> >     matches the hostname I am trying to use.
> >     Just to be sure, I also ran:
> >          cephadm check-host --expect-hostname <hostname>
> >     and it returns:
> >          Hostname "<hostname>" matches what is expected.
> > 
> >     On the current admin server where I am trying to add the host,
> >     the host is reachable, and the shortname resolves to the proper
> >     IP with the DNS search order.  Likewise, on the server where the
> >     mgr is running, I am able to confirm reachability and DNS
> >     resolution for the new server as well.
> > 
> >     I thought this might be a DNS/name-resolution issue as well, but
> >     I don't see any errors in my setup with respect to host naming.
> > 
> >     Thanks
> >     Gary
> > 
> > 
> >     On 2024-02-03 06:46, Eugen Block wrote:
> >     > Hi,
> >     >
> >     > I found this blog post [1] which reports the same error
> >     > message. It seems a bit misleading because it appears to be
> >     > about DNS. Can you check
> >     >
> >     > cephadm check-host --expect-hostname <HOSTNAME>
> >     >
> >     > Or is that what you already tried? It's not entirely clear how
> >     > you checked the hostname.
> >     >
> >     > Regards,
> >     > Eugen
> >     >
> >     > [1]
> >     > https://blog.mousetech.com/ceph-distributed-file-system-for-the-enterprise/ceph-bogus-error-cannot-allocate-memory/
> >     >
> >     > Zitat von Gary Molenkamp <molenkam@xxxxxx>:
> >     >
> >     >> Happy Friday all.  I was hoping someone could point me in the
> >     >> right direction or clarify any limitations that could be
> >     >> impacting an issue I am having.
> >     >>
> >     >> I'm struggling to add a new set of hosts to my ceph cluster
> >     >> using cephadm and orchestration.  When trying to add a host:
> >     >>     "ceph orch host add <hostname> 172.31.102.41 --labels _admin"
> >     >> returns:
> >     >>     "Error EINVAL: Can't communicate with remote host
> >     >>     `172.31.102.41`, possibly because python3 is not installed
> >     >>     there: [Errno 12] Cannot allocate memory"
> >     >>
> >     >> I've verified that the ceph ssh key works to the remote host,
> >     >> the host's name matches that returned from `hostname`, python3
> >     >> is installed, and "/usr/sbin/cephadm prepare-host" on the new
> >     >> hosts returns "host is ok".  In addition, the cluster ssh key
> >     >> works between hosts and the existing hosts are able to ssh in
> >     >> using the ceph key.
> >     >>
> >     >> The existing ceph cluster is the Pacific release using
> >     >> docker-based containerization on a RockyLinux8 base OS.  The
> >     >> new hosts are RockyLinux9 based, with cephadm installed from
> >     >> the Quincy release:
> >     >>         ./cephadm add-repo --release quincy
> >     >>         ./cephadm install
> >     >> I did try installing cephadm from the Pacific release by
> >     >> changing the repo to el8, but that did not work either.
> >     >>
> >     >> Is there a limitation in mixing RL8 and RL9 container hosts
> >     >> under Pacific?  Does this same limitation exist under Quincy?
> >     >> Is there a python version dependency?
> >     >> The reason for RL9 on the new hosts is to stage upgrading the
> >     >> OSs for the cluster.  I did this under Octopus when moving
> >     >> from Centos7 to RL8.
> >     >>
> >     >> Thanks and I appreciate any feedback/pointers.
> >     >> Gary
> >     >>
> >     >>
> >     >> I've added the log trace here in case that helps (from
> >     >> `ceph log last cephadm`):
> >     >>
> >     >>
> >     >>
> >     >> 2024-02-02T14:22:32.610048+0000 mgr.storage01.oonvfl (mgr.441023307) 4957871 : cephadm [ERR] Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory
> >     >> Traceback (most recent call last):
> >     >>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1524, in _remote_connection
> >     >>     conn, connr = self.mgr._get_connection(addr)
> >     >>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1370, in _get_connection
> >     >>     sudo=True if self.ssh_user != 'root' else False)
> >     >>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 35, in __init__
> >     >>     self.gateway = self._make_gateway(hostname)
> >     >>   File "/lib/python3.6/site-packages/remoto/backends/__init__.py", line 46, in _make_gateway
> >     >>     self._make_connection_string(hostname)
> >     >>   File "/lib/python3.6/site-packages/execnet/multi.py", line 133, in makegateway
> >     >>     io = gateway_io.create_io(spec, execmodel=self.execmodel)
> >     >>   File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 121, in create_io
> >     >>     io = Popen2IOMaster(args, execmodel)
> >     >>   File "/lib/python3.6/site-packages/execnet/gateway_io.py", line 21, in __init__
> >     >>     self.popen = p = execmodel.PopenPiped(args)
> >     >>   File "/lib/python3.6/site-packages/execnet/gateway_base.py", line 184, in PopenPiped
> >     >>     return self.subprocess.Popen(args, stdout=PIPE, stdin=PIPE)
> >     >>   File "/lib64/python3.6/subprocess.py", line 729, in __init__
> >     >>     restore_signals, start_new_session)
> >     >>   File "/lib64/python3.6/subprocess.py", line 1295, in _execute_child
> >     >>     restore_signals, start_new_session, preexec_fn)
> >     >> OSError: [Errno 12] Cannot allocate memory
> >     >>
> >     >> During handling of the above exception, another exception occurred:
> >     >>
> >     >> Traceback (most recent call last):
> >     >>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1528, in _remote_connection
> >     >>     raise execnet.gateway_bootstrap.HostNotFound(msg)
> >     >> execnet.gateway_bootstrap.HostNotFound: Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory
> >     >>
> >     >> The above exception was the direct cause of the following exception:
> >     >>
> >     >> Traceback (most recent call last):
> >     >>   File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
> >     >>     return OrchResult(f(*args, **kwargs))
> >     >>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2709, in apply
> >     >>     results.append(self._apply(spec))
> >     >>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2574, in _apply
> >     >>     return self._add_host(cast(HostSpec, spec))
> >     >>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1517, in _add_host
> >     >>     ip_addr = self._check_valid_addr(spec.hostname, spec.addr)
> >     >>   File "/usr/share/ceph/mgr/cephadm/module.py", line 1498, in _check_valid_addr
> >     >>     error_ok=True, no_fsid=True)
> >     >>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1326, in _run_cephadm
> >     >>     with self._remote_connection(host, addr) as tpl:
> >     >>   File "/lib64/python3.6/contextlib.py", line 81, in __enter__
> >     >>     return next(self.gen)
> >     >>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1558, in _remote_connection
> >     >>     raise OrchestratorError(msg) from e
> >     >> orchestrator._interface.OrchestratorError: Can't communicate with remote host `172.31.102.41`, possibly because python3 is not installed there: [Errno 12] Cannot allocate memory
> >     >>
> >     >>
> >     >>
> >     >>
> >     >> --
> >     >> Gary Molenkamp            Science Technology Services
> >     >> Systems Engineer        University of Western Ontario
> >     >> molenkam@xxxxxx http://sts.sci.uwo.ca
> >     >> (519) 661-2111 x86882        (519) 661-3566
> >     >> _______________________________________________
> >     >> ceph-users mailing list -- ceph-users@xxxxxxx
> >     >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >     >
> >     >
> >     > _______________________________________________
> >     > ceph-users mailing list -- ceph-users@xxxxxxx
> >     > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > 
> >     -- 
> >     Gary Molenkamp                  Science Technology Services
> >     Systems Engineer                University of Western Ontario
> >     molenkam@xxxxxx http://sts.sci.uwo.ca
> >     (519) 661-2111 x86882           (519) 661-3566
> >     _______________________________________________
> >     ceph-users mailing list -- ceph-users@xxxxxxx
> >     To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > 
> 
> -- 
> Gary Molenkamp                  Science Technology Services
> Systems Engineer                University of Western Ontario
> molenkam@xxxxxx                  http://sts.sci.uwo.ca
> (519) 661-2111 x86882           (519) 661-3566
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



