Re: Random issues with Reef

It seems like the default value for mgr/cephadm/container_image_base has been changed from quay.io/ceph/ceph to quay.io/ceph/ceph:v18 between Quincy and Reef. So a quick fix would be to set it to the previous default:

ceph config set mgr mgr/cephadm/container_image_base quay.io/ceph/ceph
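
To double-check that the override is active:

ceph config get mgr mgr/cephadm/container_image_base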

Apparently, this is the responsible change:

# Quincy
grep "DEFAULT_IMAGE\ =" /usr/share/ceph/mgr/cephadm/module.py
DEFAULT_IMAGE = 'quay.io/ceph/ceph'

# Reef
grep "DEFAULT_IMAGE\ =" /usr/share/ceph/mgr/cephadm/module.py
DEFAULT_IMAGE = 'quay.io/ceph/ceph:v18'

That is despite the option's description clearly stating:

desc='Container image name, without the tag',

I created a tracker issue for that:
https://tracker.ceph.com/issues/63150
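
For context, the orchestrator apparently appends the requested version tag to container_image_base, so with the new default you end up with a doubly tagged (and therefore invalid) image reference, as seen in the failed pull further down:

quay.io/ceph/ceph:v18 + :v18.2.0  ->  quay.io/ceph/ceph:v18:v18.2.0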

Quoting Eugen Block <eblock@xxxxxx>:

Hi,

either the cephadm package installed on the host should be updated as well so it matches the cluster version, or you can use the cephadm that the orchestrator itself uses. It keeps its different versions under this path (@Mykola, thanks again for pointing that out), and the latest one matches the current ceph version:

/var/lib/ceph/${fsid}/cephadm.*

If you set the executable bit you can use it as usual:

# pacific package version
$ rpm -qf /usr/sbin/cephadm
cephadm-16.2.11.65+g8b7e6fc0182-lp154.3872.1.noarch

$ chmod +x /var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/cephadm.7dcbd4aab60af3e83970c60d4a8a2cc6ea7b997ecc2f4de0a47eeacbb88dde46

$ python3 /var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/cephadm.7dcbd4aab60af3e83970c60d4a8a2cc6ea7b997ecc2f4de0a47eeacbb88dde46 ls
[
    {
        "style": "cephadm:v1",
...
    }
]
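
If the long digest path gets tedious, a symlink is a convenient option (just an example link name; the digest file changes whenever the orchestrator pulls a newer cephadm, so the link would need updating then):

$ ln -s /var/lib/ceph/201a2fbc-ce7b-44a3-9ed7-39427972083b/cephadm.7dcbd4aab60af3e83970c60d4a8a2cc6ea7b997ecc2f4de0a47eeacbb88dde46 /usr/local/sbin/cephadm-cluster
$ cephadm-cluster ls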


Regarding the command:

ceph orch upgrade start --ceph-version v18.2.0

That looks like a bug to me; it's reproducible:

$ ceph orch upgrade check --ceph-version 18.2.0
Error EINVAL: host ceph01 `cephadm pull` failed: cephadm exited with an error code: 1, stderr: Pulling container image quay.io/ceph/ceph:v18:v18.2.0... Non-zero exit code 125 from /usr/bin/podman pull quay.io/ceph/ceph:v18:v18.2.0 --authfile=/etc/ceph/podman-auth.json
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull quay.io/ceph/ceph:v18:v18.2.0 --authfile=/etc/ceph/podman-auth.json
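
As a workaround, passing the full image instead of --ceph-version avoids the broken tag concatenation (which matches what you reported working):

$ ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.0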

It works correctly with 17.2.6:

# ceph orch upgrade check --ceph-version 18.2.0
{
    "needs_update": {
        "crash.soc9-ceph": {
"current_id": "2d45278716053f92517e447bc1a7b64945cc4ecbaff4fe57aa0f21632a0b9930", "current_name": "quay.io/ceph/ceph@sha256:1e442b0018e6dc7445c3afa7c307bc61a06189ebd90580a1bb8b3d0866c0d8ae",
            "current_version": "17.2.6"
...

I haven't checked for existing tracker issues yet. I'd recommend checking and, if there isn't one yet, creating a bug report:

https://tracker.ceph.com/

Regards,
Eugen

Quoting Martin Conway <martin.conway@xxxxxxxxxx>:

Hi

I have been using Ceph for many years now, and recently upgraded to Reef.

It seems I made the jump too quickly, as I have been hitting a few issues. I can't find any mention of them in the bug reports, so I thought I would share them here in case it is something to do with my setup.

On v18.2.0, the command:

cephadm version

Fails with the following output:

Traceback (most recent call last):
 File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
   "__main__", mod_spec)
 File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
   exec(code, run_globals)
 File "/usr/sbin/cephadm/__main__.py", line 10096, in <module>
 File "/usr/sbin/cephadm/__main__.py", line 10084, in main
 File "/usr/sbin/cephadm/__main__.py", line 2240, in _infer_image
 File "/usr/sbin/cephadm/__main__.py", line 2338, in infer_local_ceph_image
 File "/usr/sbin/cephadm/__main__.py", line 2301, in get_container_info
 File "/usr/sbin/cephadm/__main__.py", line 2301, in <listcomp>
 File "/usr/sbin/cephadm/__main__.py", line 222, in __getattr__
AttributeError: 'CephadmContext' object has no attribute 'fsid'

I don't know if it is related, but

cephadm adopt --style legacy --name osd.X

Tries to use a v15 image, which then fails to start after being imported. The OSD in question has an SSD device for its block.db, if that is relevant.

Using the latest head version of cephadm from GitHub let me work around this issue, but the adopted OSDs were then running 18.0.0-6603-g6c4ed58a and needed to be upgraded to 18.2.0.
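
For anyone else hitting this, pinning the image for the adopt might be an alternative, assuming cephadm's global --image option also applies here (I went the git-head route instead, so this is untested):

cephadm --image quay.io/ceph/ceph:v18.2.0 adopt --style legacy --name osd.X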

Also the command:

ceph orch upgrade start --ceph-version v18.2.0

Does not work; it fails to find the right image. From memory, I think it tried to pull quay.io/ceph/ceph:v18:v18.2.0

ceph orch upgrade start quay.io/ceph/ceph:v18.2.0

Does work as expected.

Let me know if there is any other information that would be helpful, but I have since worked around these issues and have my ceph back in a happy state.

Regards,
Martin Conway
IT and Digital Media Manager
Research School of Physics
Australian National University
Canberra ACT 2601

+61 2 6125 1599
https://physics.anu.edu.au

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


