Re: Upgrade paths beyond octopus on Centos7

Hi Brent,

what I meant was to run outside of cephadm. It's not about deploying a monitor, it's about checking that the monitor can be started at all. The complete ceph-mon command line is in the error message:

/usr/bin/ceph-mon --mkfs -i
tpixmon5 --fsid 33ca8009-79d6-45cf-a67e-9753ab4dc861 -c /tmp/config
--keyring /tmp/keyring --setuser ceph --setgroup ceph
--default-log-to-file=false --default-log-to-journald=true
--default-log-to-stderr=false --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-journald=true
--default-mon-cluster-log-to-stderr=false

You should just manually start a ceph container on the node with the entrypoint command /bin/sleep infinity and then enter this container by attaching an interactive bash session. Then you can try out anything inside the container. You will need to populate the /tmp/xyz files, but apart from that, the above command should run. It will build a mon store inside the container and you will see its log messages right away. Note that the above mon command uses --mkfs; that is, it will not start an actual mon that tries to join quorum. A mon is started in a subsequent step.
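
For example (container name and image tag are illustrative; use whatever image cephadm actually pulled):

podman run -d --name ceph-debug --net=host \
    --entrypoint /bin/sleep docker.io/ceph/ceph:v15 infinity
podman exec -it ceph-debug /bin/bash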

The error log contains the complete podman command line. It has a couple of -v a:b bind-mounts that are tied to a uuid. You can populate these yourself as well if you want to go deeper with manual debugging.
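
A sketch of re-creating those mounts by hand (the config/keyring paths are placeholders on my side, substitute your cluster's fsid and files):

podman run -it --rm --net=host --entrypoint /bin/bash \
    -v /var/log/ceph/<fsid>:/var/log/ceph:z \
    -v /var/lib/ceph/<fsid>/mon.tpixmon5:/var/lib/ceph/mon/ceph-tpixmon5:z \
    -v /path/to/ceph.conf:/tmp/config:z \
    -v /path/to/mon-keyring:/tmp/keyring:z \
    docker.io/ceph/ceph:v15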

You should also check that the right images are actually on the host and that it didn't try to run the mon from an older version.
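
For example, just a quick listing:

podman images | grep ceph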

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Brent Kennedy <bkennedy@xxxxxxxxxx>
Sent: 14 August 2022 00:02:20
To: ceph-users@xxxxxxx
Subject:  Re: Upgrade paths beyond octopus on Centos7

I didn’t try running the container manually because I don’t know the proper
command line for that.  If you are talking about "cephadm --image <full
image path> deploy --fsid <cluster-fsid> --name mon.tpixmon5" it wouldn’t
run the monitor.  I could however run the mgr that way.

I guess I could just edit the unit.run file in
/var/lib/ceph/<fsid>/mon.tpixmon5 to remove those two items...   didn't think
about that (been staring at this too long).  If that's also the string used to
run the containers...  then that helps a lot.  I assumed those items were there
for a reason though.
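
Something like this, I suppose (the path and the exact spacing of the flags in unit.run are assumptions on my side):

sed -i \
    -e 's/ --default-log-to-journald=true//' \
    -e 's/ --default-mon-cluster-log-to-journald=true//' \
    /var/lib/ceph/<fsid>/mon.tpixmon5/unit.run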

-Brent

-----Original Message-----
From: Frank Schilder <frans@xxxxxx>
Sent: Friday, August 12, 2022 8:30 AM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>; ceph-users@xxxxxxx
Subject: Re:  Re: Upgrade paths beyond octopus on Centos7

Isn't the problem stated in the error message:

2022-06-25 21:51:59,703 7f4748727b80 DEBUG /usr/bin/ceph-mon: too many
arguments:
[--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]

Your ceph-mon seems not to understand these 2 command line arguments. Is
there a typo or something? Did you try running an interactive container
with bash and starting the mon manually?
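
For example, to check which ceph release is actually inside the image (the tag is illustrative):

podman run --rm --entrypoint /usr/bin/ceph-mon docker.io/ceph/ceph:v15 --version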

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Brent Kennedy <bkennedy@xxxxxxxxxx>
Sent: 12 August 2022 00:55:27
To: ceph-users@xxxxxxx
Subject:  Re: Upgrade paths beyond octopus on Centos7

I wiped all the cephadm stuff off the nodes when I flipped them back to
direct installations, then rebooted as part of that process, so all the
system logs were rotated and the messages are lost.

Here is the one I posted earlier from the Rocky computer trying to run the
Octopus monitor container (it could run the mgr container though, which was
odd to me):

2022-06-25 21:51:34,427 7f4748727b80 DEBUG stat: Copying blob
sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621
2022-06-25 21:51:34,647 7f4748727b80 DEBUG stat: Copying blob
sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621
2022-06-25 21:51:34,652 7f4748727b80 DEBUG stat: Copying blob
sha256:731c3beff4deece7d4e54bc26ecf6d99988b19ea8414524277d83bc5a5d6eb70
2022-06-25 21:51:59,006 7f4748727b80 DEBUG stat: Copying config
sha256:2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73
2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Writing manifest to image
destination
2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Storing signatures
2022-06-25 21:51:59,239 7f4748727b80 DEBUG stat: 167 167
2022-06-25 21:51:59,703 7f4748727b80 DEBUG /usr/bin/ceph-mon: too many
arguments:
[--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]
2022-06-25 21:51:59,797 7f4748727b80 INFO Non-zero exit code 1 from
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host
--entrypoint /usr/bin/ceph-mon --init -e
CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=tpixmon5 -e
CEPH_USE_RANDOM_NONCE=1 -v
/var/log/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861:/var/log/ceph:z -v
/var/lib/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861/mon.tpixmon5:/var/lib/ceph/mon/ceph-tpixmon5:z
-v /tmp/ceph-tmp7xmra8lk:/tmp/keyring:z -v
/tmp/ceph-tmp7mid2k57:/tmp/config:z docker.io/ceph/ceph:v15 --mkfs -i
tpixmon5 --fsid 33ca8009-79d6-45cf-a67e-9753ab4dc861 -c /tmp/config
--keyring /tmp/keyring --setuser ceph --setgroup ceph
--default-log-to-file=false --default-log-to-journald=true
--default-log-to-stderr=false --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-journald=true
--default-mon-cluster-log-to-stderr=false
2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too many
arguments:
[--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]


I didn't capture the overlay one from the CentOS 7 machine, but the only
message had to do with the overlay.  My search history showed this:
"overlayfs: unrecognized mount option "volatile" or missing value".

I also tried to install podman 3.1, which I cobbled together from around the
internet, and it didn't work with the Octopus containers :(

-Brent



-----Original Message-----
From: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>
Sent: Monday, August 8, 2022 3:09 AM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>
Cc: ceph-users@xxxxxxx
Subject: Re:  Re: Upgrade paths beyond octopus on Centos7


Hey Brent,

thanks a lot for following up on this. Would it be possible to send the
error messages that you get in both cases?

While I do have my reservations about cephadm (based on experience with
ceph-deploy, ceph-ansible and friends), I would like to drill down to the
core of the problem, as containers *should* indeed run on "any" CRI.
If they don't, I would expect the cause to be parameters that are unknown
to one of the podman versions, i.e. something in the container
specification rather than in the actual image.

Do you mind posting the cephadm and podman versions and the corresponding
error messages that you have received with Octopus / Quincy?
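
I.e. the output of something like:

cephadm version
podman --version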

Best regards,

Nico

"Brent Kennedy" <bkennedy@xxxxxxxxxx> writes:

> All I can say is that it's been impossible this past month to upgrade
> past octopus using cephadm on centos 7.  I thought if I spun up new
> servers and started containers on those using the Octopus cephadm
> script, I would be ok.
> But both Rocky and Centos 8 stream won't run the older Octopus containers.
> When the containers start on podman 4, they show an error regarding
> groups.
> Searching on the web for that error only returns posts saying you can
> ignore it, but the container/service won't start.  I thought upgrading
> to quincy would solve this, but then the quincy containers won't run on
> centos 7; they throw an overlay error.  Which is how I ended up with a
> cluster that was limping along with one monitor and 132 OSDs.  Just
> today, I went back and manually installed ceph octopus on all the
> nodes (bare-metal install) and that got me back to working again.
> Based on another post, it seems the best way to proceed from here is
> to upgrade the remaining centos 7 servers to centos stream 8 or
> wipe/install rocky and load octopus bare metal.  Then once that is
> done, upgrade to quincy as bare metal.  The final step would be
> moving to containers (cephadm).  Unfortunately, I had already adopted
> all the OSD containers, so hopefully I can swap them back to bare
> metal without too much hassle.
>
> This podman issue basically shows the flaw in the thinking that
> containers solve the OS issue (I ran into this with Docker and
> mesosphere, so I kinda knew what I was in for).  As much as I
> appreciate the Dev team here at ceph and like container methodology,
> the way this went down is a shame (unless I am missing something).
> I only held back upgrading because of the lack of an upgrade path and
> then the centos stream situation; we normally upgrade things within 6
> months of release.  BTW, I tried to upgrade centos 7 to stream 8 and
> it said all the ceph modules conflicted with upgrade components, thus
> I had to remove them, hence why I am starting fresh with each machine
> (it's also quicker with VM images, at least for the VMs).
>
> The upgrade path discussion I am referring to is titled:  "Migration
> from CentOS7/Nautilus to CentOS Stream/Pacific"
>
> -Brent
>
> -----Original Message-----
> From: Marc <Marc@xxxxxxxxxxxxxxxxx>
> Sent: Sunday, August 7, 2022 5:25 AM
> To: Nico Schottelius <nico.schottelius@xxxxxxxxxxx>;
> ceph-users@xxxxxxx
> Subject:  Re: Upgrade paths beyond octopus on Centos7
>
>
>> Reading your mails I am doubly puzzled, as I thought that cephadm
>> would actually solve these kinds of issues in the first place and I
>> would
>
> It is being advocated like this. My opinion is that it is primarily
> being used as a click-next-next install tool so a broader audience can
> be reached.
> If the focus is on this, problems such as the one below are imminent.
>
>> expect it to be especially stable on RH/Centos.
>
> I thought I would give CentOS 9 stream a chance upgrading the office
> server.
> I am converting applications to containers, so I am less dependent on
> the OS in the future. On the 10th day or so a container crashed, which
> crashed the whole server, and then strangely enough none of the
> containers would start because of a little damage in one container
> layer (not shared with others) of the new container.
> Unfortunately it was all mounted on the root fs, so I had to do an
> fsck of the root fs.
>
> AFAIK podman is also a fork of the docker code, and FWIW there are
> developers that coded their own containerizer because they thought the
> docker implementation was not stable.
>


--
Sustainable and modern Infrastructures by ungleich.ch


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



