Re: [ceph-users] Re: quincy v17.2.0 QE Validation status

On Fri, Apr 15, 2022 at 3:10 AM David Galloway <dgallowa@xxxxxxxxxx> wrote:
>
> For transparency and posterity's sake...
>
> I tried upgrading the LRC and the first two mgrs upgraded fine but
> reesi004 threw an error.
>
> Apr 14 22:54:36 reesi004 podman[2042265]: 2022-04-14 22:54:36.210874346
> +0000 UTC m=+0.138897862 container create
> 3991bea0a86f55679f9892b3fbceeef558dd1edad94eb4bf73deebf6595bcc99
> (image=quay.ceph.io/ceph-ci/ceph@sha256:230120c6a429af7546b91180a3da39846e760787580d7b5193487
> Apr 14 22:54:36 reesi004 bash[2042070]: Error: OCI runtime error:
> writing file `pids.max`: Invalid argument
>
> Adam and I suspected we needed
> https://github.com/ceph/ceph/pull/45853#issue-1200032778 so I took the
> tip of quincy, cherry-picked that PR and pushed to dgalloway-quincy-fix
> in ceph-ci.git.  Then I waited for packages and a container to get built
> and attempted to upgrade the LRC to that container version.
>
> Same error though.  So I'm leaving it for the weekend.  We have two MGRs
> that *did* upgrade to the tip of quincy but the rest of the containers
> are still running 17.1.0-5-g8299cd4c.

I don't think https://github.com/ceph/ceph/pull/45853 would help.
The problem appears to be that --pids-limit=-1 just doesn't work on
older podman versions.  There "-1" is not translated into anything
else; it is passed through unchanged and written to
/sys/fs/cgroup/pids/.../pids.max, which fails because the pids.max
file expects either a non-negative integer or "max" [1].
I don't understand how some of the other manager daemons upgraded,
though, since the LRC nodes appear to be running Ubuntu 18.04 LTS with
an older podman:

    $ podman --version
    podman version 3.0.1
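
For illustration, here is a minimal sketch of the check the kernel
applies to a pids.max write, as described in [1].  The helper name
is_valid_pids_max is hypothetical, and it is simplified: the kernel
also rejects values at or above its internal PIDS_MAX ceiling, which
is not modelled here.

```python
def is_valid_pids_max(value: str) -> bool:
    """Return True if `value` looks acceptable for a cgroup v1 pids.max write.

    Hypothetical, simplified helper: per the cgroup-v1 pids documentation,
    pids.max takes either the literal string "max" or a non-negative integer.
    """
    value = value.strip()  # the kernel strips surrounding whitespace too
    if value == "max":
        return True
    # isdigit() refuses a leading '-', so "-1" is rejected here, which
    # matches the "Invalid argument" error in the podman log above.
    return value.isascii() and value.isdigit()


for v in ("max", "0", "4096", "-1"):
    print(v, is_valid_pids_max(v))
```

Run as-is, it prints True for "max", "0" and "4096" and False for "-1".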

This was reported in [2] and addressed in podman in [3], fairly
recently.  Their fix was to make "-1" be treated the same as "0", as
older podman versions insisted on "0" for unlimited and "-1" either
never worked or stopped working a long time ago.  docker seems to
accept both "-1" and "0" for unlimited.
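
The semantics the podman fix is described as adopting ("-1" treated
the same as "0", both meaning unlimited) can be sketched as a small
translation from a --pids-limit integer to a pids.max value.  This is
a hypothetical illustration, not podman's or any OCI runtime's actual
code; rejecting other negative values is an assumption.

```python
def pids_max_from_limit(limit: int) -> str:
    """Translate a --pids-limit style integer into a pids.max value.

    Hypothetical helper: -1 and 0 both mean "unlimited" and map to the
    kernel's "max" keyword; positive values are written as-is.
    """
    if limit in (-1, 0):
        return "max"
    if limit < -1:
        # Assumption: other negative values are treated as user errors.
        raise ValueError(f"invalid pids limit: {limit}")
    return str(limit)
```

With this mapping, "-1" never reaches pids.max as a literal string.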

The best course of action would probably be to drop [4] from quincy,
getting it back to the 17.1.0 state (i.e. no --pids-limit option in
sight) and amend the original --pids-limit change in master so that it
works for all versions of podman.  The podman version is already
checked in a couple of places (e.g. CGROUPS_SPLIT_PODMAN_VERSION), so
it should be easy enough, or we could just unconditionally pass "0"
even though it is not documented anymore.
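
A rough sketch of the version-gated variant, in the spirit of the
existing CGROUPS_SPLIT_PODMAN_VERSION check.  Everything here is
hypothetical: the function names are illustrative, and
PIDS_LIMIT_MINUS_ONE_PODMAN_VERSION is a placeholder that would have
to be set to the first podman release actually containing [3].

```python
import re
from typing import Tuple

# PLACEHOLDER: first podman release assumed to accept --pids-limit=-1.
# Not verified; check podman's changelog for the release that ships [3].
PIDS_LIMIT_MINUS_ONE_PODMAN_VERSION: Tuple[int, ...] = (3, 4, 1)


def parse_podman_version(output: str) -> Tuple[int, ...]:
    """Parse `podman --version` output such as 'podman version 3.0.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"cannot parse podman version from {output!r}")
    return tuple(int(part) for part in match.groups())


def unlimited_pids_arg(engine: str, version: Tuple[int, ...]) -> str:
    """Return a --pids-limit argument meaning 'no limit' for this engine."""
    if engine == "docker":
        # docker reportedly accepts both "-1" and "0" for unlimited.
        return "--pids-limit=-1"
    if version >= PIDS_LIMIT_MINUS_ONE_PODMAN_VERSION:
        return "--pids-limit=-1"
    # Older podman insists on "0" for unlimited.
    return "--pids-limit=0"


print(unlimited_pids_arg("podman", parse_podman_version("podman version 3.0.1")))
```

On the LRC's podman 3.0.1 this prints --pids-limit=0.  The alternative
of unconditionally passing "0" would reduce unlimited_pids_arg() to a
constant and avoid the version threshold altogether.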

(The reason for backporting [4] to quincy was to fix containerized
iSCSI deployments, where bumping into the default PID limit is just a
matter of scaling the number of exported LUNs.  It's been that way since the
initial pacific release though so taking it out for now is completely
acceptable.)

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt
[2] https://github.com/containers/podman/issues/11782
[3] https://github.com/containers/podman/pull/11794
[4] https://github.com/ceph/ceph/pull/45576

Thanks,

            Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
