On Fri, Apr 15, 2022 at 3:10 AM David Galloway <dgallowa@xxxxxxxxxx> wrote:
>
> For transparency and posterity's sake...
>
> I tried upgrading the LRC and the first two mgrs upgraded fine but
> reesi004 threw an error.
>
> Apr 14 22:54:36 reesi004 podman[2042265]: 2022-04-14 22:54:36.210874346
> +0000 UTC m=+0.138897862 container create
> 3991bea0a86f55679f9892b3fbceeef558dd1edad94eb4bf73deebf6595bcc99
> (image=quay.ceph.io/ceph-ci/ceph@sha256:230120c6a429af7546b91180a3da39846e760787580d7b5193487
> Apr 14 22:54:36 reesi004 bash[2042070]: Error: OCI runtime error:
> writing file `pids.max`: Invalid argument
>
> Adam and I suspected we needed
> https://github.com/ceph/ceph/pull/45853#issue-1200032778 so I took the
> tip of quincy, cherry-picked that PR and pushed to dgalloway-quincy-fix
> in ceph-ci.git. Then I waited for packages and a container to get built
> and attempted to upgrade the LRC to that container version.
>
> Same error though. So I'm leaving it for the weekend. We have two MGRs
> that *did* upgrade to the tip of quincy but the rest of the containers
> are still running 17.1.0-5-g8299cd4c.

I don't think https://github.com/ceph/ceph/pull/45853 would help. The
problem appears to be that --pids-limit=-1 just doesn't work on older
podman versions. "-1" is not massaged there: podman attempts to write
it to /sys/fs/cgroup/pids/.../pids.max, which fails because the
pids.max file expects either a non-negative integer or "max" [1].

I don't understand how some of the other manager daemons upgraded
though, since the LRC nodes appear to be running Ubuntu 18.04 LTS with
an older podman:

    $ podman --version
    podman version 3.0.1

This was reported in [2] and addressed in podman in [3], fairly
recently. Their fix was to treat "-1" the same as "0": older podman
versions insisted on "0" for unlimited, and "-1" either never worked or
stopped working a long time ago. docker seems to accept both "-1" and
"0" for unlimited.

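To make the failure mode concrete, here is a minimal Python sketch of the rule described in [1] and of the kind of massaging that older podman skips. The function names are hypothetical and this is neither podman's nor the kernel's actual code; it only assumes what is stated above, namely that pids.max takes a non-negative integer or "max" and that "-1" and "0" are both meant as "unlimited".

```python
def is_valid_pids_max(value: str) -> bool:
    """Return True if `value` is something pids.max would accept per [1].

    Assumption: only the literal "max" or a non-negative decimal integer
    is accepted; everything else (including "-1") is rejected.
    """
    if value == "max":
        return True
    # isascii() guards against non-ASCII digit characters.
    return value.isascii() and value.isdigit()


def to_pids_max(pids_limit: int) -> str:
    """Hypothetical massaging step: translate a --pids-limit style value
    into a string that is safe to write to pids.max.

    Assumption: -1 and 0 both mean "unlimited".
    """
    if pids_limit < -1:
        raise ValueError(f"invalid pids limit: {pids_limit}")
    if pids_limit in (-1, 0):
        return "max"
    return str(pids_limit)


if __name__ == "__main__":
    # Writing "-1" verbatim is what produces "Invalid argument".
    print(is_valid_pids_max("-1"))
    # After massaging, the value is acceptable.
    print(is_valid_pids_max(to_pids_max(-1)))
```

With this in mind, the error in the log above is what one would expect whenever "-1" reaches pids.max untranslated.
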
The best course of action would probably be to drop [4] from quincy,
getting it back to 17.1.0 state (i.e. no --pids-limit option in sight),
and amend the original --pids-limit change in master so that it works
for all versions of podman. The podman version is already checked in a
couple of places (e.g. CGROUPS_SPLIT_PODMAN_VERSION), so that should be
easy enough. Alternatively, we could just unconditionally pass "0" even
though it is not documented anymore.

(The reason for backporting [4] to quincy was to fix containerized
iSCSI deployments, where bumping into the default PID limit is just a
matter of scaling the number of exported LUNs. It's been that way
since the initial pacific release though, so taking it out for now is
completely acceptable.)

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt
[2] https://github.com/containers/podman/issues/11782
[3] https://github.com/containers/podman/pull/11794
[4] https://github.com/ceph/ceph/pull/45576

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx