Hi Robert,

this sounds a bit worse than I thought. I remember that Red Hat stopped packaging Ubuntu-based ceph containers; the default base for the public containers is now CentOS, which I'm running on all my hosts (I also use ceph containers). If Ubuntu indeed reserves a different default UID for ceph and UID 167 is available, then you should probably create/change the ceph user with GID=UID=167 on all hosts before installing anything ceph-related. Otherwise there might be more to re-own than just /var/lib/ceph (e.g. /var/log/ceph). The ansible tasks I included earlier will do this user creation. Everything should work fine after that.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Robert LeBlanc <robert@xxxxxxxxxxxxx>
Sent: 29 August 2019 16:41
To: Frank Schilder
Cc: ceph-users
Subject: Re: Failure to start ceph-mon in docker

Frank,

Thank you for the explanation. These are freshly installed machines and did not have ceph on them. I checked one of the other OSD nodes and there is no ceph user in /etc/passwd, nor is UID 167 allocated to any user. I did install ceph-common from the 18.04 repos before realizing that deploying ceph in containers did not update the host's /etc/apt/sources.list (or add an entry in /etc/apt/sources.list.d/). I manually added the repo for Nautilus and upgraded the packages, so I don't know if that had anything to do with it. Maybe Ubuntu packages ceph under UID 64045 and upgrading to the Ceph-distributed packages didn't change the UID.

Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Thu, Aug 29, 2019 at 12:33 AM Frank Schilder <frans@xxxxxx> wrote:
Hi Robert,

this is a bit less trivial than it might look right now. The ceph user is usually created by installing the package ceph-common. By default it will use UID 167. If the ceph user already exists, I would assume it will use the existing user, to allow an operator to avoid UID collisions (if 167 is used already).

If you use docker, the ceph UID on the host and inside the container should match (or need to be translated). If they don't, you will have a lot of fun re-owning stuff all the time, because deployments will use the symbolic name ceph, which has different UIDs on the host and inside the container in your case.

I would recommend removing this discrepancy as soon as possible:

1) Find out why there was a ceph user with a UID different from 167 before installation of ceph-common. Did you create it by hand? Was UID 167 allocated already?
2) If you can safely change the GID and UID of ceph to 167, just do groupmod+usermod with the new GID and UID.
3) If 167 is used already by another service, you will have to map the UIDs between host and container.

To prevent ansible from deploying dockerized ceph with a mismatching user ID for ceph, add these tasks to an appropriate part of your deployment (general host preparation or so):

- name: "Create group 'ceph'."
  group:
    name: ceph
    gid: 167
    local: yes
    state: present
    system: yes

- name: "Create user 'ceph'."
  user:
    name: ceph
    password: "!"
    comment: "ceph-container daemons"
    uid: 167
    group: ceph
    shell: "/sbin/nologin"
    home: "/var/lib/ceph"
    create_home: no
    local: yes
    state: present
    system: yes

This should fail if a group or user ceph already exists with an ID different from 167.
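If you prefer to do the same by hand, a rough shell equivalent of these two tasks would be (untested sketch; run as root, and only if the name ceph and the ID 167 are both still free):

# Check first whether the name 'ceph' or the ID 167 is already taken.
getent group ceph 167
getent passwd ceph 167

# Create group and user with the fixed ID 167 used inside the containers.
groupadd --system --gid 167 ceph
useradd --system --uid 167 --gid 167 \
        --comment "ceph-container daemons" \
        --home-dir /var/lib/ceph --no-create-home \
        --shell /sbin/nologin ceph
usermod --lock ceph    # equivalent of password: "!" in the task above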
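And for case 2), changing an existing ceph user and re-owning leftovers could look like this (again only a sketch; the old UID 64045 is just the Ubuntu default Robert suspects, so check what it actually is on your hosts):

# Move the existing ceph group/user to the fixed ID 167.
groupmod --gid 167 ceph
usermod --uid 167 --gid 167 ceph

# Re-own whatever is still owned by the old IDs; /var/lib/ceph and
# /var/log/ceph are the usual suspects, but search the whole file system.
find / -xdev \( -user 64045 -o -group 64045 \) -exec chown -h 167:167 {} +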
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Robert LeBlanc <robert@xxxxxxxxxxxxx>
Sent: 28 August 2019 23:23:06
To: ceph-users
Subject: Re: Failure to start ceph-mon in docker

Turns out /var/lib/ceph was owned ceph.ceph and not 167.167; chowning it made things work (a command sketch appears at the end of this thread). I guess only the monitor needs that permission; rgw, mgr and osd are all happy without it being 167.167.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Wed, Aug 28, 2019 at 1:45 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
We are trying to set up a new Nautilus cluster using ceph-ansible with containers. We got things deployed, but I couldn't run `ceph -s` on the host, so I decided to `apt install ceph-common` and installed the Luminous version from Ubuntu 18.04. For some reason the docker container that was running the monitor restarted and won't start again. I added the repo for Nautilus and upgraded ceph-common, but the problem persists. The manager and OSD docker containers don't seem to be affected at all.

I see this in the journal:

Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Starting Ceph Monitor...
Aug 28 20:40:55 sun-gcs02-osd01 docker[2926]: Error: No such container: ceph-mon-sun-gcs02-osd01
Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Started Ceph Monitor.
Aug 28 20:40:55 sun-gcs02-osd01 docker[2949]: WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:40:56  /opt/ceph-container/bin/entrypoint.sh: Existing mon, trying to rejoin cluster...
Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: warning: line 41: 'osd_memory_target' in section 'osd' redefined
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03  /opt/ceph-container/bin/entrypoint.sh: /etc/ceph/ceph.conf is already memory tuned
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03  /opt/ceph-container/bin/entrypoint.sh: SUCCESS
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: PID 368: spawning /usr/bin/ceph-mon --cluster ceph --default-log-to-file=false --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -d --mon-cluster-log-to-stderr --log-stderr-prefix=debug -i sun-gcs02-osd01 --mon-data /var/lib/ceph/mon/ceph-sun-gcs02-osd01 --public-addr 10.65.101.21
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: Waiting 368 to quit
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: warning: line 41: 'osd_memory_target' in section 'osd' redefined
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180  0 set uid:gid to 167:167 (ceph:ceph)
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180  0 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process ceph-mon, pid 368
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180 -1 stat(/var/lib/ceph/mon/ceph-sun-gcs02-osd01) (13) Permission denied
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180 -1 error accessing monitor data directory at '/var/lib/ceph/mon/ceph-sun-gcs02-osd01': (13) Permission denied
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: managing teardown after SIGCHLD
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Waiting PID 368 to terminate
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Process 368 is terminated
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Bye Bye, container will die with return code -1
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: if you don't want me to die and have access to a shell to debug this situation, next time run me with '-e DEBUG=stayalive'
Aug 28 20:41:04 sun-gcs02-osd01 systemd[1]: ceph-mon@sun-gcs02-osd01.service: Main process exited, code=exited, status=255/n/a
Aug 28 20:41:04 sun-gcs02-osd01 systemd[1]: ceph-mon@sun-gcs02-osd01.service: Failed with result 'exit-code'.

The directories for the monitor are owned by 167.167, which matches the UID.GID that the container reports.
root@sun-gcs02-osd01:~# ls -lhd /var/lib/ceph/
drwxr-x--- 14 ceph ceph 4.0K Jul 30 22:15 /var/lib/ceph/
root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/
total 56K
drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-mds
drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-mgr
drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-osd
drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rbd
drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rbd-mirror
drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rgw
drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mds
drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mgr
drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mon
drwxr-xr-x  14 167 167 4.0K Jul 30 22:28 osd
drwxr-xr-x   4 167 167 4.0K Aug  1 23:36 radosgw
drwxr-xr-x 254 167 167  12K Aug 28 20:44 tmp
root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/
total 4.0K
drwxr-xr-x 3 167 167 4.0K Jul 30 22:16 ceph-sun-gcs02-osd01
root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/ceph-sun-gcs02-osd01/
total 16K
-rw------- 1 167 167   77 Jul 30 22:15 keyring
-rw-r--r-- 1 167 167    8 Jul 30 22:15 kv_backend
-rw-r--r-- 1 167 167    3 Jul 30 22:16 min_mon_release
drwxr-xr-x 2 167 167 4.0K Aug 28 19:16 store.db
root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/ceph-sun-gcs02-osd01/store.db/
total 149M
-rw-r--r-- 1 167 167 1.7M Aug 28 19:16 050225.log
-rw-r--r-- 1 167 167  65M Aug 28 19:16 050227.sst
-rw-r--r-- 1 167 167  45M Aug 28 19:16 050228.sst
-rw-r--r-- 1 167 167   16 Aug 16 07:40 CURRENT
-rw-r--r-- 1 167 167   37 Jul 30 22:15 IDENTITY
-rw-r--r-- 1 167 167    0 Jul 30 22:15 LOCK
-rw-r--r-- 1 167 167 1.3M Aug 28 19:16 MANIFEST-027846
-rw-r--r-- 1 167 167 4.7K Aug  1 23:38 OPTIONS-002825
-rw-r--r-- 1 167 167 4.7K Aug 16 07:40 OPTIONS-027849

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
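For reference, the fix Robert describes further up in the thread (re-owning /var/lib/ceph so the mon could start) boils down to something like the following sketch, assuming the container's stock UID/GID of 167 and the host/mon name from his logs:

# /var/lib/ceph itself was owned by the host's ceph user; hand it to 167,
# which is what the containerized ceph-mon runs as (the subdirectories
# in the listing above were already 167.167).
chown 167:167 /var/lib/ceph
# or, when in doubt, re-own the whole tree:
# chown -R 167:167 /var/lib/ceph

# Verify the ownership the mon complained about, then restart the unit.
stat -c '%u:%g %n' /var/lib/ceph /var/lib/ceph/mon/ceph-sun-gcs02-osd01
systemctl restart ceph-mon@sun-gcs02-osd01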