fsid changed?

Hey ceph-users,

One of our Ceph environments has changed its cluster fsid, and I would like advice on how to get it corrected.

We added a new OSD node in the hope of retiring one of the older OSD + MON nodes.

Using ceph-deploy, we unfortunately ran "ceph-deploy mon create ..." instead of "ceph-deploy mon add ...".
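
For reference, what we meant to run (as I understand the ceph-deploy syntax, so treat this as my reading of it rather than gospel) was:

# what we intended: add a monitor to the existing cluster
ceph-deploy mon add ceph-osd7

# what we actually ran: bootstrap ceph-osd7 as though it were an initial monitor
ceph-deploy mon create ceph-osd7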

The ceph.log file reported:

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.23): /usr/bin/ceph-deploy mon create ceph-osd7
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-osd7
[ceph_deploy.mon][DEBUG ] detecting platform for host ceph-osd7 ...
[ceph-osd7][DEBUG ] connection detected need for sudo
[ceph-osd7][DEBUG ] connected to host: ceph-osd7 
[ceph-osd7][DEBUG ] detect platform information from remote host
[ceph-osd7][DEBUG ] detect machine type
[ceph_deploy.mon][INFO  ] distro info: Ubuntu 14.04 trusty
[ceph-osd7][DEBUG ] determining if provided host has same hostname in remote
[ceph-osd7][DEBUG ] get remote short hostname
[ceph-osd7][DEBUG ] deploying mon to ceph-osd7
[ceph-osd7][DEBUG ] get remote short hostname
[ceph-osd7][DEBUG ] remote hostname: ceph-osd7
[ceph-osd7][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-osd7][DEBUG ] create the mon path if it does not exist
[ceph-osd7][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph-osd7/done
[ceph-osd7][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-ceph-osd7/done
[ceph-osd7][INFO  ] creating keyring file: /var/lib/ceph/tmp/ceph-ceph-osd7.mon.keyring
[ceph-osd7][DEBUG ] create the monitor keyring file
[ceph-osd7][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs -i ceph-osd7 --keyring /var/lib/ceph/tmp/ceph-ceph-osd7.mon.keyring
[ceph-osd7][DEBUG ] ceph-mon: set fsid to 6870cff2-6cbc-4e99-8615-c159ba3a0546

So, it looks like the fsid was changed from e238f5b3-7d67-4b55-8563-52008828db51 to 6870cff2-6cbc-4e99-8615-c159ba3a0546.

ceph -s shows the previous fsid:
ceph -s
    cluster e238f5b3-7d67-4b55-8563-52008828db51
     health HEALTH_WARN
            too few PGs per OSD (29 < min 30)
            1 mons down, quorum 1,2 ceph-mon,ceph-osd3
     monmap e9: 3 mons at {ceph-mon=10.5.68.69:6789/0,ceph-osd3=10.5.68.92:6789/0,ceph-osd6=10.5.68.35:6789/0}
            election epoch 416, quorum 1,2 ceph-mon,ceph-osd3
     mdsmap e640: 1/1/1 up {0=ceph-mon=up:active}, 1 up:standby
     osdmap e6275: 58 osds: 58 up, 58 in
      pgmap v18874768: 848 pgs, 16 pools, 4197 GB data, 911 kobjects
            8691 GB used, 22190 GB / 30881 GB avail
                 848 active+clean
  client io 0 B/s rd, 345 kB/s wr, 58 op/s
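
If it is useful for diagnosing this, my understanding is that the fsid the quorum is carrying can also be pulled straight from the monitors (commands as I understand them, please correct me if there is a better way):

# ask the cluster for its fsid directly
ceph fsid
# the monmap dump should also show the fsid along with the monitor list
ceph mon dump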

What seems odd is that my ceph.conf never had e238f5b3-7d67-4b55-8563-52008828db51 as the fsid. I even pulled a copy from backups, and it has always been:

root@ceph-mon:~/RESTORE/2016-01-02/etc/ceph# cat ceph.conf 
[global]
fsid = 6870cff2-6cbc-4e99-8615-c159ba3a0546
mon_initial_members = ceph-mon
mon_host = 10.5.68.69,10.5.68.65,10.5.68.92
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2
public_network = 10.5.68.0/24
cluster_network = 10.7.1.0/24

The cluster seems to be "up", but I'm concerned that I only have 2 monitors in quorum. I cannot add a third, since authentication to the cluster fails:

2016-01-20 16:41:09.544870 7f1f238ed8c0  0 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3), process ceph-mon, pid 32010
2016-01-20 16:41:09.765043 7f1f238ed8c0  0 mon.ceph-osd6 does not exist in monmap, will attempt to join an existing cluster
2016-01-20 16:41:09.773435 7f1f238ed8c0  0 using public_addr 10.5.68.35:0/0 -> 10.5.68.35:6789/0
2016-01-20 16:41:09.773517 7f1f238ed8c0  0 starting mon.ceph-osd6 rank -1 at 10.5.68.35:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-osd6 fsid 6870cff2-6cbc-4e99-8615-c159ba3a0546
2016-01-20 16:41:09.774549 7f1f238ed8c0  1 mon.ceph-osd6@-1(probing) e0 preinit fsid 6870cff2-6cbc-4e99-8615-c159ba3a0546
2016-01-20 16:41:10.746413 7f1f1ec65700  0 log_channel(audit) log [DBG] : from='admin socket' entity='admin socket' cmd='mon_status' args=[]: dispatch
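
In case it is relevant, my understanding is that the monmap sitting in the failing mon's data directory could be inspected offline (with the mon stopped) to see which fsid it actually carries; I have not gone down that path yet:

# with the mon stopped, extract its monmap from the local store
ceph-mon -i ceph-osd6 --extract-monmap /tmp/monmap
# print the extracted monmap, which includes the fsid and monitor list
monmaptool --print /tmp/monmap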


And yes, we need to increase our PG count. This cluster has grown from a few 2 TB drives to many 600 GB SAS drives, but I don't want to touch anything else until I can get this figured out.
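
When we do finally raise the PG count, my understanding is that it is a per-pool bump along these lines (the pool name and number here are just placeholders, not what we would actually use):

# hypothetical example only -- "volumes" and 256 are placeholders
ceph osd pool set volumes pg_num 256
ceph osd pool set volumes pgp_num 256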

This is running as our OpenStack VM storage, so it is not something we can simply rebuild.

Thanks,
Mike C
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
