On Tue, 2012-03-27 at 22:04 +0100, David McBride wrote:
> On Tue, 2012-03-27 at 11:06 -0700, Sage Weil wrote:
> > This shouldn't change the PG count either.  If you do
> >
> >  ceph osd dump | grep ^pool
> >
> > you'll see a pg_num value for each pool that should remain constant.
> > Only the size should change (replica count)...
>
> Okay, that's what I was expecting.  I earlier rebuilt the cluster and
> repeated my earlier results; however, I don't have the output of those
> commands to hand.

Hi,

Results are in.  Something odd is going on; the results returned by
`ceph -s` and `ceph osd dump` are inconsistent (a quick cross-check
sketch follows after this list):

 * `ceph osd dump` does indeed indicate that the pg_num values remain
   constant for each pool before and after changing the replica count.

 * However, the total number of PGs reported by `ceph -s` or `ceph -w`
   increases immediately after issuing the replica-count change command
   for a pool.  The increase is equal to the number of live OSDs; in
   this case, 28.

 * This apparent (silent) increase in PG count occurs three times if
   the change is applied to all three pools: `data`, `metadata`, and
   `rbd`.

 * Changing the replica count up and down again after the initial
   increase has no further effect on the reported PG count.

 * My steps for reproducing are:

   - Mint a new cluster, with 14 OSDs stored on server A.
   - Start the cluster.
   - Add some data to the `rbd` pool using `rados bench`.
   - Initialize 14 additional OSDs on server B.
   - Add the server B OSDs to the cluster.
   - Increase the replica count.

   This process is probably not minimal; I can run some experiments to
   see which factors are significant.  (I'm pretty sure I could skip
   the `rados bench` step, for example.)

 * In case it makes a difference, I'm using XFS, not BTRFS, for the
   OSDs' backing store.
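For what it's worth, the comparison I keep re-running before and after
each `ceph osd pool set <pool> size ...` is sketched below.  It just
parses the two output formats quoted later in this message, so the awk
field positions are assumptions tied to this version's formatting, not
anything robust:

    #!/bin/sh
    # Compare the total PG count on the status line ("pg vNNN: XXXX pgs: ...")
    # with the sum of the per-pool pg_num values from the osdmap.

    # e.g. "2012-03-28 12:30:25.086686  pg v133: 2772 pgs: ..."  ->  2772
    status_total=$(ceph -s | awk '$3 == "pg" { print $5; exit }')

    # e.g. "pool 2 'rbd' ... pg_num 896 ..."  ->  pg_num summed over the pools
    pool_total=$(ceph osd dump | awk '
        /^pool/ { for (i = 1; i < NF; i++) if ($i == "pg_num") sum += $(i + 1) }
        END     { print sum }')

    echo "total pgs per 'ceph -s':            ${status_total}"
    echo "sum of pg_num per 'ceph osd dump':  ${pool_total}"

Per the dumps below, the second number sits at 3 x 896 = 2688
throughout; the first is the one that jumps by 28 each time a pool's
size is changed.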
Here's the output of ceph status commands during the various stages:

Prior to OSD addition:
======================

output from: `ceph -s`:

> 2012-03-28 12:30:25.086686    pg v133: 2772 pgs: 2772 active+clean; 13252 MB data, 55720 MB used, 1857 GB / 1911 GB avail
> 2012-03-28 12:30:25.101300   mds e1: 0/0/1 up
> 2012-03-28 12:30:25.101424   osd e11: 14 osds: 14 up, 14 in
> 2012-03-28 12:30:25.101689   log 2012-03-28 12:25:22.596734 mon.0 146.169.21.55:6789/0 16 : [INF] osd.8 146.169.1.13:6836/6339 boot
> 2012-03-28 12:30:25.101897   mon e1: 1 mons at {vm-cephhead=146.169.21.55:6789/0}

output from: `ceph osd dump | grep pg_num`:

> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0

Adding the second set of OSDs:
==============================

output from: `ceph osd dump | grep pg_num`:

> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0

Changing `rbd` pool replica count:
==================================

output from: `ceph -w`:

> 2012-03-28 12:36:36.768714    pg v313: 2772 pgs: 2772 active+clean; 13252 MB data, 86002 MB used, 3739 GB / 3823 GB avail
> 2012-03-28 12:36:37.763292    pg v314: 2772 pgs: 2772 active+clean; 13252 MB data, 85163 MB used, 3740 GB / 3823 GB avail

(issued: `ceph osd pool set rbd size 3`)

> 2012-03-28 12:36:42.308575    pg v315: 2800 pgs: 28 creating, 2772 active+clean; 13252 MB data, 85163 MB used, 3740 GB / 3823 GB avail
> 2012-03-28 12:36:42.314124   osd e105: 28 osds: 28 up, 28 in
> 2012-03-28 12:36:43.399792    pg v316: 2800 pgs: 28 creating, 2772 active+clean; 13252 MB data, 85163 MB used, 3740 GB / 3823 GB avail
> 2012-03-28 12:36:43.402742   osd e106: 28 osds: 28 up, 28 in
> 2012-03-28 12:36:46.691598    pg v317: 2800 pgs: 28 creating, 2737 active+clean, 35 active+recovering; 13252 MB data, 84818 MB used, 3740 GB / 3823 GB avail; 274/6765 degraded (4.050%)
> 2012-03-28 12:36:47.596507    pg v318: 2800 pgs: 28 creating, 2709 active+clean, 63 active+recovering; 13252 MB data, 84819 MB used, 3740 GB / 3823 GB avail; 524/6890 degraded (7.605%)

output from: `ceph osd dump | grep pg_num`:

> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0
> pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 105 owner 0

Changing `data` pool replica count:
===================================

output from: `ceph -w`:

> 2012-03-28 13:10:54.818447    pg v573: 2800 pgs: 14 creating, 2786 active+clean; 13252 MB data, 98329 MB used, 3727 GB / 3823 GB avail

(issued: `ceph osd pool set data size 3`)

> 2012-03-28 13:11:08.240605    pg v574: 2828 pgs: 42 creating, 2786 active+clean; 13252 MB data, 98329 MB used, 3727 GB / 3823 GB avail
> 2012-03-28 13:11:08.245026   osd e114: 28 osds: 28 up, 28 in
> 2012-03-28 13:11:09.050371    pg v575: 2828 pgs: 42 creating, 2786 active+clean; 13252 MB data, 98329 MB used, 3727 GB / 3823 GB avail
> 2012-03-28 13:11:09.051179   osd e115: 28 osds: 28 up, 28 in

output from: `ceph osd dump | grep pg_num`:

> pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 114 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 1 owner 0
> pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 112 owner 0

Changing `metadata` pool replica count:
=======================================

output from: `ceph -w`:

> 2012-03-28 13:12:04.279557    pg v580: 2828 pgs: 28 creating, 2800 active+clean; 13252 MB data, 98338 MB used, 3727 GB / 3823 GB avail

(issued: `ceph osd pool set metadata size 3`)

> 2012-03-28 13:13:19.748554    pg v581: 2856 pgs: 56 creating, 2800 active+clean; 13252 MB data, 98338 MB used, 3727 GB / 3823 GB avail
> 2012-03-28 13:13:19.753181   osd e116: 28 osds: 28 up, 28 in
> 2012-03-28 13:13:20.840151    pg v582: 2856 pgs: 56 creating, 2800 active+clean; 13252 MB data, 98338 MB used, 3727 GB / 3823 GB avail
> 2012-03-28 13:13:20.842065   osd e117: 28 osds: 28 up, 28 in

output from: `ceph osd dump | grep pg_num`:

> pool 0 'data' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 114 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 3 crush_ruleset 1 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 116 owner 0
> pool 2 'rbd' rep size 3 crush_ruleset 2 object_hash rjenkins pg_num 896 pgp_num 896 lpg_num 2 lpgp_num 2 last_change 112 owner 0

This sounds like it's probably a defect.  Should I mint a new bug
ticket in the tracker?

Cheers,
David
-- 
David McBride <dwm@xxxxxxxxxxxx>
Department of Computing, Imperial College, London