On 14/10/2014 18:51, Lionel Bouton wrote:
In fact ~200 instead of ~600 on the big nodes in the current configuration, so it's in the same ballpark as your estimate.
In the logs the big OSD is osd.0 and the small one is osd.2. I'll call them BIG and SMALL in the following.

The results are interesting. First, there is a gap of ~8 s on BIG to select the Btrfs snapshot:

  2014-10-14 19:04:56.290936 7f8b84516780 10 filestore(/var/lib/ceph/osd/ceph-0) most recent snap from <23107688,23107701> is 23107701
  2014-10-14 19:04:56.290949 7f8b84516780 10 filestore(/var/lib/ceph/osd/ceph-0) mount rolling back to consistent snap 23107701
  2014-10-14 19:04:56.290955 7f8b84516780 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-0) rollback_to: to 'snap_23107701'
  2014-10-14 19:05:04.379241 7f8b84516780 5 filestore(/var/lib/ceph/osd/ceph-0) mount op_seq is 23107701

This takes less than 1 s on SMALL:

  2014-10-14 19:02:19.596158 7fd0a0d2b780 10 filestore(/var/lib/ceph/osd/ceph-2) most recent snap from <1602633,1602645> is 1602645
  2014-10-14 19:02:19.596176 7fd0a0d2b780 10 filestore(/var/lib/ceph/osd/ceph-2) mount rolling back to consistent snap 1602645
  2014-10-14 19:02:19.596182 7fd0a0d2b780 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: to 'snap_1602645'
  2014-10-14 19:02:20.311178 7fd0a0d2b780 5 filestore(/var/lib/ceph/osd/ceph-2) mount op_seq is 1602645

I assume this is the time Btrfs itself takes to mount the snapshot.
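As a rough check of the two figures above, the rollback time can be read off the timestamps of the rollback_to line and the following "mount op_seq" line. The following is a minimal Python sketch, assuming the log format quoted above (date and time as the first two fields); the helper names parse_ts and rollback_seconds are hypothetical and not part of Ceph.

```python
from datetime import datetime

def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' timestamp of a log line."""
    date, time = line.split()[:2]
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")

def rollback_seconds(lines):
    """Seconds between the 'rollback_to:' line and the next 'mount op_seq is' line.

    Returns None if either line is missing.
    """
    start = None
    for line in lines:
        if "rollback_to:" in line:
            start = parse_ts(line)
        elif "mount op_seq is" in line and start is not None:
            return (parse_ts(line) - start).total_seconds()
    return None

# Log lines copied from the excerpts above.
BIG = [
    "2014-10-14 19:04:56.290955 7f8b84516780 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-0) rollback_to: to 'snap_23107701'",
    "2014-10-14 19:05:04.379241 7f8b84516780 5 filestore(/var/lib/ceph/osd/ceph-0) mount op_seq is 23107701",
]
SMALL = [
    "2014-10-14 19:02:19.596182 7fd0a0d2b780 10 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-2) rollback_to: to 'snap_1602645'",
    "2014-10-14 19:02:20.311178 7fd0a0d2b780 5 filestore(/var/lib/ceph/osd/ceph-2) mount op_seq is 1602645",
]

if __name__ == "__main__":
    print("BIG:   %.3f s" % rollback_seconds(BIG))
    print("SMALL: %.3f s" % rollback_seconds(SMALL))
```

On the excerpts above this gives roughly 8.09 s for BIG and 0.71 s for SMALL, so BIG spends about 11 times longer in this step.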
This behavior (slow Btrfs snapshot operations) seems to repeat itself: the first checkpoint creation (which seems to involve a Btrfs snapshot) takes more than 15 s on BIG:

  2014-10-14 19:05:23.135960 7f8b73fff700 10 filestore(/var/lib/ceph/osd/ceph-0) sync_entry commit took 15.503795, interval was 15.866245

Less than 0.3 s on SMALL:

  2014-10-14 19:02:21.889467 7fd094be4700 10 filestore(/var/lib/ceph/osd/ceph-2) sync_entry commit took 0.135268, interval was 0.276440

Subsequent checkpoints are much faster on BIG, though. The next one:

  2014-10-14 19:05:28.426263 7f8b73fff700 10 filestore(/var/lib/ceph/osd/ceph-0) sync_entry commit took 0.969734, interval was 0.979822

and they seem to converge towards ~0.25 s later on. SMALL seems to converge towards ~0.07 s (it holds ~1/3 of the data, and probably of the data structures, though, so if snapshots on Btrfs are supposed to be O(n) operations this could be normal).

I couldn't find other significant differences: the different phases I identified in the OSD init process took more time on BIG, but never more than ~3x compared to SMALL. It seems most of the time is spent creating or accessing snapshots.

My best guess currently is that Btrfs snapshot operations may have seen significant speedups between 3.12.21 and 3.17.0, and that OSD init is checkpoint (i.e. snapshot) intensive, which accounts for most of the slow startup.

Current plan: wait at least a week to study 3.17.0 behavior, and upgrade the 3.12.21 nodes to 3.17.0 if all goes well.

Best regards,

Lionel Bouton