Hello,

On Tue, 17 May 2016 12:12:02 +1000 Chris Dunlop wrote:

> Hi Christian,
>
> On Tue, May 17, 2016 at 10:41:52AM +0900, Christian Balzer wrote:
> > On Tue, 17 May 2016 10:47:15 +1000 Chris Dunlop wrote:
> > Most of your questions would be easily answered if you did spend a few
> > minutes with even the crappiest test cluster and observing things
> > (with atop and the likes).
>
> You're right of course. I'll set up a test cluster and start
> experimenting, which I should have done before asking questions here.
>
> > To wit, this is a test pool (12) created with 32 PGs and slightly
> > filled with data via rados bench:
> > ---
> > # ls -la /var/lib/ceph/osd/ceph-8/current/ |grep "12\."
> > drwxr-xr-x 2 root root 4096 May 17 10:04 12.13_head
> > drwxr-xr-x 2 root root 4096 May 17 10:04 12.1e_head
> > drwxr-xr-x 2 root root 4096 May 17 10:04 12.b_head
> > # du -h /var/lib/ceph/osd/ceph-8/current/12.13_head/
> > 121M    /var/lib/ceph/osd/ceph-8/current/12.13_head/
> > ---
> >
> > After increasing that to 128 PGs we get this:
> > ---
> > # ls -la /var/lib/ceph/osd/ceph-8/current/ |grep "12\."
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.13_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.1e_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.2b_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.33_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.3e_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.4b_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.53_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.5e_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.6b_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.73_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.7e_head
> > drwxr-xr-x 2 root root 4096 May 17 10:18 12.b_head
> > # du -h /var/lib/ceph/osd/ceph-8/current/12.13_head/
> > 25M     /var/lib/ceph/osd/ceph-8/current/12.13_head/
> > ---
> >
> > Now this was fairly uneventful even on my crappy test cluster, given
> > the small amount of data (which was mostly cached) and the fact that
> > it's idle.
> >
> > However consider this with 100's of GB per PG and a busy cluster and
> > you get the idea where massive and very disruptive I/O comes from.
>
> Per above, I'll experiment with this, but my first thought is I suspect
> that's moving object/data files around rather than copying data, so the
> overheads are in directory operations rather than data copies - not that
> directory operations are free either of course.

That's correct, but given enough objects (and thus directory depths) and,
most of all, I/O contention in a busy cluster, the impact is quite
pronounced.

Christian

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
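
For anyone wanting to reproduce the experiment above, a minimal sketch of
the commands involved, using a placeholder pool name ("testpool") and an
arbitrary bench duration rather than the exact invocations used in the test.
Raising pg_num splits the existing PGs in place on their current OSDs (the
directory splits shown above); raising pgp_num afterwards lets CRUSH remap
the new PGs, which is what triggers the bulk data movement across the
cluster:
---
# create a small test pool with 32 PGs and write some objects into it
# (pool name and bench duration are placeholders)
ceph osd pool create testpool 32 32
rados bench -p testpool 60 write --no-cleanup

# split into 128 PGs; pgp_num must follow for placement to actually change
ceph osd pool set testpool pg_num 128
ceph osd pool set testpool pgp_num 128

# watch the cluster (or atop on the OSD nodes) while the split settles
ceph -w
---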