On 01/24/2017 07:26 PM, Mark Nelson wrote:
> My first thought is that PGs are splitting. You only appear to have
> 168 PGs for 9 OSDs, and that's not nearly enough. Beyond the poor data
> distribution and the associated performance imbalance, your PGs will
> split very quickly, since by default PGs start splitting at 320 objects
> each. Typically this is less of a problem with RBD, since by default it
> uses 4MB objects (and thus there are fewer, bigger objects), but with
> only 168 PGs you are likely to be heavily splitting by the time you hit
> ~218GB of data (make sure to take replication into account).
>
> Normally PG splitting shouldn't be terribly expensive, since it's
> basically just reading a directory xattr, a readdir on a small
> directory, then a bunch of link/unlink operations. When SELinux is
> enabled, it appears that link/unlink might require an xattr read on
> each object/file to determine whether the link/unlink can happen.
> That's a ton of extra seek overhead. On spinning disks this is
> especially bad with XFS, since subdirs may not be in the same AG as the
> parent dir, so after subsequent splits the directories become
> fragmented and those reads happen all over the disk (not as much of a
> problem with SSDs, though).
>
> Anyway, that's my guess as to what's going on, but it could be
> something else. blktrace and/or XFS's kernel debugging facilities would
> probably lend some supporting evidence if this is what's going on.
>
> Mark

Hi Mark,

You're right, it seems that was the problem. I increased pg_num & pgp_num on every pool (rough commands in the P.S. below) and there are no more blocked requests! (after a few days of backfilling)

Maybe the monitors should raise a warning in the Ceph status when this situation occurs, no? A "too few PGs per OSD" warning already exists, but I never hit it on my cluster.

Thank you very much, Mark!
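
P.S. For anyone who finds this thread later, here is roughly what I did. It's only a sketch: the pool name "rbd" and the target of 512 PGs are placeholders, and the right pg_num depends on your own OSD count and replication size. Note that pg_num can only be increased (at least on my release), and pgp_num has to be raised to the same value afterwards or the data will not actually rebalance onto the new PGs.

    # check which pools exist and their current pg_num
    # ("rbd" and 512 below are placeholders -- adjust for your cluster)
    ceph osd lspools
    ceph osd pool get rbd pg_num

    # raise pg_num first, then pgp_num to the same value
    ceph osd pool set rbd pg_num 512
    ceph osd pool set rbd pgp_num 512

    # watch the backfill progress until the cluster is healthy again
    ceph -s
    ceph -w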