Hi Wade,

(Apologies for the slowness - AFK for the weekend.)

On 16 June 2016 at 23:38, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 16 June 2016 at 14:14, Wade Holler <wade.holler@xxxxxxxxx> wrote:
>>
>> Hi All,
>>
>> I have a repeatable condition when the object count in a pool gets to
>> 320-330 million the object write time dramatically and almost
>> instantly increases as much as 10X, exhibited by fs_apply_latency
>> going from 10ms to 100s of ms.
>> [...]
>
> My first guess is the filestore splitting and the amount of files per directory.

I concur with Wido and suggest you try upping your filestore split and
merge threshold config values.

I've seen this issue a number of times now with write-heavy workloads,
and would love to at least write some docs about it, because it must
happen to a lot of users running RBD workloads on largish drives.
However, I'm not sure how to definitively diagnose the issue and
pinpoint the problem.

The gist of the issue is the number of files and/or directories on your
OSD filesystems. At some system-dependent threshold you reach a point
where you can no longer cache enough inodes and/or dentries, so IOs on
those files (and filesystems) incur extra disk IOPS to read the
filesystem structure from disk. I believe that's the small read IO
you're seeing, and unfortunately it seems to effectively choke writes;
we've seen all sorts of related slow request issues.

If you watch your XFS stats you'll likely get further confirmation. In
my experience xs_dir_lookup balloons (which means directory lookups are
missing cache and going to disk).

What I'm not clear on is whether there are two different pathologies at
play here, i.e., dentry cache issues versus inode cache issues.
In the former case, making Ceph's directory structure shallower with
more files per directory may help (or perhaps increasing the number of
PGs, which gives more top-level directories). In the latter case you're
likely to need some system tuning (lower vfs_cache_pressure, more
memory?, fewer files via a larger object size), depending on your
workload.

--
Cheers,
~Blairo
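
P.S. In case it helps anyone checking this on their own OSD nodes, here is a rough Python sketch of the two things discussed above. It is not a tested diagnostic tool. The first function applies the split rule as I understand it from the Ceph filestore docs (filestore_split_multiple * abs(filestore_merge_threshold) * 16 files per subdirectory); the defaults of 2 and 10 are what I believe shipped at the time, so check your own config. The second parses the "dir" line of /proc/fs/xfs/stat, whose first counter is the directory lookup count. The function names and the sample_lookup_rate helper are made up for illustration.

```python
import time


def split_threshold(split_multiple=2, merge_threshold=10):
    """Files in one subdirectory at which filestore splits it.

    Uses the documented rule:
    filestore_split_multiple * abs(filestore_merge_threshold) * 16
    The default arguments are assumed, not read from a running cluster.
    """
    return split_multiple * abs(merge_threshold) * 16


def parse_xfs_dir_stats(stat_text):
    """Return the four counters from the 'dir' line of /proc/fs/xfs/stat."""
    names = ("lookup", "create", "remove", "getdents")
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "dir":
            return dict(zip(names, (int(v) for v in fields[1:5])))
    raise ValueError("no 'dir' line in XFS stats")


def sample_lookup_rate(path="/proc/fs/xfs/stat", interval=10.0):
    """Hypothetical helper: directory lookups per second over one interval."""
    with open(path) as f:
        before = parse_xfs_dir_stats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_xfs_dir_stats(f.read())
    return (after["lookup"] - before["lookup"]) / interval
```

For example, split_threshold(8, 40) shows what a purely illustrative pair of larger settings would give, and calling sample_lookup_rate() before and after the latency jump should show whether the lookup rate really does balloon. Note the counters in /proc/fs/xfs/stat are summed over all mounted XFS filesystems on the host, so this cannot single out one OSD.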