On Tue, Sep 22, 2015 at 7:24 AM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> Is there some way to tell in the logs that this is happening?

You can search for the (mangled) name _split_collection.

> I'm not
> seeing much I/O or CPU usage during these times. Is there some way to
> prevent the splitting? Is there a negative side effect to doing so?

Bump up the split and merge thresholds. You can search the list for
this; it was discussed not too long ago.

> We've had I/O block for over 900 seconds and as soon as the sessions
> are aborted, they are reestablished and complete immediately.
>
> The fio test is just a seq write, starting it over (rewriting from the
> beginning) is still causing the issue. I suspect that it is not
> having to create new files and therefore split collections. This is on
> my test cluster with no other load.

Hmm, that does make splitting seem less likely, provided you're really
not creating new objects, i.e. you're actually running fio in such a way
that it's not allocating new FS blocks (this is probably hard to set up?).

> I'll be doing a lot of testing today. Which log options and depths
> would be the most helpful for tracking this issue down?

If you want to go log diving, "debug osd = 20", "debug filestore = 20",
and "debug ms = 1" are what the OSD guys like to see. That should spit
out everything you need to track exactly what each Op is doing.
-Greg
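
For the log search mentioned above, here is a minimal sketch in Python, assuming the OSD log is plain text and that the split shows up as the substring _split_collection. The function names and the example path are hypothetical, and the exact log line format is not assumed; the code only does a substring match and reports line numbers.

```python
def find_split_lines(lines, needle="_split_collection"):
    """Return (line_number, text) pairs for lines that mention the needle.

    `lines` is any iterable of strings, e.g. an open log file.
    Line numbers start at 1 so they match what an editor or `grep -n` shows.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Open a log file (path supplied by the caller) and return its hits.

    Undecodable bytes are replaced rather than raising, since logs can
    contain stray binary data.
    """
    with open(path, errors="replace") as log:
        return find_split_lines(log)


# Small in-memory example; the contents are made up for illustration only.
sample = [
    "first line of some log\n",
    "something something _split_collection something\n",
    "last line\n",
]
for number, text in find_split_lines(sample):
    print(f"{number}: {text}")
```

To run it against a real log, call scan_log() with the path of the OSD log on your system (for example something like /var/log/ceph/ceph-osd.0.log, though the location depends on your setup) and compare the hits against the times the I/O stalls.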