On Thu, Apr 9, 2015 at 8:14 AM, Jacob Reid <lists-ceph@xxxxxxxxxxxxxxxx> wrote:
> On Thu, Apr 09, 2015 at 06:43:45AM -0700, Gregory Farnum wrote:
>> You can turn up debugging ("debug osd = 10" and "debug filestore = 10"
>> are probably enough, or maybe 20 each) and see what comes out to get
>> more information about why the threads are stuck.
>>
>> But just from the log my answer is the same as before, and now I don't
>> trust that controller (or maybe its disks), regardless of what it's
>> admitting to. ;)
>> -Greg
>>
>
> Ran with osd and filestore debug both at 20; still nothing jumping out
> at me. Logfile attached as it got huge fairly quickly, but mostly seems
> to be the same extra lines. I tried running some test I/O on the drives
> in question to try and provoke some kind of problem, but they seem fine
> now...

Okay, this is strange. Something very wonky is happening with your
scheduler -- it looks like these threads are all idle, and they're
scheduling wakeups that happen an appreciable amount of time after
they're supposed to. For instance:

2015-04-09 15:56:55.953116 7f70a7963700 20 filestore(/var/lib/ceph/osd/osd.15) sync_entry woke after 5.416704
2015-04-09 15:56:55.953153 7f70a7963700 20 filestore(/var/lib/ceph/osd/osd.15) sync_entry waiting for max_interval 5.000000

This is the thread that syncs your backing store, and it always sets
itself to get woken up at 5-second intervals -- but here it took >5.4
seconds, and later on in your log it takes more than 6 seconds.

It looks like all the threads which are getting timed out are also idle,
but are taking so much longer to wake up than they're set for that they
get a timeout warning. There might be some bugs in here where we're
expecting wakeups to be more precise than they can be, but these sorts
of misses are definitely not normal.

Is this server overloaded on the CPU? Have you done something to make
the scheduler or wakeups wonky?
-Greg
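
[Editorial note: the pattern Greg describes -- a thread doing a timed wait
for max_interval and then reporting how long it actually slept -- can be
illustrated with a minimal standalone C++ sketch. This is not Ceph's
actual sync_entry code, just an approximation of the same measurement;
the 5-second figure simply mirrors the max_interval shown in the log
above.]

    // Minimal standalone sketch (not Ceph's sync_entry implementation) of a
    // timed wait that measures its own oversleep. Build with: g++ -pthread
    #include <chrono>
    #include <condition_variable>
    #include <iostream>
    #include <mutex>

    int main() {
      std::mutex m;
      std::condition_variable cv;
      // Assumed 5-second interval, matching the "max_interval 5.000000"
      // line in the log above.
      const auto max_interval = std::chrono::seconds(5);

      std::unique_lock<std::mutex> lock(m);
      for (int i = 0; i < 3; ++i) {
        auto start = std::chrono::steady_clock::now();
        // Nothing ever signals cv, so this always times out after
        // max_interval -- unless the scheduler delays the wakeup.
        cv.wait_for(lock, max_interval);
        double slept = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        std::cout << "woke after " << slept << " s\n";
      }
      return 0;
    }

[On a lightly loaded machine this prints values only slightly above 5.0;
results like the 5.4 and 6+ seconds seen in the log would point at
delayed scheduling or timer wakeups rather than at the disks.]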