Nice catch. It is deadlocked in a code path between handling a watch/notify error from librados and flushing the internal cache. The issue already appears to exist in the hammer branch.

-- 
Jason Dillaman
Red Hat
dillaman@xxxxxxxxxx
http://www.redhat.com

----- Original Message -----
From: "Loic Dachary" <loic@xxxxxxxxxxx>
To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
Cc: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, May 8, 2015 5:07:49 AM
Subject: long running workloads/rbd_fsx_cache_writeback.yaml on hammer

Hi Jason,

There is a long-running (12h+) job at http://pulpito.ceph.com/loic-2015-05-07_09:46:27-rbd-hammer-backports---basic-multi/878799/ which is about rbd/thrash/{base/install.yaml clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/default.yaml workloads/rbd_fsx_cache_writeback.yaml} and runs on the current hammer-backports branch, which contains a few librbd backports: http://tracker.ceph.com/issues/11492#Teuthology-run-commit-commita79146fc3cae28bf4c07478fb4566b06942da60d-hammer-backports-branch-May-2015

I don't see an obvious cause for error in the logs. Does it ring a bell? Is it supposed to take that long? Note that all other jobs completed successfully (see http://pulpito.ceph.com/loic-2015-05-07_09:46:27-rbd-hammer-backports---basic-multi/ for details).

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre