Nice catch. It is deadlocked in a code path between handling a watch/notify error from librados and flushing the internal cache. The issue already appears to exist in the hammer branch.

-- 
Jason Dillaman
Red Hat
dillaman@xxxxxxxxxx
http://www.redhat.com

----- Original Message -----
From: "Loic Dachary" <loic@xxxxxxxxxxx>
To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
Cc: "Ceph Development" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, May 8, 2015 5:07:49 AM
Subject: long running workloads/rbd_fsx_cache_writeback.yaml on hammer

Hi Jason,

There is a long-running (12h+) job at http://pulpito.ceph.com/loic-2015-05-07_09:46:27-rbd-hammer-backports---basic-multi/878799/ which is about rbd/thrash/{base/install.yaml clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/few.yaml thrashers/default.yaml workloads/rbd_fsx_cache_writeback.yaml} and runs on the current hammer-backports branch, which contains a few librbd backports: http://tracker.ceph.com/issues/11492#Teuthology-run-commit-commita79146fc3cae28bf4c07478fb4566b06942da60d-hammer-backports-branch-May-2015

I don't see an obvious cause for error in the logs. Does it ring a bell? Is it supposed to take that long? Note that all other jobs completed successfully (see http://pulpito.ceph.com/loic-2015-05-07_09:46:27-rbd-hammer-backports---basic-multi/ for details).

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre