promote/copy-from and flush vs scrub

Sage Weil <sage@xxxxxxxxxxx> · Mon, 20 Jan 2014 15:25:13 -0800 (PST)

I hit an assert in some of my manual fiddling with the tiering agent, where 
a promote, copy-from, or flush operation would trigger

  -546> 2014-01-20 14:44:55.506594 7f0a467fc700 -1 osd/ReplicatedPG.cc: In 
function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int)' 
thread 7f0a467fc700 time 2014-01-20 14:44:55.466752
osd/ReplicatedPG.cc: 4737: FAILED assert(soid < scrubber.start || soid >= 
scrubber.end)

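The problem is that we normally block ops that hit the current scrub chunk 
in do_op at the very top level, but all of these ops are initiating writes 
at lower levels after doing some other slow/blocking work.  By the time 
the write reaches finish_ctx, the object can be inside the chunk being 
scrubbed, and nothing has checked for that.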

They are smart enough to take and block on the rwlocks, but I'm 
worried that the waiting_for_active queue is too coarse for this.  My 
thought is to add a std::map of obcs (object contexts) blocked on scrub 
and wake them up when the scrub chunk completes.  Any better ideas?
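
To make the std::map idea concrete, here is a rough standalone sketch, not 
actual ReplicatedPG code.  Everything in it is hypothetical: ScrubBlockedOps, 
its methods, and the use of std::string in place of hobject_t are stand-ins 
for illustration.  It also assumes the caller already serializes access, so 
there is no locking inside.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for hobject_t; real object ids are not plain strings.
using ObjectId = std::string;

// Hypothetical tracker for ops parked behind the current scrub chunk.
// None of these names exist in ReplicatedPG; this only illustrates the idea.
class ScrubBlockedOps {
public:
  using Callback = std::function<void()>;

  // Record [start, end) as the chunk currently being scrubbed.
  void begin_chunk(ObjectId start, ObjectId end) {
    assert(start <= end);
    start_ = std::move(start);
    end_ = std::move(end);
    active_ = true;
  }

  // Negation of the condition in the failed assert:
  //   soid < scrubber.start || soid >= scrubber.end
  bool in_chunk(const ObjectId& soid) const {
    return active_ && soid >= start_ && soid < end_;
  }

  // Run 'retry' immediately if soid is outside the chunk; otherwise park it
  // under soid.  Returns true if the op was parked.
  bool queue_or_run(const ObjectId& soid, Callback retry) {
    if (!in_chunk(soid)) {
      retry();
      return false;
    }
    blocked_[soid].push_back(std::move(retry));
    return true;
  }

  // The chunk is done: wake everything that was parked, in object order.
  // Swap the map out first so a callback that re-queues does not touch the
  // container being iterated.
  void finish_chunk() {
    active_ = false;
    std::map<ObjectId, std::vector<Callback>> woken;
    woken.swap(blocked_);
    for (auto& entry : woken)
      for (auto& cb : entry.second)
        cb();
  }

  std::size_t blocked_objects() const { return blocked_.size(); }

private:
  bool active_ = false;
  ObjectId start_, end_;
  std::map<ObjectId, std::vector<Callback>> blocked_;
};
```

A promote/copy-from/flush completion would call something like 
queue_or_run(soid, retry) just before initiating its write, and the scrubber 
would call finish_chunk() when it moves on.  Keying by object means only ops 
on objects inside the chunk wait, which is the finer granularity compared 
with waiting_for_active.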

Incidentally, we need to make the thrashosds task smart enough to pause 
thrashing long enough for scrubs to happen so that these issues are 
covered in the normal test suite...

sage