Re: A bug in rolling back to a degraded object

Hi Zhiqiang,

On Mon, 1 Jun 2015, Wang, Zhiqiang wrote:
> Hi Sage and all,
> 
> Another bug discovered during the proxy write teuthology testing occurs 
> when rolling back to a degraded object. It doesn't seem specific to 
> proxy write. I see a scenario like the following in the log file:
> 
>  - A rollback op comes in, and is enqueued.
>  - Several other ops on the same object come in, and are enqueued.
>  - The rollback op dispatches and finds that the object it rolls back 
> to is degraded, so the op is pushed back onto a list to wait for the 
> degraded object to recover.
>  - The later ops are handled and responses are sent back to the client.
>  - The degraded object recovers. The rollback op is enqueued again and 
> its reply is finally sent to the client.

Yep!
 
> This breaks the op order. A fix for this is to maintain a map tracking 
> the <source, destination> pairs. When an op on the source dispatches, 
> if such a pair exists, queue the op on the destination's degraded 
> waiting list. A drawback of this approach is that some entries in the 
> destination's 'waiting_for_degraded_object' list may not actually be 
> accessing the destination, but the source. Does this make sense?

Yeah, and I think it's fine for the op to appear in the other object's 
list.  In fact there is already a mechanism in place that does something 
similar: obc->blocked_by.  It was added for the clone operation, though 
unfortunately I don't think it's exercised by any of our tests.  But I 
think it does exactly what you need.  If you set the head's blocked_by 
to the clone (and the clone's blocks set to include the head), then anybody 
trying to write to the head will queue up on the clone's degraded 
list (see the check for this in ReplicatedPG::do_op()).

I think this mostly amounts to making the _rollback_to() method get the 
clone's obc, set up the blocked_by/blocks relationship, start recovery of 
that object immediately, and queue itself on the waiting list.

Does that make sense?
sage