Re: 'Racing read got wrong version' during proxy write testing

On Mon, 25 May 2015, Wang, Zhiqiang wrote:
> Hi all,
> 
> I ran into a problem during the teuthology test of proxy write. It goes like this:
> 
> - Client sends 3 writes and a read on the same object to base tier
> - Set up cache tiering
> - Client retries ops and sends the 3 writes and 1 read to the cache tier
> - The 3 writes finished on the base tier, say with versions v1, v2 and v3
> - Cache tier proxies the 1st write and starts to promote the object for the 2nd write; the 2nd and 3rd writes and the read are blocked
> - The proxied 1st write finishes on the base tier with version v4 and returns to the cache tier, but the cache tier fails to send the reply to the client because of socket failure injection
> - Client retries the writes and the read again; the writes are identified as dup ops
> - The promotion finishes; it copies the pg_log entries from the base tier into the cache tier's pg_log. This includes the 3 writes on the base tier and the proxied write
> - The writes are dispatched after the promotion and are identified as completed dup ops. The cache tier replies to these write ops with the versions from the base tier (v1, v2 and v3)
> - Finally, the read is dispatched; it reads the version of the proxied write (v4) and replies to the client
> - Client complains that 'racing read got wrong version'
> 
> In a previous discussion of the 'ops not idempotent' problem, we solved it by copying the pg_log entries from the base tier to the cache tier during promotion. It seems there is still a problem with this approach in the above scenario. My first thought is that when proxying the write, the cache tier should use the original reqid from the client. But currently we don't have a way to pass the original reqid from the cache tier to the base tier. Any ideas?
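
To make the failure concrete, here is a toy C++ model of the scenario above. It is only a sketch under stated assumptions: `ReqId`, `ToyTier` and `read_version_after_scenario` are hypothetical names invented for illustration, not Ceph types, and the "log" is just a map from reqid to version rather than a real pg_log. It shows why a proxied write sent under a fresh reqid is applied a second time (v4), while the same write sent under the client's original reqid would be recognized as a dup.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, simplified request id (NOT Ceph's osd_reqid_t).
struct ReqId {
    std::string name;  // who issued the op
    uint64_t tid;      // per-issuer transaction id
    bool operator<(const ReqId& o) const {
        return std::tie(name, tid) < std::tie(o.name, o.tid);
    }
};

// Toy tier: one object version plus a log mapping reqid -> version.
struct ToyTier {
    uint64_t object_version = 0;
    std::map<ReqId, uint64_t> log;

    // Apply a write unless its reqid is already logged (a dup); a dup
    // returns the version recorded for the original write.
    uint64_t write(const ReqId& id) {
        auto it = log.find(id);
        if (it != log.end()) return it->second;
        log[id] = ++object_version;
        return object_version;
    }

    uint64_t read() const { return object_version; }
};

// Replays the scenario on the base tier and returns the version a later
// read would see. `proxy_id` is the reqid the cache tier uses when it
// proxies the client's 1st write.
uint64_t read_version_after_scenario(const ReqId& proxy_id) {
    ToyTier base;
    const ReqId w1{"client", 1}, w2{"client", 2}, w3{"client", 3};
    base.write(w1);        // v1
    base.write(w2);        // v2
    base.write(w3);        // v3
    base.write(proxy_id);  // the proxied copy of the 1st write
    return base.read();
}
```

With a fresh reqid such as `ReqId{"osd.cache", 100}` the model ends at version 4 even though the client is only ever told about v1..v3; with the original `ReqId{"client", 1}` it stays at version 3.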

I agree--I think the correct fix here is to make the proxied op be 
recognized as a dup.  We can either do that by passing in an optional 
reqid to the Objecter, or by extending the op somehow so that both reqids 
are listed.  The first option looks cleaner to me, but we will also need 
to make sure the 'retry' count is preserved, as (I think) we skip the dup 
check if retry==0.  And we probably want to preserve the behavior that a 
given (reqid, retry) only exists once in the system.

This probably means adding more optional args to Objecter::read()...?
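
A rough sketch of what the first option could look like, in self-contained C++. Everything here is hypothetical and for illustration only: `ToyObjecter`, `ToyBaseTier`, `ProxyOp` and `make_op` are made-up names, not the real Objecter API, and the "skip the dup check if retry==0" rule is modelled purely on the unverified guess above. The point is just that the caller may hand in the original (reqid, retry) pair, and that dropping the retry count would defeat the dup check even when the reqid is preserved.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical, simplified request id (NOT Ceph's osd_reqid_t).
struct ReqId {
    std::string name;
    uint64_t tid;
    bool operator<(const ReqId& o) const {
        return std::tie(name, tid) < std::tie(o.name, o.tid);
    }
};

// The op as it arrives at the base tier.
struct ProxyOp {
    ReqId reqid;
    uint32_t retry;
};

// Toy stand-in for the cache tier's Objecter.
class ToyObjecter {
public:
    explicit ToyObjecter(std::string name) : name_(std::move(name)) {}

    // If the caller passes the client's original reqid and retry count,
    // reuse them; otherwise mint a fresh reqid with retry 0.
    ProxyOp make_op(std::optional<ReqId> orig_reqid = std::nullopt,
                    uint32_t orig_retry = 0) {
        if (orig_reqid) return ProxyOp{*orig_reqid, orig_retry};
        return ProxyOp{ReqId{name_, ++last_tid_}, 0};
    }

private:
    std::string name_;
    uint64_t last_tid_ = 0;
};

// Toy base tier. ASSUMPTION: the dup check runs only when retry > 0.
struct ToyBaseTier {
    uint64_t version = 0;
    std::map<ReqId, uint64_t> log;

    uint64_t write(const ProxyOp& op) {
        if (op.retry > 0) {
            auto it = log.find(op.reqid);
            if (it != log.end()) return it->second;  // dup: old version
        }
        log[op.reqid] = ++version;  // applied as a new write
        return version;
    }
};
```

In this model, a proxied op built with `make_op()` is applied again, one built with `make_op(w1, 1)` comes back as a dup of the client's first write, and one built with `make_op(w1, 0)` is applied again because the retry count was lost.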

sage