Hi Song,

On Mon, 13 Jan 2020, song wrote:
> Hi Sage,
>
> Happy new year!
>
> I am a software engineer from China. Recently I found an issue with fastinfo in Ceph and would like to consult you about it.
>
> In an EC deployment, suppose a peering process for a PG changes one shard's last_update from lu1 (e1'3) to lu2 (e1'2). lu1 was written as fastinfo and lu2 was written as info. After that, we restart this OSD and load the PGs again. When we read the PG info from disk, we get lu1 applied on top of lu2, which is incorrect; the true value should be lu2. That may cause the next peering to execute incorrectly and result in unfound objects.
>
> I am currently considering the two options below:
> 1. delete fastinfo whenever we need to change info;
> 2. add an extra sequence number to the fastinfo and info structures to keep them in the right order.
>
> I am looking forward to hearing your suggestions about this issue and your preferred solution.
> If you need any more info, please let me know.

Ah, that does look like a bug.  I've opened a tracker ticket for this:

	https://tracker.ceph.com/issues/43580

Does that look right?

I think the fix is pretty simple:

	https://github.com/ceph/ceph/pull/32615

Thanks!
sage

>
> thanks,
> Song
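
For readers following the thread, the scenario Song describes and the two proposed options can be sketched as a toy model. This is only an illustration under stated assumptions: the `Store` class, the function names, and the sequence counter are all hypothetical, the (epoch, version) tuples stand in for values like e1'3, and none of it reflects Ceph's actual on-disk layout or the change made in the pull request.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (epoch, version); e1'3 is modelled as (1, 3). Hypothetical stand-in type.
Eversion = Tuple[int, int]


@dataclass
class Store:
    """Toy stand-in for the persisted keys; not Ceph's real layout."""
    info: Optional[Eversion] = None
    info_seq: int = 0
    fastinfo: Optional[Eversion] = None
    fastinfo_seq: int = 0
    seq: int = 0  # hypothetical write counter, used only for option 2

    def next_seq(self) -> int:
        self.seq += 1
        return self.seq


def write_fastinfo(store: Store, last_update: Eversion) -> None:
    """Cheap incremental update, layered on top of info at load time."""
    store.fastinfo = last_update
    store.fastinfo_seq = store.next_seq()


def write_info(store: Store, last_update: Eversion,
               delete_fastinfo: bool = False) -> None:
    """Full info write. Option 1: also drop any older fastinfo."""
    store.info = last_update
    store.info_seq = store.next_seq()
    if delete_fastinfo:
        store.fastinfo = None


def load_naive(store: Store) -> Optional[Eversion]:
    """Always apply fastinfo over info if present (the reported problem)."""
    return store.fastinfo if store.fastinfo is not None else store.info


def load_with_seq(store: Store) -> Optional[Eversion]:
    """Option 2: apply fastinfo only if it was written after info."""
    if store.fastinfo is not None and store.fastinfo_seq > store.info_seq:
        return store.fastinfo
    return store.info


def replay(delete_fastinfo: bool) -> Store:
    """lu1 = e1'3 goes to fastinfo, then peering rewinds to lu2 = e1'2."""
    store = Store()
    write_fastinfo(store, (1, 3))
    write_info(store, (1, 2), delete_fastinfo=delete_fastinfo)
    return store


buggy = replay(delete_fastinfo=False)
print(load_naive(buggy))      # (1, 3): stale lu1 wins after restart
print(load_with_seq(buggy))   # (1, 2): option 2 keeps lu2
print(load_naive(replay(delete_fastinfo=True)))  # (1, 2): option 1 keeps lu2
```

Note that comparing last_update values alone would not help in this model, because the rewind makes the stale fastinfo value (e1'3) larger than the correct info value (e1'2); that is why both options rely on write order rather than on the values themselves.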