Re: some thoughts on BlueFS space gift redesign

On Fri, 12 Oct 2018, Igor Fedotov wrote:
> > My only concern with an ondisk compat change like this is we break
> > downgrade (e.g., from 12.2.10 to 12.2.9 or whatever).  I think repeating
> > the reconciliation on every startup is a small price to pay to avoid that
> > concern.  Or, maybe we only repeat the reconciliation on mimic and
> > luminous but not on nautilus?  Regardless, I think it is cheap: we've
> > already loaded all the freelist state into memory.  It might not be
> > worth the effort to skip it.
> I'm afraid it wouldn't help - reconciliation is able to recover from DB to 
> BlueFS only. I.e. it assumes DB replica is always valid while BlueFS might be
> incomplete.
> That's not the case for us here.

The "reconciliation" I'm referring to would be the other way around: 
BlueFS is always authoritative, and on BlueStore startup, we compare what 
BlueFS reports as its extents to the bluefs_extents in BlueStore and make 
sure they match, and also make sure the freelist correctly shows those 
extents as in-use.

So, if BlueFS claimed some extra space, then crashed before BlueStore 
committed that fact into RocksDB, then on the next startup we notice, 
mark those extents as in-use in the freelist, and update bluefs_extents.
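
Roughly, as a toy sketch (not actual BlueStore code: the function name, the 
set-of-allocation-units model, and the return shape are all made up for 
illustration; the real structures are extent intervals and the freelist 
manager):

```python
from typing import Set, Tuple


def reconcile(bluefs_owned: Set[int],
              bluefs_extents: Set[int],
              freelist_free: Set[int]) -> Tuple[Set[int], Set[int], bool]:
    """Hypothetical startup check that treats BlueFS as authoritative.

    Each set holds allocation-unit indices (an assumption made to keep
    the example short).

    bluefs_owned   -- what BlueFS itself reports as its extents
    bluefs_extents -- BlueStore's persisted record of BlueFS space
    freelist_free  -- units the freelist currently shows as free
    """
    # Space BlueFS claimed that BlueStore never committed (the crash case).
    unrecorded = bluefs_owned - bluefs_extents
    # Units BlueFS owns that the freelist still believes are free.
    wrongly_free = bluefs_owned & freelist_free

    changed = bool(unrecorded or wrongly_free)

    # Record the missing units and mark everything BlueFS owns as in-use.
    # Units recorded for BlueFS that BlueFS does not report are left
    # untouched here; what to do with them is outside this sketch.
    new_bluefs_extents = bluefs_extents | unrecorded
    new_free = freelist_free - bluefs_owned
    return new_bluefs_extents, new_free, changed


# BlueFS took unit 3, then we crashed before BlueStore committed it.
extents, free, changed = reconcile({1, 2, 3}, {1, 2}, {3, 4, 5})
print(sorted(extents), sorted(free), changed)
```

When nothing is out of sync the check changes nothing, which is why 
repeating it on every startup should be cheap.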

sage
