Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Tue, 2 Feb 2016 15:06:35 -0800

On Wed, Jan 06, 2016 at 02:44:15PM +1100, Dave Chinner wrote:
> On Tue, Jan 05, 2016 at 06:04:40PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 05, 2016 at 07:42:26AM -0500, Brian Foster wrote:
> > > On Mon, Jan 04, 2016 at 03:59:51PM -0800, Darrick J. Wong wrote:
> > > > I've temporarily fixed this by adding code that figures out how many blocks we
> > > > need if the reference count btree has to have a unique record for every block
> > > > in the AG and holding that many blocks until either they're allocated to the
> > > > refcount btree or freed at umount time.  Right now it's a temporary fix (if the
> > > > FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
> > > > FS to make a permanent reservation that's recorded on disk somehow.  But that
> > > > involves writing things to disk + making xfsprogs understand the reservation;
> > > > let's see what people say about the reserved pool idea at all.
> > > > 
> > > > Does that make sense? :)
> > > > 
> > > 
> > > Yep, it sounds sort of like the reserve pool mechanism used to protect
> > > against ENOSPC when freeing blocks. Curious... why are the reserved
> > > blocks lost on fs crash? Wouldn't they be reserved again on the
> > > subsequent mount?
> > 
> > They will, but the pre-crash reservation isn't (yet) written down anywhere on
> > disk.
> 
> Does it need to be? The global reserve pool is not "written down"
> anywhere. When we mount, we pull the reserve from the global free
> space accounting. Hence we give ENOSPC when we've used "total fs
> blocks - reserve pool blocks" in memory, and so if we crash we've
> still got at least that many free blocks on disk. Hence on mount we
> re-reserve those blocks in memory and everything is back to the way
> it was prior to the crash.
> 
> I suspect the per-ag code is a bit different, but it should be able
> to work the same way. i.e. when we initialise the per-ag structure,
> we pull the reserve from the free block count in the AG, as well as
> from the global free space count. Then we will get correct global
> ENOSPC detection, as well as leave enough space free in each AG as
> we scan and skip them during allocation...
> 
> As long as the per-ag reservation is restored during mount before we
> do EFI recovery processing (i.e. between the two log recovery
> phases), it should restore the reserve pool to the same size as it
> was before a crash occurred....
> 
> Unless, of course, I'm missing something newly introduced by the
> reflink code...

Technically you were, but I've fixed the reservation code to exist purely as
in-core magic that works more or less how you outlined above.  No more on-disk
artifacts, no more need to write a persistence and recovery mechanism. :)
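
For anyone following along, here is a rough userspace model of the in-core
scheme described above. It is only a sketch, not the patch code: every name
in it (ag_model, fs_model, ag_resv_init, and so on) is made up for
illustration, it models a single AG, and it ignores locking and the log
entirely. The point it tries to show is that ordinary allocations stop at
"free blocks - reservation", so the reserved blocks are still free on disk
after a crash and can simply be reserved again at mount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one AG; these are not real XFS structures. */
struct ag_model {
	uint64_t ondisk_free;	/* free blocks as recorded on disk */
	uint64_t resv;		/* in-core reservation, never written to disk */
};

/* Hypothetical global counter of blocks ordinary users may still allocate. */
struct fs_model {
	uint64_t avail;
};

/* Mount time: pull the reservation out of the AG and global free counts. */
static bool ag_resv_init(struct fs_model *fs, struct ag_model *ag, uint64_t want)
{
	if (want > ag->ondisk_free || want > fs->avail)
		return false;
	ag->resv = want;
	fs->avail -= want;
	return true;
}

/* Ordinary allocation: fails (think ENOSPC) rather than touch the reserve. */
static bool ag_alloc(struct fs_model *fs, struct ag_model *ag, uint64_t len)
{
	assert(ag->resv <= ag->ondisk_free);
	if (len > ag->ondisk_free - ag->resv || len > fs->avail)
		return false;
	ag->ondisk_free -= len;
	fs->avail -= len;
	return true;
}

/* Metadata allocation (e.g. for the refcount btree) consumes the reserve. */
static bool ag_alloc_from_resv(struct ag_model *ag, uint64_t len)
{
	if (len > ag->resv)
		return false;
	ag->resv -= len;
	ag->ondisk_free -= len;
	return true;
}

/* Crash: all in-core state is gone, only the on-disk free count survives. */
static void simulate_crash(struct fs_model *fs, struct ag_model *ag)
{
	ag->resv = 0;
	fs->avail = ag->ondisk_free;
}
```

In this model, calling ag_resv_init() again after simulate_crash() with the
same size succeeds, because ag_alloc() never let ondisk_free drop below the
reservation.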

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
