On Fri, Jan 25, 2019 at 7:28 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> Hi,
>
> I would like to discuss the concept of lazy file reflink.
> The use case is backup of a very large read-mostly file.
> Backup application would like to read consistent content from the
> file, "atomic read" so to speak.

If it's even a few thousand such files, let alone millions, whether XFS
or Btrfs, you're talking about a lot of metadata writes (hence I sort of
understand the request for a lazy+volatile reflink). But this quickly
becomes a metric ton of data: it's in effect a duplicate of each file's
metadata, which includes the list of its extents. In simple cases that
can stay unwritten, but you can't be sure in every case that the whole
operation fits in memory.

Example from my sysroot:

36.87GiB data extents
1.12GiB filesystem metadata

If I reflink copy that whole filesystem, it translates into 1.12GiB of
metadata read and then 1.12GiB written. If it's a Btrfs snapshot of the
containing subvolumes, it's maybe 128KiB written per snapshot. The
reflink copy is only cheap compared to a full data copy; it's not that
cheap compared to snapshots (a minimal sketch of the two operations is
appended below).

It sounds to me like a lazy reflink copy is no longer lazy if it has to
write out to disk because it can't all fit in memory, or if it ends up
evicting something else from memory and slows things down that way.

A Btrfs snapshot is cheaper than an LVM thinp snapshot, which also
requires mounting the snapshot's filesystem in order to do the backup.
But if the filesystem is big enough to have long mount times, chances
are you're also talking about a lot of data to back up, which means a
lot of metadata to read and then write out, unless you're lucky enough
to have gobs of RAM. So, *shrug*, I'm not seeing a consistent
optimization with lazy reflink. It'll be faster only if we're not
talking about a lot of data in the first place.

> I have based my assumption that reflink of a large file may incur
> lots of metadata updates on my limited knowledge of xfs reflink
> implementation, but perhaps it is not the case for other filesystems?
> (btrfs?) and perhaps the current metadata overhead on reflink of a large
> file is an implementation detail that could be optimized in the future?

The optimum use case is maybe a few hundred big files. With tens of
thousands to millions, I think you start creating a lot of competition
for memory, with the ensuing consequences: something has to be evicted.
Either the lazy reflink is the lower priority and it functionally
becomes a partial or full reflink by writing out to the block device,
or it's the higher priority and kicks something else out. No free
lunch.

--
Chris Murphy
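
To make the comparison above concrete: the per-file reflink copy is what
e.g. "cp --reflink=always" does via the FICLONE ioctl. Below is a minimal
sketch of that operation (just an illustration, not from any real backup
tool); contrast it with "btrfs subvolume snapshot -r <src> <dst>", which
covers every file in the subvolume with one small, fixed metadata write.

#include <fcntl.h>
#include <linux/fs.h>      /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}

	int src = open(argv[1], O_RDONLY);
	int dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Ask the filesystem to make dst share all of src's extents.
	 * No file data is copied, but the shared-extent bookkeeping is
	 * durable on-disk metadata; that per-file cost, multiplied by
	 * every file in the backup set, is the overhead discussed in
	 * this thread.
	 */
	if (ioctl(dst, FICLONE, src) < 0) {
		perror("ioctl(FICLONE)");
		return 1;
	}

	close(src);
	close(dst);
	return 0;
}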