Re: bcachefs: can bcachefs export block devices?

On Thu, Aug 04, 2016 at 04:46:58PM -0700, Eric Wheeler wrote:
> On Wed, 3 Aug 2016, Kent Overstreet wrote:
> > On Fri, May 27, 2016 at 07:45:32PM -0700, Eric Wheeler wrote:
> >
> > Realistically the only way that's going to happen is to completely fork the
> > source code, ext2/3/4 style.
> > Although that's going to have to happen eventually.
> 
> Sure, that makes sense.  At what point would you want to do that rename so 
> bcache-dev can be pulled into the kernel tree?

Probably not until after the on disk format is stabilized for good, which isn't
going to be until after snapshots are at least minimally working.

> > > > The way it works is the first 4096 inode numbers are owned by the block device
> > > > interface - inodes in that range are for either cached devices or thin
> > > > provisioned volumes. The filesystem code owns inode numbers >= 4096.
> > > > 
> > > > So while blockdev volumes/cached data do have inodes, they're not reachable via
> > > > the filesystem because there will never be dirents that point to them (also,
> > > > they use a different inode type with extra fields for the UUID/label).
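[Illustration: the inode number split described above can be sketched as a
simple range check. This is only a minimal sketch; the constant and function
names are hypothetical and are not taken from the bcachefs source. The only
fact it relies on is the 4096 boundary stated in the thread.]

```python
# Hypothetical name; the value 4096 is the boundary given in the thread.
BLOCKDEV_INODE_MAX = 4096


def inode_owner(inum: int) -> str:
    """Return which layer owns an inode number under the split described above."""
    if inum < 0:
        raise ValueError("inode numbers are non-negative")
    # Inodes 0..4095 belong to the block device interface (cached devices
    # and thin provisioned volumes); everything from 4096 up is filesystem.
    return "blockdev" if inum < BLOCKDEV_INODE_MAX else "filesystem"
```

Since no dirent ever points into the lower range, a path lookup can never
return one of those inodes.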
> > > 
> > > That's a neat implementation.  Would creating a dirent for such an inode 
> > > expose the block device with the same size and content (and ordering) if 
> > > the inode were compatible?  Would the blockdev be block-size aligned 
> > > versus the file or might the file have an alignment requirement?
> > 
> > What we'd want to do is add an ioctl or something to take a fs inode (a normal
> > file, that already has a dirent) and create at runtime a block device for it.
> 
> You had mentioned changing on-disk format related to this and NFS support.  
> Is that coming along too?

Yeah, the transactions stuff I wrote about on Patreon is for NFS support. I
think I'll be able to do NFS support without an on disk format change, but I
will need an on disk format change at some point.

> 
> > > I'm particularly excited about this as a precursor to snapshot support, 
> > > especially if udev could help produce something like this:
> > > 
> > >   /dev/disk/by-path/bcache-mydiskfile -> /dev/bcacheN
> > >   /dev/disk/by-path/bcache-mydisksnap -> /dev/bcacheN+1
> > 
> > Not sure what you mean by precursor - that would still require essentially the
> > entire snapshots implementation. But yes, once we have snapshots we could do
> > that too.
> 
> Precursor, as in, export an arbitrary file as a blockdev even if snapshots 
> aren't ready yet.  I can start testing in our testbed once files can be 
> exported as blocks, whether or not they support snapshots.

I may have asked this before, but is loopback really not good enough? I thought
performance of the loopback driver had improved recently (I know Christoph was
working on this).

> Other questions:
> 
> Is FIEMAP supported so uncached files can be read in disk-linear order?  
> Hmm, I wonder, what does FIEMAP even mean when the file spreads across 
> multiple disks?  Maybe it doesn't apply here.  Really what I'm looking for 
> is a way to list which blocks have changed between two snapshots for easy 
> incremental backups (eg, `btrfs send`).

Yeah, fiemap works. I don't remember if there are any provisions in fiemap for
multiple devices.
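
[Illustration: for reading a file in disk-linear order, a minimal sketch of
querying extents through the generic Linux FS_IOC_FIEMAP ioctl could look like
the following. Nothing here is bcachefs-specific. The function names are
hypothetical, the struct layouts follow linux/fiemap.h, and the sketch only
fetches up to max_extents extents in a single call.]

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B                     # _IOWR('f', 11, struct fiemap)
FIEMAP_HDR = struct.Struct("=QQIIII")          # struct fiemap header, 32 bytes
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")    # struct fiemap_extent, 56 bytes


def parse_fiemap(buf):
    """Decode a filled-in fiemap buffer into (logical, physical, length, flags)."""
    _start, _length, _flags, mapped, _count, _rsvd = FIEMAP_HDR.unpack_from(buf, 0)
    extents = []
    for i in range(mapped):
        off = FIEMAP_HDR.size + i * FIEMAP_EXTENT.size
        logical, physical, length, _r1, _r2, flags, *_ = FIEMAP_EXTENT.unpack_from(buf, off)
        extents.append((logical, physical, length, flags))
    return extents


def file_extents(path, max_extents=256):
    """Return the file's extents sorted by physical offset (hypothetical helper)."""
    buf = bytearray(FIEMAP_HDR.size + max_extents * FIEMAP_EXTENT.size)
    # fm_start=0, fm_length=whole file, fm_flags=0, fm_extent_count=max_extents
    FIEMAP_HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, max_extents, 0)
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    return sorted(parse_fiemap(buf), key=lambda e: e[1])
```

The physical offset is a single number per extent, so with multiple devices
this ordering alone does not say which device an extent lives on.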

We'll have real send/receive at some point though - we've got a version number
field in struct bkey, so we'll have a "send all keys greater than version number
foo".
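
[Illustration: the "send all keys greater than version number foo" idea can be
sketched as a filter over keys. This is a toy model under stated assumptions:
the Key class and its fields are hypothetical stand-ins and do not reflect the
real struct bkey layout; only the version comparison comes from the text
above.]

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Key:
    # Hypothetical stand-in for a key; real struct bkey fields differ.
    inode: int
    offset: int
    version: int


def keys_to_send(keys: Iterable[Key], since_version: int) -> List[Key]:
    """Select keys written after since_version, i.e. the incremental delta."""
    return [k for k in keys if k.version > since_version]
```

An incremental backup would then record the highest version it has sent and
pass that value as since_version on the next run.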

> I'm excited about checksum support.  If an SSD bitflips, will it fail the 
> whole disk, or just report an error and attempt to re-read from another 
> volume?  

It'll fail that individual IO, and whether or not it fails the entire device
should be configurable (don't believe it is yet).

It should reread from another replica if available, but I'm not sure if that's
done yet - I haven't looked at what's left with replication in quite a while.

> Right now btrfs/zfs is the only viable checksum filesystem with recovery, 
> and there aren't any viable blockdevice checksumming implementations 
> (dm-csum didn't take off and the PoC academic example splicing into md 
> raid isn't really ready either).

Maybe I'll start working more on the replication stuff, it would be nice to get
that stuff finished off... Are you chipping in on Patreon? :)