Modifying and fixing(?) the per-inode snap handling in ceph


 



Hi, Ilya, Xiubo, Greg,

I'm trying to finish my patches to make ceph work with netfslib and I'm
wondering if snap handling on inodes can be made easier to work with.  Also, I
think there may be a bug in the interaction between ceph_queue_cap_snap() and
writable mmaps.

What I would like to do is to make page/folio->private point at the
ceph_cap_snap struct instead of pointing to ceph_snap_context.  This makes it
easier to fish the metadata details out in ceph when netfslib asks it to
perform a write operation.

Netfslib has the capability to pass a netfs_group struct through the API, and
I currently have this subclassed by ceph_snap_context, but that doesn't
directly carry sufficient information as I presume that's a global thing and
not an inode-specific thing.

However, it looks like capsnaps don't always exist, even on dirty inodes...

So what I'm thinking is:

 (1) Make struct ceph_cap_snap a subclass of netfs_group.  This would allow
     netfslib to manipulate them and attach them to dirty pages and do
     selective writeback.

 (2) Always keep a ceph_cap_snap on a dirty inode.  It can be treated
     specially when it's the only snap and at the head.

 (3) Offload some of the fields from ceph_inode_info into ceph_cap_snap
     (eg. truncate_size and truncate_seq) and update them directly there.

 (4) On entry to any sort of write routine, see if we need a new capsnap for
     that inode and, if so, create one.  This would include ->write_iter(),
     ->page_mkwrite(), ->setattr() and possibly ->setxattr().

 (5) In queue_realm_cap_snaps(), mark the capsnap as being obsolete and call
     unmap_mapping_pages() on each inode to force ->page_mkwrite() to be
     called[!] on further modification.

     queue_realm_cap_snaps() doesn't then need to create a new capsnap; this
     can be left to the various write routines.

     [!] This would fix the aforementioned potential bug whereby someone can
     continue writing to the inode even though a new snap has happened.

 (6) ceph_writepages() calls netfs_writepages_group() to flush out pages with
     the matching group, stepping through the capsnap list on the inode.
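
To make items (1), (4) and (5) a bit more concrete, here is a rough userspace
sketch of the shape this might take.  It is only a model under assumptions:
group_model stands in for netfs_group, and capsnap_model, inode_model,
capsnap_for_write() and mark_head_obsolete() are hypothetical names, not
existing ceph or netfslib symbols.  Locking and refcounting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for netfs_group; the real struct is not reproduced here. */
struct group_model {
	int ref;
};

/* Hypothetical capsnap that embeds the group as its base, as in item (1). */
struct capsnap_model {
	struct group_model group;
	bool obsolete;			/* set when the realm takes a new snap */
	struct capsnap_model *older;	/* next older capsnap on this inode */
};

/* Hypothetical per-inode state: just the newest capsnap, or NULL. */
struct inode_model {
	struct capsnap_model *head;
};

/* Recover the capsnap from the group pointer attached to a page/folio. */
static struct capsnap_model *group_to_capsnap(struct group_model *g)
{
	assert(g != NULL);
	return (struct capsnap_model *)
		((char *)g - offsetof(struct capsnap_model, group));
}

/*
 * Item (4): called on entry to a write routine.  Reuse the head capsnap
 * unless it is missing or has been marked obsolete, in which case a new
 * one is created and becomes the head.
 */
static struct capsnap_model *capsnap_for_write(struct inode_model *inode)
{
	struct capsnap_model *cs = inode->head;

	if (cs && !cs->obsolete)
		return cs;

	cs = calloc(1, sizeof(*cs));
	if (!cs)
		return NULL;
	cs->group.ref = 1;
	cs->older = inode->head;
	inode->head = cs;
	return cs;
}

/*
 * Item (5): the snap path only marks the head obsolete.  Real code would
 * also call unmap_mapping_pages() here so that the next mmap store faults
 * and goes through ->page_mkwrite().
 */
static void mark_head_obsolete(struct inode_model *inode)
{
	if (inode->head)
		inode->head->obsolete = true;
}
```

The point of the model is that the snap path never allocates; the next writer
to come through capsnap_for_write() notices the obsolete head and creates the
replacement itself.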

Any thoughts on whether this would work?  If I can do this, I can reduce
get_oldest_context() to almost nothing and don't need the ceph_writeback_ctl
struct anymore (I think).
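
As a rough illustration of item (6), the writeback side could reduce to a
loop over the inode's capsnaps, oldest first, flushing only the pages tagged
with each one.  Again this is a userspace model under assumptions:
flush_group() is a made-up stand-in for netfs_writepages_group() with an
invented signature, and page_model and writepages_model() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the group that a capsnap would embed. */
struct group_model {
	int id;
};

/* A page is dirty if it is tagged with a group; NULL means clean. */
struct page_model {
	const struct group_model *group;
};

/* Flush every page tagged with 'group' and return how many were flushed. */
static int flush_group(struct page_model *pages, size_t n,
		       const struct group_model *group)
{
	int flushed = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].group == group) {
			pages[i].group = NULL;	/* written back, now clean */
			flushed++;
		}
	}
	return flushed;
}

/*
 * Walk the capsnaps oldest first so that data belonging to an older snap
 * is written before anything dirtied after it.
 */
static int writepages_model(struct page_model *pages, size_t n,
			    const struct group_model *const *capsnaps,
			    size_t nr_capsnaps)
{
	int total = 0;

	for (size_t i = 0; i < nr_capsnaps; i++)
		total += flush_group(pages, n, capsnaps[i]);
	return total;
}
```

With the ordering carried by the capsnap list itself, finding the oldest
context to write becomes a matter of taking the first list entry.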

Thanks,
David




