On Fri, Sep 07, 2018 at 11:39:25AM +0300, Nikolay Borisov wrote: > > > On 7.09.2018 10:39, Omar Sandoval wrote: > > From: Omar Sandoval <osandov@xxxxxx> > > > > Implement the swap file a_ops on Btrfs. Activation needs to make sure > > that the file can be used as a swap file, which currently means it must > > be fully allocated as nocow with no compression on one device. It must > > also do the proper tracking so that ioctls will not interfere with the > > swap file. Deactivation clears this tracking. > > > > Signed-off-by: Omar Sandoval <osandov@xxxxxx> > > --- > > fs/btrfs/inode.c | 316 +++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 316 insertions(+) > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index 9357a19d2bff..55aba2d7074c 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -27,6 +27,7 @@ > > #include <linux/uio.h> > > #include <linux/magic.h> > > #include <linux/iversion.h> > > +#include <linux/swap.h> > > #include <asm/unaligned.h> > > #include "ctree.h" > > #include "disk-io.h" > > @@ -10437,6 +10438,319 @@ void btrfs_set_range_writeback(struct extent_io_tree *tree, u64 start, u64 end) > > } > > } > > > > +/* > > + * Add an entry indicating a block group or device which is pinned by a > > + * swapfile. Returns 0 on success, 1 if there is already an entry for it, or a > > + * negative errno on failure. > > + */ > > +static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr, > > + bool is_block_group) > > +{ > > + struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; > > + struct btrfs_swapfile_pin *sp, *entry; > > + struct rb_node **p; > > + struct rb_node *parent = NULL; > > + > > + sp = kmalloc(sizeof(*sp), GFP_NOFS); > > + if (!sp) > > + return -ENOMEM; > > + sp->ptr = ptr; > > + sp->inode = inode; > > + sp->is_block_group = is_block_group; > > + > > + spin_lock(&fs_info->swapfile_pins_lock); > > + p = &fs_info->swapfile_pins.rb_node; > > + while (*p) { > > + parent = *p; > > + entry = rb_entry(parent, struct btrfs_swapfile_pin, node); > > + if (sp->ptr < entry->ptr || > > + (sp->ptr == entry->ptr && sp->inode < entry->inode)) { > > + p = &(*p)->rb_left; > > + } else if (sp->ptr > entry->ptr || > > + (sp->ptr == entry->ptr && sp->inode > entry->inode)) { > > + p = &(*p)->rb_right; > > + } else { > > + spin_unlock(&fs_info->swapfile_pins_lock); > > + kfree(sp); > > + return 1; > > + } > > > I have to admit this is creative use of pointers but I dislike it: > > 1. You are not really doing an interval tree of any sorts so rb seems a > bit of an overkill. How many block groups/devices do you expect to have > in the rb tree i.e how many swap files per file system so that the logn > search behavior really matter? Why not a simple linked list and just an > equality comparison of pointers? We know there's at least one block group per gigabyte of swap, but there can be many more if the file is fragmented. We could probably get away with a linked list in most cases, but the number of entries is only bounded by the size of the filesystem * number of swapfiles. With a linked list, checking n block groups for balance becomes O(n^2) instead of O(n * log n), so I'd rather be on the safe side here. The rbtree manipulation isn't that much more complicated than using a linked list, after all. > 2. The code self-admits that using pointers for lt/gr comparison is a > hack since in case pointers match you fall back to checking the inode > pointer It's not a fallback. A block group or device can contain more than one swapfile, so it's really a separate entry. > 3. There is a discrepancy between the keys used for adding (ptr + inode) > and deletion (just inode) Well, yeah, we add an entry that says block group/device X is pinned by inode Y, and we delete all of the entries pinned by inode Y. > At the very least this hack needs to be at least mentioned in the > changelog. I'll add these comments: diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index e37ce40db380..1c258ee4be24 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -719,6 +719,11 @@ struct btrfs_delayed_root; /* * Block group or device which contains an active swapfile. Used for preventing * unsafe operations while a swapfile is active. + * + * These are sorted on (ptr, inode) (note that a block group or device can + * contain more than one swapfile). We compare the pointer values because we + * don't actually care what the object is, we just need a quick check whether + * the object exists in the rbtree. */ struct btrfs_swapfile_pin { struct rb_node node; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 55aba2d7074c..e103e81c6533 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10481,6 +10481,7 @@ static int btrfs_add_swapfile_pin(struct inode *inode, void *ptr, return 0; } +/* Free all of the entries pinned by this swapfile. */ static void btrfs_free_swapfile_pins(struct inode *inode) { struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 514932c47bcd..062ad86358ad 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7545,6 +7545,10 @@ int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info) return ret; } +/* + * Check whether the given block group or device is pinned by any inode being + * used as a swapfile. + */ bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr) { struct btrfs_swapfile_pin *sp;