Re: [PATCH v7 0/5] Introduce provisioning primitives

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 02 2023 at  2:44P -0400,
Sarthak Kukreti <sarthakkukreti@xxxxxxxxxxxx> wrote:

> On Tue, May 30, 2023 at 8:28 AM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> >
> > On Tue, May 30 2023 at 10:55P -0400,
> > Joe Thornber <thornber@xxxxxxxxxx> wrote:
> >
> > > On Tue, May 30, 2023 at 3:02 PM Mike Snitzer <snitzer@xxxxxxxxxx> wrote:
> > >
> > > >
> > > > Also Joe, for you proposed dm-thinp design where you distinquish
> > > > between "provision" and "reserve": Would it make sense for REQ_META
> > > > (e.g. all XFS metadata) with REQ_PROVISION to be treated as an
> > > > LBA-specific hard request?  Whereas REQ_PROVISION on its own provides
> > > > more freedom to just reserve the length of blocks? (e.g. for XFS
> > > > delalloc where LBA range is unknown, but dm-thinp can be asked to
> > > > reserve space to accomodate it).
> > > >
> > >
> > > My proposal only involves 'reserve'.  Provisioning will be done as part of
> > > the usual io path.
> >
> > OK, I think we'd do well to pin down the top-level block interfaces in
> > question. Because this patchset's block interface patch (2/5) header
> > says:
> >
> > "This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range,"
> >
> > So it wires up blkdev_fallocate() to call blkdev_issue_provision(). A
> > user of XFS could then use fallocate() for user data -- which would
> > cause thinp's reserve to _not_ be used for critical metadata.
> >
> > The only way to distinquish the caller (between on-behalf of user data
> > vs XFS metadata) would be REQ_META?
> >
> > So should dm-thinp have a REQ_META-based distinction? Or just treat
> > all REQ_OP_PROVISION the same?
> >
> I'm in favor of a REQ_META-based distinction. Does that imply that
> REQ_META also needs to be passed through the block/filesystem stack
> (eg. REQ_OP_PROVION + REQ_META on a loop device translates to a
> fallocate(<insert meta flag name>) to the underlying file)?

Unclear, I was thinking your REQ_UNSHARE (tied to fallocate) might be
a means to translate REQ_OP_PROVISION + REQ_META to fallocate and have
it perform the LBA-specific provisioning of Joe's design (referenced
below).

> <bikeshed>
> I think that might have applications beyond just provisioning:
> currently, for stacked filesystems (eg filesystems residing in a file
> on top of another filesystem), even if the upper filesystem issues
> read/write requests with REQ_META | REQ_PRIO, these flags are lost in
> translation at the loop device layer.  A flag like the above would
> allow the prioritization of stacked filesystem metadata requests.
> </bikeshed>

Yes, it could prove useful.

> Bringing the discussion back to this series for a bit, I'm still
> waiting on feedback from the Block maintainers before sending out v8
> (which at the moment, only have a
> s/EXPORT_SYMBOL/EXPORT_SYMBOL_GPL/g). I believe from the conversation
> most of the above is follow up work, but please let me know if you'd
> prefer I add some of this to the current series!

I need a bit more time to work through various aspects of the broader
requirements and the resulting interfaces that fall out.

Joe's design is pretty compelling because it will properly handle
snapshot thin devices:
https://listman.redhat.com/archives/dm-devel/2023-May/054351.html

Here is my latest status:
- Focused on prototype for thinp block reservation (XFS metadata, XFS
  delalloc, fallocate)
- Decided the "dynamic" (non-LBA specific) reservation stuff (old
  prototype code) is best left independent from Joe's design.  SO 2
  classes of thinp reservation.
  - Forward-ported the old prototype code that Brian Foster, Joe
    Thornber and I worked on years ago.  It needs more careful review
    (and very likely will need fixes from Brian and myself).  The XFS
    changes are pretty intrusive and likely up for serious debate (as
    to whether we even care to handle reservations for user data).
- REQ_OP_PROVISION bio’s with REQ_META will use Joe’s design,
  otherwise data (XFS data and fallocate) will use “dynamic”
  reservation.
  - "dynamic" name is due to the reservation being generic (non-LBA:
    not in terms of an LBA range). Also, in-core only; so the associated
    “dynamic_reserve_count” accounting is reset to 0 every activation. 
  - Fallocate may require stronger guarantees in the end (in which
    case we’ll add a REQ_UNSHARE flag that is selectable from the
    fallocate interface) 
- Will try to share common code, but just sorting out highlevel
  interface(s) still...

I'll try to get a git tree together early next week.  It will be the
forward ported "dynamic" prototype code and your latest v7 code with
some additional work to branch accordingly for each class of thinp
reservation.  And I'll use your v7 code as a crude stub for Joe's
approach (branch taken if REQ_META set).

Lastly, here are some additional TODOs I've noted in code earlier in
my review process:

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 0d9301802609..43a6702f9efe 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1964,6 +1964,26 @@ static void process_provision_bio(struct thin_c *tc, struct bio *bio)
 	struct dm_cell_key key;
 	struct dm_thin_lookup_result lookup_result;
 
+	/*
+	 * FIXME:
+	 * Joe's elegant reservation design is detailed here:
+	 * https://listman.redhat.com/archives/dm-devel/2023-May/054351.html
+	 * - this design, with associated thinp metadata updates,
+	 *   is how provision bios should be handled.
+	 *
+	 * FIXME: add thin-pool flag "ignore_provision"
+	 *
+	 * FIXME: needs provision_passdown support
+	 *        (needs thinp flag "no_provision_passdown")
+	 */
+
+	/*
+	 * FIXME: require REQ_META (or REQ_UNSHARE?) to allow deeper
+	 *        provisioning code that follows? (so that thinp
+	 *        block _is_ fully provisioned upon return)
+	 *        (or just remove all below code entirely?)
+	 */
+
 	/*
 	 * If cell is already occupied, then the block is already
 	 * being provisioned so we have nothing further to do here.




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux