Re: [PATCH v6 3/3] io_uring: enable per-io hinting capability

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/24/24 11:24, Kanchan Joshi wrote:
With F_SET_RW_HINT fcntl, user can set a hint on the file inode, and
all the subsequent writes on the file pass that hint value down.
This can be limiting for large files (and for block device) as all the
writes can be tagged with only one lifetime hint value.
Concurrent writes (with different hint values) are hard to manage.
Per-IO hinting solves that problem.

Allow userspace to pass the write hint type and its value in the SQE.
Two new fields are carved in the leftover space of SQE:
	__u8 hint_type;
	__u64 hint_val;

Adding the hint_type helps in keeping the interface extensible for future
use.
At this point only one type TYPE_WRITE_LIFETIME_HINT is supported. With
this type, user can pass the lifetime hint values that are currently
supported by F_SET_RW_HINT fcntl.

The write handlers (io_prep_rw, io_write) process the hint type/value
and hint value is passed to lower-layer using kiocb. This is good for
supporting direct IO, but not when kiocb is not available (e.g.,
buffered IO).

In general, per-io hints take the precedence on per-inode hints.
Three cases to consider:

Case 1: When hint_type is 0 (explicitly, or implicitly as SQE fields are
initialized to 0), this means user did not send any hint. The per-inode
hint values are set in the kiocb (as before).

Case 2: When hint_type is TYPE_WRITE_LIFETIME_HINT, the hint_value is
set into the kiocb after sanity checking.

Case 3: When hint_type is anything else, this is flagged as an error
and write is failed.

Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
---
  fs/fcntl.c                    | 22 ----------------------
  include/linux/rw_hint.h       | 24 ++++++++++++++++++++++++
  include/uapi/linux/io_uring.h | 10 ++++++++++
  io_uring/rw.c                 | 21 ++++++++++++++++++++-
  4 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 081e5e3d89ea..2eb78035a350 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -334,28 +334,6 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
  }
  #endif
-static bool rw_hint_valid(u64 hint)
-{
-	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
-	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
-	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
-	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
-	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
-	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
-
-	switch (hint) {
-	case RWH_WRITE_LIFE_NOT_SET:
-	case RWH_WRITE_LIFE_NONE:
-	case RWH_WRITE_LIFE_SHORT:
-	case RWH_WRITE_LIFE_MEDIUM:
-	case RWH_WRITE_LIFE_LONG:
-	case RWH_WRITE_LIFE_EXTREME:
-		return true;
-	default:
-		return false;
-	}
-}
-
  static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
  			      unsigned long arg)
  {
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
index 309ca72f2dfb..f4373a71ffed 100644
--- a/include/linux/rw_hint.h
+++ b/include/linux/rw_hint.h
@@ -21,4 +21,28 @@ enum rw_hint {
  static_assert(sizeof(enum rw_hint) == 1);
  #endif
+#define WRITE_LIFE_INVALID (RWH_WRITE_LIFE_EXTREME + 1)
+
+static inline bool rw_hint_valid(u64 hint)
+{
+	BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
+	BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
+	BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
+	BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
+	BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
+	BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
+
+	switch (hint) {
+	case RWH_WRITE_LIFE_NOT_SET:
+	case RWH_WRITE_LIFE_NONE:
+	case RWH_WRITE_LIFE_SHORT:
+	case RWH_WRITE_LIFE_MEDIUM:
+	case RWH_WRITE_LIFE_LONG:
+	case RWH_WRITE_LIFE_EXTREME:
+		return true;
+	default:
+		return false;
+	}
+}
+
  #endif /* _LINUX_RW_HINT_H */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1fe79e750470..e21a74dd0c49 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -98,6 +98,11 @@ struct io_uring_sqe {
  			__u64	addr3;
  			__u64	__pad2[1];
  		};
+		struct {
+			/* To send per-io hint type/value with write command */
+			__u64	hint_val;
+			__u8	hint_type;
+		};
Why is 'hint_val' 64 bits? Everything else is 8 bytes, so wouldn't it
be better to shorten that? As it stands the new struct will introduce
a hole of 24 bytes after 'hint_type'.

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@xxxxxxx                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux