Re: [RESEND, PATCH v2] fuse: Don't drop NOTIFY_REPLY if we promised it

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Miklos, first of all thanks for feedback.

On Tue, Feb 26, 2019 at 04:14:22PM +0100, Miklos Szeredi wrote:
> On Tue, Feb 19, 2019 at 10:42 AM Kirill Smelkov <kirr@xxxxxxxxxx> wrote:
> >
> > A successful call to NOTIFY_RETRIEVE by filesystem carries promise from
> > the kernel to send back NOTIFY_REPLY message. However if the filesystem
> > is not reading requests with fuse_conn->max_pages capacity,
> 
> That's a violation of the contract by the fuse server, not the kernel.

Do you mean that even if filesystem server configures via
init_out.max_write that it is accepting e.g. only 32K max writes, it
still has to be issuing sys_read with buffer of 128K (= hardcoded
fuse_conn->max_pages before Linux 4.20, and default since Linux 4.20)?

Also, I could not find any FUSE contract being specified anywhere, so I
used message of the commit that added support for NOTIFY_RETRIEVE to sense its
semantic:

	commit 2d45ba381a74a743eeaa2b06c7c5c0d2bf73ba1a
	Author: Miklos Szeredi <mszeredi@xxxxxxx>
	Date:   Mon Jul 12 14:41:40 2010 +0200
	
	    fuse: add retrieve request
	    
	    Userspace filesystem can request data to be retrieved from the inode's
	    mapping.  This request is synchronous and the retrieved data is queued
	    as a new request.  If the write to the fuse device returns an error
	    then the retrieve request was not completed and a reply will not be
	    sent.
	    
	    Only present pages are returned in the retrieve reply.  Retrieving
	    stops when it finds a non-present page and only data prior to that is
	    returned.
	    
	    This request doesn't change the dirty state of pages.
	    
	    Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxx>
	
which, even if not explicitly, gives the impression that if NOTIFY_RETRIEVE was
queued successfully, the reply will come.

Also: if it is a violation of contract by filesystem server, the kernel should
return ESOMETHING for violating sys_read, instead of making that read to
be waiting indefinitely, isn't it?

In summary: instead of getting clients stuck silently I still suggest
for NOTIFY_RETRIEVE to come to client, if it can come. And also to
return EINVAL for /dev/fuse sys_read calls that are violating the
server/kernel contract.


> > fuse_dev_do_read might see that the "request is too large" and decide to
> > "reply with an error and restart the read". "Reply with an error" has
> > underlying assumption that there is a "requester thread" that is waiting
> > for request completion, which is true for most requests, but is not true
> > for NOTIFY_REPLY: NOTIFY_RETRIEVE handler completes with OK status right
> > after it could successfully queue NOTIFY_REPLY message without waiting
> > for NOTIFY_REPLY completion. This leads to situation when filesystem
> > requested to retrieve inode data with NOTIFY_RETRIEVE, got err=OK for
> > that notification request, but NOTIFY_REPLY is not coming back.
> >
> > More, since there is no "requester thread" to handle the error, the
> > situation shows itself as /sys/fs/fuse/connections/X/waiting=1 _and_
> > /dev/fuse read(s) queued. Which is misleading since NOTIFY_REPLY request
> > was removed from pending queue and abandoned.
> 
> Now I don't understand how that would happen.   If the request is
> abandoned, its refcount should go down to zero and the num_waiting
> count decremented accordingly.

You are right - it was my mistake. I misinterpreted waiting=1 as a
request not being transferred to filesystem server yet, but in my test
it turned out to be already transferred and sitting on client "processing" list
waiting for corresponding reply (which was not coming because the filesystem in
turn was stuck waiting for NOTIFY_REPLY to come from the kernel):

	root@(none):~/src/wendelin/wendelin.core/wcfs# cat /sys/fs/fuse/connections/25/waiting
	1
	
	root@(none):~/src/wendelin/wendelin.core/wcfs# cat /sys/fs/fuse/connections/25/queue
	#waiting: 1
	(0)Interrupt:
	(0)Forget:
	(0)Request:
	(0)Processing.IO:
	(1)Processing.P:
	#0: .52 R15 i5


So the part of commit message that discussed X/waiting=1 is wrong and should be
dropped - thanks for catching this.


Still the main point is what should be the semantic of NOTIFY_RETRIEVE
vs NOTIFY_REPLY vs INIT.max_write, and that it is better to always send
retrieve data if client promised it, and also to explicitly indicate
with an error if filesystem server is violating FUSE server/client
contract.

Thanks,
Kirill


P.S. I attach the draft patch for /sys/fs/fuse/connections/X/queue in
case someone is interested.

---- 8< ----

diff --git a/fs/fuse/control.c b/fs/fuse/control.c
index fe80bea4ad89..f4e22f5436e2 100644
--- a/fs/fuse/control.c
+++ b/fs/fuse/control.c
@@ -63,6 +63,160 @@ static ssize_t fuse_conn_waiting_read(struct file *file, char __user *buf,
 	return simple_read_from_buffer(buf, len, ppos, tmp, size);
 }
 
+/* fuse_conn_iqueue_print prints input queue into provided buf.
+ *
+ * buf can be NULL in which case only the length of would-be printed text is
+ * returned and nothing is actually printed.
+ *
+ * must be called with fc->iq->waitq locked. */
+static size_t fuse_conn_iqueue_print(char *buf, size_t size, struct fuse_conn *fc)
+{
+	struct fuse_iqueue *fiq = &fc->iq;
+	struct fuse_req *req;
+	struct fuse_forget_link *freq;
+	size_t nreq;
+
+	size_t __n, __total = 0;
+#define emitf(FORMAT, ...) do {	\
+	__n = snprintf(buf, size, FORMAT, __VA_ARGS__);	\
+	__total += __n;					\
+	if (buf) {					\
+		size -= __n;				\
+		buf  += __n;				\
+	}						\
+} while (0)
+
+
+	if (!buf)
+		size = 0;
+
+	// XXX temp
+	emitf("#waiting: %d\n", atomic_read(&fc->num_waiting));
+
+	/* interrupts */
+	nreq = 0;
+	list_for_each_entry(req, &fiq->interrupts, list) {
+		nreq++;
+	}
+
+	emitf("(%lu)Interrupt:\n", nreq);
+
+	nreq = 0;
+	list_for_each_entry(req, &fiq->interrupts, list) {
+		emitf("\t#%lu: R%d\n", nreq, req->in.h.opcode);
+		nreq++;
+	}
+
+	/* forgets */
+	nreq = 0;
+	for (freq = fiq->forget_list_head.next; freq != NULL; freq = freq->next) {
+		nreq++;
+	}
+
+	emitf("(%lu)Forget:\n", nreq);
+
+	nreq = 0;
+	for(freq = fiq->forget_list_head.next; freq != NULL; freq = freq->next) {
+		emitf("\t#%lu: FORGET i%llu -%llu\n", nreq,
+			freq->forget_one.nodeid, freq->forget_one.nlookup);
+		nreq++;
+	}
+
+	/* all other requests */
+	nreq = 0;
+	list_for_each_entry(req, &fiq->pending, list) {
+		nreq++;
+	}
+
+	emitf("(%lu)Request:\n", nreq);
+
+	nreq = 0;
+	list_for_each_entry(req, &fiq->pending, list) {
+		emitf("\t#%lu: R%d\n", nreq, req->in.h.opcode);
+		nreq++;
+	}
+
+	/* processing */
+	// XXX temp? XXX locking
+	{
+	int i;
+	struct fuse_dev *fud;
+	list_for_each_entry(fud, &fc->devices, entry) {
+		struct fuse_pqueue *fpq = &fud->pq;
+
+		nreq = 0;
+		list_for_each_entry(req, &fpq->io, list) {
+			nreq++;
+		}
+		emitf("(%lu)Processing.IO:\n", nreq);
+
+		// XXX print IO elements
+
+		nreq = 0;
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
+			list_for_each_entry(req, &fpq->processing[i], list) {
+				nreq++;
+			}
+		}
+		emitf("(%lu)Processing.P:\n", nreq);
+
+		nreq = 0;
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
+			list_for_each_entry(req, &fpq->processing[i], list) {
+				struct fuse_in_header *h = &req->in.h;
+				emitf("\t#%lu: .%lld R%d i%llu\n", nreq, h->unique, h->opcode, h->nodeid);
+				nreq++;
+			}
+		}
+	}
+	}
+
+	return __total;
+#undef emitf
+}
+
+static ssize_t fuse_conn_iqueue_read(struct file *file, char __user *buf,
+				      size_t len, loff_t *ppos)
+{
+	char *qdump;
+
+	if (!*ppos) {
+		struct fuse_conn *fc = fuse_ctl_file_conn_get(file);
+		struct fuse_iqueue *fiq;
+		size_t n;
+		char *qdump2;
+		if (!fc)
+			return 0;
+
+		fiq = &fc->iq;
+		spin_lock(&fiq->waitq.lock);
+		n = fuse_conn_iqueue_print(NULL, 0, fc);
+		n += 1; /* trailing 0 */
+
+		qdump = kmalloc(n, GFP_ATOMIC);
+		if (qdump) {
+			fuse_conn_iqueue_print(qdump, n, fc);
+		}
+
+		spin_unlock(&fiq->waitq.lock);
+		fuse_conn_put(fc);
+
+		if (!qdump) {
+			return -ENOMEM;
+		}
+
+		/* release atomic memory, since it is scarce resource */
+		qdump2 = kstrdup(qdump, GFP_KERNEL);
+		kfree(qdump);
+
+		file->private_data = (void *)qdump2;
+		// TODO release qdump on file release
+	}
+
+	qdump = (char *)file->private_data;
+	return simple_read_from_buffer(buf, len, ppos, qdump, strlen(qdump));
+}
+
 static ssize_t fuse_conn_limit_read(struct file *file, char __user *buf,
 				    size_t len, loff_t *ppos, unsigned val)
 {
@@ -202,6 +356,12 @@ static const struct file_operations fuse_ctl_waiting_ops = {
 	.llseek = no_llseek,
 };
 
+static const struct file_operations fuse_ctl_queue_ops = {
+	.open = nonseekable_open,
+	.read = fuse_conn_iqueue_read,
+	.llseek = no_llseek,
+};
+
 static const struct file_operations fuse_conn_max_background_ops = {
 	.open = nonseekable_open,
 	.read = fuse_conn_max_background_read,
@@ -278,6 +438,8 @@ int fuse_ctl_add_conn(struct fuse_conn *fc)
 
 	if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1,
 				 NULL, &fuse_ctl_waiting_ops) ||
+	    !fuse_ctl_add_dentry(parent, fc, "queue", S_IFREG | 0400, 1,
+				 NULL, &fuse_ctl_queue_ops) ||
 	    !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1,
 				 NULL, &fuse_ctl_abort_ops) ||
 	    !fuse_ctl_add_dentry(parent, fc, "max_background", S_IFREG | 0600,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0920c0c032a0..7efef59caaa9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -41,7 +41,7 @@
 #define FUSE_NAME_MAX 1024
 
 /** Number of dentries for each connection in the control filesystem */
-#define FUSE_CTL_NUM_DENTRIES 5
+#define FUSE_CTL_NUM_DENTRIES 6
 
 /** Number of page pointers embedded in fuse_req */
 #define FUSE_REQ_INLINE_PAGES 1



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux