On Wed, Apr 24, 2019 at 12:48:36PM +0200, Miklos Szeredi wrote:
> On Wed, Mar 27, 2019 at 11:44 AM Kirill Smelkov <kirr@xxxxxxxxxx> wrote:
> >
> > A FUSE filesystem server queues /dev/fuse sys_read calls to get
> > filesystem requests to handle. It does not know in advance what would be
> > that request as it can be anything that client issues - LOOKUP, READ,
> > WRITE, ... Many requests are short and retrieve data from the
> > filesystem. However WRITE and NOTIFY_REPLY write data into filesystem.
> >
> > Before getting into operation phase, FUSE filesystem server and kernel
> > client negotiate what should be the maximum write size the client will
> > ever issue. After negotiation the contract in between server/client is
> > that the filesystem server then should queue /dev/fuse sys_read calls with
> > enough buffer capacity to receive any client request - WRITE in
> > particular, while FUSE client should not, in particular, send WRITE
> > requests with > negotiated max_write payload. FUSE client in kernel and
> > libfuse historically reserve 4K for request header. This way the
> > contract is that filesystem server should queue sys_reads with
> > 4K+max_write buffer.
> >
> > If the filesystem server does not follow this contract, what can happen
> > is that fuse_dev_do_read will see that request size is > buffer size,
> > and then it will return EIO to client who issued the request but won't
> > indicate in any way that there is a problem to filesystem server.
> > This can be hard to diagnose because for some requests, e.g. for
> > NOTIFY_REPLY which mimics WRITE, there is no client thread that is
> > waiting for request completion and that EIO goes nowhere, while on
> > filesystem server side things look like the kernel is not replying back
> > after successful NOTIFY_RETRIEVE request made by the server.
> >
> > -> We can make the problem easy to diagnose if we indicate via error
> > return to filesystem server when it is violating the contract.
> > This should not practically cause problems because if a filesystem
> > server is using shorter buffer, writes to it were already very likely to
> > cause EIO, and if the filesystem is read-only it should be too following
> > 8K minimum buffer size (= either FUSE_MIN_READ_BUFFER, see 1d3d752b47,
> > or = 4K + min(max_write)=4k cared to be so by process_init_reply).
> >
> > Please see [1] for context where the problem of stuck filesystem was hit
> > for real (because kernel client was incorrectly sending more than
> > max_write data with NOTIFY_REPLY; see also previous patch), how the
> > situation was traced and for more involving patch that did not make it
> > into the tree.
> >
> > [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
>
> Applied.

Thanks. Looking forward to seeing it appear in fuse.git#for-next.

Kirill
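
For illustration only, the buffer-size contract from the quoted commit
message can be sketched from the filesystem server's side as below. This
is a minimal userspace sketch, not the kernel patch itself: the names
HEADER_ROOM, MIN_READ_BUFFER, required_read_buffer and read_buffer_ok are
hypothetical, and the 4K and 8K values are simply the ones stated in the
quoted text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants; values come from the quoted commit message. */
#define HEADER_ROOM     4096u   /* 4K historically reserved for the request header */
#define MIN_READ_BUFFER 8192u   /* 8K minimum read buffer size */

/*
 * Smallest buffer a server should queue for a /dev/fuse read, given the
 * max_write value negotiated with the kernel client: 4K + max_write,
 * but never less than the 8K minimum.
 */
size_t required_read_buffer(size_t max_write)
{
    size_t need = HEADER_ROOM + max_write;

    return need < MIN_READ_BUFFER ? MIN_READ_BUFFER : need;
}

/* True if a read buffer of nbytes honours the contract for max_write. */
bool read_buffer_ok(size_t nbytes, size_t max_write)
{
    return nbytes >= required_read_buffer(max_write);
}
```

With max_write negotiated as 128K, for example, the server would need to
queue reads with at least 4096 + 131072 = 135168 bytes of buffer.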