On Wed, Mar 27, 2019 at 11:44 AM Kirill Smelkov <kirr@xxxxxxxxxx> wrote: > > A FUSE filesystem server queues /dev/fuse sys_read calls to get > filesystem requests to handle. It does not know in advance what would be > that request as it can be anything that client issues - LOOKUP, READ, > WRITE, ... Many requests are short and retrieve data from the > filesystem. However WRITE and NOTIFY_REPLY write data into filesystem. > > Before getting into operation phase, FUSE filesystem server and kernel > client negotiate what should be the maximum write size the client will > ever issue. After negotiation the contract in between server/client is > that the filesystem server then should queue /dev/fuse sys_read calls with > enough buffer capacity to receive any client request - WRITE in > particular, while FUSE client should not, in particular, send WRITE > requests with > negotiated max_write payload. FUSE client in kernel and > libfuse historically reserve 4K for request header. This way the > contract is that filesystem server should queue sys_reads with > 4K+max_write buffer. > > If the filesystem server does not follow this contract, what can happen > is that fuse_dev_do_read will see that request size is > buffer size, > and then it will return EIO to client who issued the request but won't > indicate in any way that there is a problem to filesystem server. > This can be hard to diagnose because for some requests, e.g. for > NOTIFY_REPLY which mimics WRITE, there is no client thread that is > waiting for request completion and that EIO goes nowhere, while on > filesystem server side things look like the kernel is not replying back > after successful NOTIFY_RETRIEVE request made by the server. > > -> We can make the problem easy to diagnose if we indicate via error > return to filesystem server when it is violating the contract. > This should not practically cause problems because if a filesystem > server is using shorter buffer, writes to it were already very likely to > cause EIO, and if the filesystem is read-only it should be too following > 8K minimum buffer size (= either FUSE_MIN_READ_BUFFER, see 1d3d752b47, > or = 4K + min(max_write)=4k cared to be so by process_init_reply). > > Please see [1] for context where the problem of stuck filesystem was hit > for real (because kernel client was incorrectly sending more than > max_write data with NOTIFY_REPLY; see also previous patch), how the > situation was traced and for more involving patch that did not make it > into the tree. > > [1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2 Applied. Thanks, Miklos