On Fri, Nov 12, 2021 at 08:47:30AM +0530, Pavan Kondeti wrote:
> Hi Greg,
>
> On Thu, Nov 11, 2021 at 02:12:28PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Nov 11, 2021 at 05:45:56PM +0530, Pavankumar Kondeti wrote:
> > > Function fs endpoint files do not have the notion of file position.
> > > So switch to stream like functionality. This allows concurrent threads
> > > to be blocked in the ffs read/write operations which use ffs_mutex_lock().
> > > The ffs mutex lock deploys interruptible wait. Otherwise, threads are
> > > blocked on the mutex lock in __fdget_pos(). For whatever reason, if the
> > > host does not send/receive data for a long time, hung task warnings
> > > are observed.
> >
> > So the current code is broken?  What commit caused it to break?
>
> This is not a serious bug that can affect functionality. If hung_task_panic
> sysctl is not enabled, probably nobody would notice this except an obscure
> warning in the kernel dmesg log. It is all about the task state while
> it is blocked for I/O. The function fs code uses interruptible wait but
> we are not reaching there and are getting blocked at the VFS layer due to
> the below commit, introduced in the 3.14 kernel.
>
> commit 9c225f2655e36a470c4f58dbbc99244c5fc7f2d4
> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date:   Mon Mar 3 09:36:58 2014 -0800
>
>     vfs: atomic f_pos accesses as per POSIX
>
>     Our write() system call has always been atomic in the sense that you get
>     the expected thread-safe contiguous write, but we haven't actually
>     guaranteed that concurrent writes are serialized wrt f_pos accesses, so
>     threads (or processes) that share a file descriptor and use "write()"
>     concurrently would quite likely overwrite each others data.
>
> We uncovered this issue via a customer bug report; it happens very rarely,
> only when the host does not pull the data for a very long time.
> Since function fs does not care about file position, I thought stream_open()
> is the right thing to do here.
>
> >
> > Doesn't this change cause a change in behavior for existing userspace
> > tools, or will they still work as-is?
> >
>
> I don't think it affects user space, as it just changes the task state from
> UNINTERRUPTIBLE to INTERRUPTIBLE while waiting for the USB transfers to
> finish. However, there is a slight change to the O_NONBLOCK behavior.
> Earlier, threads using O_NONBLOCK were also getting blocked
> inside fdget_pos(). Now they reach f_fs and an error code is returned. IOW,
> we are actually fixing the non-blocking behavior here.
>
> PS: I believe you asked these questions since the commit description does not
> cover it. I can happily add all this information to it. Since it is all
> historical, I did not mention it.

Please add all of this to the commit log description so that we can
properly understand it in the future.

thanks,

greg k-h