On Tue, Oct 22, 2013 at 09:41:05PM +0100, Al Viro wrote: > On Tue, Oct 22, 2013 at 06:22:49PM +0100, Al Viro wrote: > > On Sun, Oct 20, 2013 at 11:33:56AM +0100, Phil Davis wrote: > > > > > The reason I think btrfs send is leaking open files is if you watch > > > /proc/sys/fs/file-nr you see the > > > number of open files increasing but if you kill the btrfs send > > > process then the open > > > files count reduces back down. In fact suspending the process also > > > reduces the open file count but > > > resuming it then makes the count start increasing again. > > > > What does lsof show while you are running that? AFAICS, btrfs_ioctl_send() > > should be neutral wrt file references - we do fget() on entry and > > fput() of the result on exit, with nothing else looking relevant in > > sight... OTOH, btrfs-progs number of calls of that ioctl() seems to > > be bounded by the length of argument list. So the interesting questions > > are > > a) how many btrfs send instances are running at the time? > > b) what do their arg lists look like? > > c) who (if anyone) has all those opened files in their descriptor > > tables? > > > aaaarrrgh..... OK, I see what's going on. We have a normal process doing > a normal syscall (well, inasmuch as ioctl(2) can be called normal). It's > just that this syscall happens to open and close an obscene amount of > struct file as it goes. Which leads to all those struct file sitting there > and waiting for task_work_run() to do __fput(). In the meanwhile we keep > allocating new ones (and closing them). All without returning to userland. > > Result: O(regular files in snapshot) struct file instances by the time it > ends. Of course, once we return to userland (or get killed by OOM), > we have task_work_run() called and all those suckers go away. > > Note that decrementing the opened files counter earlier will do nothing > to OOM - it *is* caused by the bloody huge pile of struct file / struct > dentry / struct inode. We really need to flush that shite somehow - or > avoid producing it in the first place. > > The trouble is, I'm not sure that doing __fput() here is really safe - the > call chains are long and convoluted and I don't see what the locking > environment is. IOW, I'm not sure that it's really deadlock-free with > fput() done synchronously. btrfs_release_file() seems to be doing some > non-trivial work if we had the file truncated by somebody else, so... > > Does using vfs_read() in send_write() really make things much simpler for > us? That's the only reason to bother with that dentry_open() at all; > we could bloody well just copy the data from page cache without bothering > with struct file, set_fs(), etc. > > Comments? I agree, we should probably just drop the vfs_read() call and do the hard work ourselves. I will look into doing this in the near future. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html