Re: [RFC] unifying write variants for filesystems

Anton Altaparmakov <anton@xxxxxxxxxx> · Sun, 2 Feb 2014 23:16:10 +0000

Hi Al,

On 1 Feb 2014, at 22:43, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> * ntfs_file_buffered_write() should switch to iov_iter as well.  It's
> open-coding a lot of iov_iter stuff.

The NTFS code predates iov_iter by more than 10 years so it simply wasn't there to use when I wrote NTFS...  But yes, NTFS should be updated to use it, I agree completely.

> It's not entirely trivial and
> I'd really like to hear from ntfs folks on that, though, and the current
> code looks deadlock-prone.  We prefault everything, then lock the pages
> to which we'll be copying, then attempt to do __copy_from_user_inatomic(),
> falling back to __copy_from_user() if that fails.  Fine, but what happens
> if the source of write() is mmaped from the same file and we lose CPU while
> prefaulting the last page, have memory pressure evict the first one, then
> have __ntfs_grab_cache_pages() allocate and lock (nonuptodate) pages
> we'll be copying to and have __copy_from_user() try to copy *from* those
> same pages?  We are doing it while holding these pages locked, so pagefault
> will have really fun time with them...  Anton?

That would deadlock.  You are (of course) quite right.  But the generic file write code used to suffer from the same deadlock problem and no-one ever complained much about it IIRC...

In the current kernel the problem does not exist any more (due to the do the atomic copy with pagefaults disabled and then retrying with just the first segment and keeping retrying till it works) so indeed updating NTFS to use iov_iter would be a good opportunity to get that fixed in NTFS as well.

It does not matter if for NTFS it turns out to be quite inefficient (due to locking/unlocking several pages at once) as it is such a crazy corner case that will hardly ever be triggered in real life especially so as the only time NTFS operates on more than one page at once in ntfs_file_buffered_write() is when a write happens into a sparse logical block (NTFS "cluster") and only on volumes with a logical block size > PAGE_CACHE_SIZE which with a minimum PAGE_CACHE_SIZE of 4096 bytes on Linux and the fact that Windows only ever creates volumes with a logical block size of 4096 (unless the user specifically forces a different block size when creating the volume) it means that it basically hardly ever happens.  Also, the NTFS kernel driver never creates sparse files, i.e. the sparse file would have had to come from Windows or another NTFS implementation...

I will have a look at moving NTFS to iov_iter and fixing the potential deadlock at the same time.

Best regards,

	Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge
J.J. Thomson Avenue, Cambridge, CB3 0RB, UK

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs