On Tue, May 27, 2014 at 10:30:19PM -0700, Christoph Hellwig wrote:
> On Wed, May 28, 2014 at 07:26:53AM +1000, Dave Chinner wrote:
> > > Right... maybe I'm not parsing your point. The purpose here is to avoid
> > > the trylock entirely. E.g., Indicate that we have already acquired the
> > > lock and can proceed with xfs_free_eofblocks(), rather than fail a
> > > trylock and skip (which appears to be a potential infinite loop scenario
> > > here due to how the AG walking code handles EAGAIN).
> >
> > I think Christoph's concern here is that we are calling a function
> > that can take the iolock while we already hold the iolock. i.e. the
> > reason we have to add the anti-deadlock code in the first place.
>
> Indeed.
>

Ah, I didn't parse correctly then. Thanks...

> > To
> > address that, can we restructure xfs_file_buffered_aio_write() such
> > that the ENOSPC/EDQUOT flush is done outside the iolock?
> >
> > From a quick check, I don't think there is any problem with dropping
> > the iolock, doing the flushes and then going all the way back to the
> > start of the function again, but closer examination and testing is
> > warranted...
>

I considered this briefly early on, but wasn't sure whether we should
run through the write_checks() bits more than once (e.g., potentially
do the eof zeroing, etc., multiple times..?).

> I think we'd need some form of early space reservation, otherwise we'd
> get non-atomic writes. Time to get those batches write patches out
> again..
>

So the concern is that multiple writers to an overlapped range could
become interleaved? From a pass through the code, we hit
generic_perform_write(), which iterates over the iov in a
write_begin/copy/write_end loop. If we hit ENOSPC somewhere in the
middle, we'd return what we've written so far. I don't believe the
buffered_aio_write() path would see the error unless it was the first
attempt at a delayed allocation. IOW, mid-write failure will be a short
write vs. an ENOSPC error.
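
As a rough illustration of the short-write semantics described above,
here is a userspace sketch. It is not the kernel code: struct fake_fs,
fake_write_chunk() and fake_perform_write() are hypothetical names made
up for this example, and the "filesystem" is just a byte counter. The
only part meant to mirror the real loop is the final return, where an
error is reported only if nothing was written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the space a delayed allocation can draw on. */
struct fake_fs {
	size_t free_bytes;
};

/*
 * Hypothetical per-chunk step, standing in for one
 * write_begin/copy/write_end pass. Returns 0 or -ENOSPC.
 */
int fake_write_chunk(struct fake_fs *fs, size_t len)
{
	if (len > fs->free_bytes)
		return -ENOSPC;
	fs->free_bytes -= len;
	return 0;
}

/*
 * Write 'count' bytes in pieces of at most 'chunk' bytes. A failure
 * after some progress yields a short write; the error itself is only
 * returned when the very first chunk fails.
 */
ssize_t fake_perform_write(struct fake_fs *fs, size_t count, size_t chunk)
{
	size_t written = 0;
	int status = 0;

	assert(chunk > 0);
	while (written < count) {
		size_t left = count - written;
		size_t len = left < chunk ? left : chunk;

		status = fake_write_chunk(fs, len);
		if (status < 0)
			break;
		written += len;
	}
	return written ? (ssize_t)written : status;
}
```

With 8192 bytes free, a 12288-byte write in 4096-byte chunks comes back
as a short write of 8192, and only the next attempt, which fails on its
first chunk, returns -ENOSPC to the caller.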
It seems like it _might_ be safe to drop and reacquire iolock given
these semantics (notwithstanding the write_checks() bits), but I could
certainly be missing something...

Brian