On Tue, Nov 20, 2012 at 10:27:11PM +1100, Dave Chinner wrote: > This was discovered on a filesystem with a log of only 10MB, and a > log stripe unit of 256k whih increased the base reservations by > 512k. Hence a allocation transaction requires 1.2MB of log space to > be available instead of only 260k, and so greatly increased the > chance that there wouldn't be enough log space available for the > nested transaction to succeed. The key to reproducing it is this > mkfs command: > > mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV > > The test case was a 1000 fsstress processes running with random > freeze and unfreezes every few seconds. Thanks to Eryu Guan > (eguan@xxxxxxxxxx) for writing the test that found this on a system > with a somewhat unique default configuration.... That sounds like something that could fit xfstests fairly easily. Re the patch - you're moving the transaction allocation back into the end_io handler. That's what my original version did, and I'm pretty sure you talked me out of it back then. I can't remember the details but the list should have it. > @@ -151,9 +151,11 @@ xfs_setfilesize( > /* > * The transaction was allocated in the I/O submission thread, > * thus we need to mark ourselves as beeing in a transaction > - * manually. > + * manually. Similarly for freeze protection. > */ > current_set_flags_nested(&tp->t_pflags, PF_FSTRANS); > + rwsem_acquire_read(&VFS_I(ip)->i_sb->s_writers.lock_map[SB_FREEZE_FS-1], > + 0, 1, _THIS_IP_); The comment above isn't true anymore, and the flags hack should be removed. I'm also not sure the freeze protection still works if the acquire is outside the original broader scope protection. I'll defer to Jan on this as I don't really understand this magic enough.q should be removed respectively replaced with sb_start_intwrite/sb_end_intwrite > if (ioend->io_type == XFS_IO_UNWRITTEN) { > error = xfs_iomap_write_unwritten(ip, ioend->io_offset, > + ioend->io_size); > + goto done; > + } > + > + /* > + * For direct I/O we do not know if we need to allocate blocks or not so > + * we can't preallocate an append transaction as that results in nested > + * reservations and log space deadlocks. Hence allocate the transaction > + * here. For buffered I/O we preallocate a transaction when submitting > + * the IO. > + */ > + if (ioend->io_isdirect && xfs_ioend_is_append(ioend)) { xfs_iomap_write_unwritten already updates the inode size, so this should be an "else if" > + error = xfs_setfilesize_trans_alloc(ioend); > + if (error) > + } > + > + if (ioend->io_append_trans) { > error = xfs_setfilesize(ioend); > } else { > ASSERT(!xfs_ioend_is_append(ioend)); > } Does it really make sense to have different allocation points for buffered vs direct I/O? At least it needs a comment explaining why it's done differently. While it's probably too much work for a quick fix I'd much rather replace the hacks we currently do in the direct I/O code with a scheme where the dio code has a hook to allocate the transaction if needed at the right point. _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs