On Sat, 30 Jul 2011, Ted Ts'o wrote:
> On Sat, Jul 30, 2011 at 07:36:12PM +0300, Fyodor Ustinov wrote:
> > As it is written in subject - 3.0.0 release.
> >
> > It's Ubuntu 11.04 with custom kernel
>
> Right, sorry, I missed that. And just to be clear this wasn't an -rc
> kernel but 3.0 final, right?
>
> Hmm, looking through recent commits which will shortly be merged into
> 3.1, this one leaps out, but I'm not sure it's the cause --- how full
> was your disk at the end of this exercise?
>
> I haven't looked at Ceph in quite a while. As I recall it was
> primarily doing Direct I/O writes, correct? Or does it use buffered
> I/O? And does it use the new "punch" ioctl to release blocks from the
> middle of a file? Ext4 added punch support in 3.0, and there are some
> bug fixes that are going into 3.1, but I don't think there were any
> that would lead to the failure mode you are seeing.

Direct-io is used for the osd journal only; is that on the ext4 partition,
Fyodor? Everything else is buffered io. We don't use the new punch ioctl.

We do use xattrs extensively, though; that was the last extN bug we
uncovered. That's where my money is.

Fyodor, if you set 'debug filestore = 10' you'll get a log of every
operation on the fs in the osd log. (Or close to it; there may be a few
that we missed, but to a first approximation at least it'll describe the
workload pretty well.)

sage

(BTW we'll be really happy if/when the large xattr patches from the Lustre
guys make it into mainline! The (4k?) limit on total xattrs is a problem
for us.)
>
> - Ted
>
> commit 7132de744ba76930d13033061018ddd7e3e8cd91
> Author: Maxim Patlasov <maxim.patlasov@xxxxxxxxx>
> Date:   Sun Jul 10 19:37:48 2011 -0400
>
>     ext4: fix i_blocks/quota accounting when extent insertion fails
>
>     The current implementation of ext4_free_blocks() always calls
>     dquot_free_block(). This looks quite sensible in most cases: blocks
>     to be freed are associated with inode and were accounted in quota and
>     i_blocks some time ago.
>
>     However, there is a case when blocks to free were not accounted by the
>     time calling ext4_free_blocks() yet:
>
>     1. delalloc is on, write_begin pre-allocated some space in quota
>     2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
>     3. then ext4_ext_map_blocks() gets an error (e.g. ENOSPC) from
>        ext4_ext_insert_extent() and calls ext4_free_blocks().
>
>     In this scenario, ext4_free_blocks() calls dquot_free_block() which, in
>     turn, decrements i_blocks for blocks which were not accounted yet (due
>     to delalloc). After clean umount, e2fsck reports something like:
>
>     > Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
>     because i_blocks was erroneously decremented as explained above.
>
>     The patch fixes the problem by passing the new flag
>     EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
>     that the dquot_free_block() call be skipped.
>
>     Signed-off-by: Maxim Patlasov <maxim.patlasov@xxxxxxxxx>
>     Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx>
>     Cc: stable@xxxxxxxxxx
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 49d2cea..d13f3b5 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -526,6 +526,7 @@ struct ext4_new_group_data {
>  #define EXT4_FREE_BLOCKS_METADATA        0x0001
>  #define EXT4_FREE_BLOCKS_FORGET          0x0002
>  #define EXT4_FREE_BLOCKS_VALIDATED       0x0004
> +#define EXT4_FREE_BLOCKS_NO_QUOT_UPDATE  0x0008
>
>  /*
>   * ioctl commands
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 31ae5fb..a862138 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3565,12 +3565,14 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
>
>          err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
>          if (err) {
> +                int fb_flags = flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE ?
> +                        EXT4_FREE_BLOCKS_NO_QUOT_UPDATE : 0;
>                  /* free data blocks we just allocated */
>                  /* not a good idea to call discard here directly,
>                   * but otherwise we'd need to call it every free() */
>                  ext4_discard_preallocations(inode);
>                  ext4_free_blocks(handle, inode, NULL, ext4_ext_pblock(&newex),
> -                                 ext4_ext_get_actual_len(&newex), 0);
> +                                 ext4_ext_get_actual_len(&newex), fb_flags);
>                  goto out2;
>          }
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 389386b..1900ec7 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4637,7 +4637,7 @@ do_more:
>          }
>          ext4_mark_super_dirty(sb);
>  error_return:
> -        if (freed)
> +        if (freed && !(flags & EXT4_FREE_BLOCKS_NO_QUOT_UPDATE))
>                  dquot_free_block(inode, freed);
>          brelse(bitmap_bh);
>          ext4_std_error(sb, err);

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html