On Wed, 2008-08-20 at 17:42 -0600, Andreas Dilger wrote:
> On Aug 20, 2008 16:22 -0700, Mingming Cao wrote:
> > ext4: fall back to non delalloc mode if filesystem is almost full
> > From: Mingming Cao <cmm@xxxxxxxxxx>
> >
> > In the case of filesystem is close to full (free blocks is below
> > the watermark NRCPUS *4) and there is not enough to reserve blocks for
> > delayed allocation, instead of return user back with ENOSPC error, with
> > this patch, it tries to fall back to non delayed allocation mode.
>
> I don't think that making a low watermark of only 4 blocks is enough,
> because each of the per-CPU counters could be off by as much as FBC_BATCH.
> I think dropping delalloc support earlier is safer, something like
> (FBC_BATCH * NR_CPUS).
>

Okay, makes sense.

> > +static int ext4_write_begin_nondelalloc(struct file *file,
> > +				struct address_space *mapping,
> > +				loff_t pos, unsigned len, unsigned flags,
> > +				struct page **pagep, void **fsdata)
> > +{
> > +	struct inode *inode = mapping->host;
> > +
> > +	/* turn off delalloc for this inode*/
> > +	ext4_set_aops(inode, 0);
> > +
> > +	return mapping->a_ops->write_begin(file, mapping, pos, len,
> > +						flags, pagep, fsdata);
> > +}
>
> Hmm, I don't understand this - isn't delalloc already off here, because
> this is "ext4_write_begin_nondelalloc()"?
>

This function should probably be called ext4_wb_fall_back_to_nondelalloc().
It is called when we detect ENOSPC and try to fall back to non-delalloc
mode. It will eventually call the non-delalloc write_begin function,
ext4_write_begin().

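To make the watermark concern above concrete, here is a rough user-space
sketch of the arithmetic, not kernel code. The helper names
max_counter_error() and delalloc_is_safe() are made up for illustration,
and nr_cpus and batch are plain parameters standing in for NR_CPUS and
FBC_BATCH; their real values depend on the kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Worst-case drift of an approximate per-CPU counter: each CPU may hold
 * up to `batch` blocks that have not yet been folded into the global
 * count. Both parameters are illustrative stand-ins for NR_CPUS and
 * FBC_BATCH.
 */
static int64_t max_counter_error(int nr_cpus, int64_t batch)
{
	assert(nr_cpus > 0 && batch > 0);
	return (int64_t)nr_cpus * batch;
}

/*
 * Hypothetical check: keep delayed allocation only if the reservation
 * of `nblocks` still fits after subtracting the worst-case drift from
 * the cheap global estimate `approx_free`.
 */
static bool delalloc_is_safe(int64_t approx_free, int64_t nblocks,
			     int nr_cpus, int64_t batch)
{
	return approx_free - max_counter_error(nr_cpus, batch) >= nblocks;
}
```

With 4 CPUs and a batch of 32, an estimate of 16 free blocks clears an
(NR_CPUS * 4) watermark, yet the true count could already be zero, so
this check would give up delalloc at that point.
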
> > +void ext4_set_aops(struct inode *inode, int delalloc)
> > {
> > +	if (test_opt(inode->i_sb, DELALLOC)) {
> > +		if (ext4_has_free_blocks(EXT4_SB(inode->i_sb),
> > +			EXT4_MIN_FREE_BLOCKS) > EXT4_MIN_FREE_BLOCKS)
> > +			delalloc = 0;
> > +
> > +		if (delalloc) {
> > +			inode->i_mapping->a_ops = &ext4_da_aops;
> > +			return;
> > +		} else
> > +			printk(KERN_INFO "filesystem is close to full, "
> > +				"delayed allocation is turned off for "
> > +				" inode %lu\n", inode->i_ino);
> > +	}
>
> Also, if you are doing this by changing the aops on the inode, isn't
> it possible that a large write starts outside the EXT4_MIN_FREE_BLOCKS
> boundary and then still runs out of space without changing the aops?
>
> Instead it is maybe better to do the check at the start of
> ext4_da_write_begin() and if it fails then call the non-delalloc
> write_begin from there?
>

Yeah, that's better. But I have realized there is a problem: I now think
we can't fall back to non-delalloc mode if the inode has any dirty pages
in the page cache, because those pages need the delalloc aops
(->ext4_da_writepages()) to handle delayed allocation writeout.

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.