On Wed, Aug 20, 2008 at 02:55:25PM -0700, Mingming Cao wrote:
> On Wed, 2008-08-20 at 07:53 -0400, Theodore Tso wrote:
> > On Wed, Aug 20, 2008 at 04:16:44PM +0530, Aneesh Kumar K.V wrote:
> > > > mpage_da_map_blocks block allocation failed for inode 323784 at logical
> > > > offset 313 with max blocks 11 with error -28
> > > > This should not happen.!! Data will be lost
> >
> > We don't actually lose the data if free blocks are subsequently made
> > available, correct?
>
> Well, I thought with Aneesh's new ext4_da_invalidate patch in the patch
> queue, the dirty pages get invalidated if ext4_da_writepages() could not
> successfully map/allocate blocks. That means we lose data :(
>
> I have a feeling that we do not try very hard before invalidating the
> dirty pages that failed to map to disk. Perhaps we should try a few more
> times before giving up. Also, in that case, perhaps we should turn off
> delalloc filesystem-wide, so that new writers won't take the subsequently
> made available free blocks away from this unlucky delalloc da_writepages.

How do we try harder? mballoc already tried hard to allocate the blocks,
so I am not sure what we achieve by requesting block allocation again.

> > > I tried this patch. There are still multiple ways we can get a wrong
> > > free block count. The patch reduced the number of errors, so we are
> > > doing better with the patch. But I guess we can't use the
> > > percpu_counter-based free block accounting with delalloc. Without
> > > delalloc it is okay even if we find a wrong free block count: the
> > > actual block allocation will fail in that case and we handle that
> > > perfectly fine. With delalloc we cannot afford to fail the block
> > > allocation. Should we look at a free block accounting rewrite using
> > > a simple ext4_fsblk_t and a spinlock?
> >
> > It would be a shame if we did, given that the whole point of the percpu
> > counter was to avoid a scalability bottleneck. Perhaps we could take
> > a filesystem-level spinlock only when the number of free blocks as
> > reported by the percpu_counter falls below some critical level?
>
> Perhaps the threshold should be higher, but other than that, the
> current ext4_has_free_blocks() code does: 1) get the free blocks counter;
> 2) if the counter < FBC_BATCH, call percpu_counter_sum_and_set(), which
> takes the percpu-counter lock and does accurate accounting.
>
> So after thinking again, I could not see how the suggestion above differs
> from what the current ext4_has_free_blocks() does.
>
> Right now ext4_has_free_blocks() uses
>
> #define FBC_BATCH (NR_CPUS*4)
>
> as the threshold. I thought that was good enough, as
> ext4_da_reserve_space() only requests 1 block at a time (called at
> write_begin time), but maybe I am wrong...

Right now I have the threshold check as below.

+	/* Each CPU can accumulate FBC_BATCH blocks in their local
+	 * counters. So we need to make sure we have free blocks more
+	 * than FBC_BATCH * nr_cpu_ids. Also add a window of 4 times.
+	 */
+	if (free_blocks - (nblocks + root_blocks) <
+			(4 * (FBC_BATCH * nr_cpu_ids))) {

-aneesh
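
To make the margin arithmetic in the hunk above concrete, here is a
minimal user-space sketch, not the kernel code: has_free_blocks(),
approx_free and the NR_CPUS/FBC_BATCH values are stand-ins chosen for
illustration. The idea is that each CPU's local counter can lag the
global sum by up to FBC_BATCH blocks, so an approximate read can
overstate the free count by nearly FBC_BATCH * nr_cpu_ids; the fast
path is taken only when the request is covered with a 4x window on
top of that worst case.

	/* User-space sketch of the worst-case drift argument. */
	#include <stdio.h>

	#define NR_CPUS   4
	#define FBC_BATCH (NR_CPUS * 4)   /* per-CPU batch size */

	/* Approximate global count, analogous to a cheap
	 * percpu_counter read: each CPU may still hold up to
	 * FBC_BATCH unsynced blocks in its local counter. */
	static long long approx_free;

	/* Hypothetical check mirroring the quoted hunk: allow the
	 * fast path only if the request is covered even after
	 * assuming maximal per-CPU drift, times a 4x window. */
	static int has_free_blocks(long long nblocks, long long root_blocks)
	{
		long long free_blocks = approx_free;

		if (free_blocks - (nblocks + root_blocks) <
				(4LL * (FBC_BATCH * NR_CPUS)))
			return 0;	/* near ENOSPC: need an exact,
					 * locked sum of the counter */
		return 1;
	}

	int main(void)
	{
		approx_free = 100;	/* below 4*FBC_BATCH*NR_CPUS = 256 */
		printf("reserve 1: %s\n",
		       has_free_blocks(1, 0) ? "ok" : "need exact count");
		approx_free = 1000;	/* comfortably above the window */
		printf("reserve 1: %s\n",
		       has_free_blocks(1, 0) ? "ok" : "need exact count");
		return 0;
	}

In the kernel the "need exact count" branch corresponds to falling
back to percpu_counter_sum_and_set(), which takes the counter lock
and folds every CPU's local delta into the global value, which is what
ext4_has_free_blocks() already does below its threshold.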