On Wed, Aug 20, 2008 at 02:55:25PM -0700, Mingming Cao wrote:
> On Wed, 2008-08-20 at 07:53 -0400, Theodore Tso wrote:
> > On Wed, Aug 20, 2008 at 04:16:44PM +0530, Aneesh Kumar K.V wrote:
> > > > mpage_da_map_blocks block allocation failed for inode 323784 at logical
> > > > offset 313 with max blocks 11 with error -28
> > > > This should not happen.!! Data will be lost
> >
> > We don't actually lose the data if free blocks are subsequently made
> > available, correct?
>
> Well, I thought with Aneesh's new ext4_da_invalidate patch in the patch
> queue, the dirty pages get invalidated if ext4_da_writepages() could not
> successfully map/allocate blocks. That means we lose data :(
>
> I have a feeling that we do not try very hard before invalidating the
> dirty pages that failed to map to disk. Perhaps we should try a few more
> times before giving up. Also, in that case, perhaps we should turn off
> delalloc filesystem-wide, so that new writers won't take the subsequently
> made available free blocks away from this unlucky delalloc da_writepages.

How do we try harder? mballoc already tried hard to allocate the blocks,
so I am not sure what we achieve by requesting block allocation again.

> > > I tried this patch. There are still multiple ways we can get a wrong
> > > free block count. The patch reduced the number of errors, so we are
> > > doing better with the patch. But I guess we can't use the
> > > percpu_counter-based free block accounting with delalloc. Without
> > > delalloc it is okay even if we find a wrong free block count: the
> > > actual block allocation will fail in that case and we handle that
> > > perfectly fine. With delalloc we cannot afford to fail the block
> > > allocation. Should we look at a free block accounting rewrite using
> > > a simple ext4_fsblk_t and a spinlock?
> >
> > It would be a shame if we did, given that the whole point of the percpu
> > counter was to avoid a scalability bottleneck. Perhaps we could take
> > a filesystem-level spinlock only when the number of free blocks as
> > reported by the percpu_counter falls below some critical level?
>
> Perhaps the threshold should be higher, but other than that, the
> current ext4_has_free_blocks() code does: 1) get the free blocks counter;
> 2) if the counter < FBC_BATCH, call percpu_counter_sum_and_set(), which
> takes the percpu-counter lock and does accurate accounting.
>
> So after thinking again, I could not see how the suggestion above differs
> from what the current ext4_has_free_blocks() does.
>
> Right now ext4_has_free_blocks() uses
>
> #define FBC_BATCH (NR_CPUS*4)
>
> as the threshold. I thought that was good enough, as
> ext4_da_reserve_space() only requests 1 block at a time (called at
> write_begin time), but maybe I am wrong...

Right now I have the threshold check as below.

+	/* Each CPU can accumulate FBC_BATCH blocks in their local
+	 * counters. So we need to make sure we have free blocks more
+	 * than FBC_BATCH * nr_cpu_ids. Also add a window of 4 times.
+	 */
+	if (free_blocks - (nblocks + root_blocks) <
+			(4 * (FBC_BATCH * nr_cpu_ids))) {

-aneesh
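
To make the margin arithmetic in the hunk above concrete, here is a
minimal user-space sketch, not the kernel code: has_free_blocks(),
approx_free and the NR_CPUS/FBC_BATCH values are stand-ins chosen for
illustration. The idea is that each CPU's local counter can lag the
global sum by up to FBC_BATCH blocks, so an approximate read can
overstate the free count by nearly FBC_BATCH * nr_cpu_ids; the fast
path is taken only when the request is covered with a 4x window on
top of that worst case.

	/* User-space sketch of the worst-case drift argument. */
	#include <stdio.h>

	#define NR_CPUS   4
	#define FBC_BATCH (NR_CPUS * 4)   /* per-CPU batch size */

	/* Approximate global count, analogous to a cheap
	 * percpu_counter read: each CPU may still hold up to
	 * FBC_BATCH unsynced blocks in its local counter. */
	static long long approx_free;

	/* Hypothetical check mirroring the quoted hunk: allow the
	 * fast path only if the request is covered even after
	 * assuming maximal per-CPU drift, times a 4x window. */
	static int has_free_blocks(long long nblocks, long long root_blocks)
	{
		long long free_blocks = approx_free;

		if (free_blocks - (nblocks + root_blocks) <
				(4LL * (FBC_BATCH * NR_CPUS)))
			return 0;	/* near ENOSPC: need an exact,
					 * locked sum of the counter */
		return 1;
	}

	int main(void)
	{
		approx_free = 100;	/* below 4*FBC_BATCH*NR_CPUS = 256 */
		printf("reserve 1: %s\n",
		       has_free_blocks(1, 0) ? "ok" : "need exact count");
		approx_free = 1000;	/* comfortably above the window */
		printf("reserve 1: %s\n",
		       has_free_blocks(1, 0) ? "ok" : "need exact count");
		return 0;
	}

In the kernel the "need exact count" branch corresponds to falling
back to percpu_counter_sum_and_set(), which takes the counter lock
and folds every CPU's local delta into the global value, which is what
ext4_has_free_blocks() already does below its threshold.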