On Tue, 2006-11-28 at 17:40 +0000, Hugh Dickins wrote: > After several days of testing ext2 with reservations, it got caught inside > ext2_try_to_allocate_with_rsv: alloc_new_reservation repeatedly succeeding > on the window [12cff,12d0e], ext2_try_to_allocate repeatedly failing to > find the free block guaranteed to be included (unless there's contention). > Hmm, I suspect there is other issue: alloc_new_reservation should not repeatedly allocating the same window, if ext2_try_to_allocate repeatedly fails to find a free block in that window. find_next_reservable_window() takes my_rsv (the old window that he thinks there is no free block) as a guide to find a window "after" the end block of my_rsv, so how could this happen? > Fix the range to find_next_usable_block's memscan: the scan from "here" > (0xcfe) up to (but excluding) "maxblocks" (0xd0e) needs to scan 3 bytes > not 2 (the relevant bytes of bitmap in this case being f7 df ff - none > 00, but the premature cutoff implying that the last was found 00). > alloc_new_reservation() reserved a window with free block, when come to the time to claim it, it scans the window again. So it seems that the range of the the scan is too small: p = ((char *)bh->b_data) + (here >> 3); r = memscan(p, 0, (maxblocks - here + 7) >> 3); next = (r - ((char *)bh->b_data)) << 3; ---------------------> next is -1 if (next < maxblocks && next >= here) return next; ----------------------> falls to false branch here = bitmap_search_next_usable_block(here, bh, maxblocks); return here; So we failed to find a free byte in the range. That's seems fine to me. It's only a nice thing to have -- try to allocate a block in a place where it's neighbors are all free also. If it fails, it will search the window bit by bit. So I don't understand why it is not being recovered by bitmap_search_next_usable_block(), which test the bitmap bit by bit? > Is this a problem for mainline ext2? No, because the "size" in its memscan > is always EXT2_BLOCKS_PER_GROUP(sb), which mkfs.ext2 requires to be a > multiple of 8. Is this a problem for ext3 or ext4? No, because they have > an additional extN_test_allocatable test which rescues them from the error. > Hmm, if the error is it prematurely think there is no free block in the range (bitmap on disk), then even in ext3/4, it will not bother checking the jbd copy of the bitmap. I am not sure this is the cause that ext3/4 may not has the problem. > But the bigger question is, why does the my_rsv case come here to > find_next_usable_block at all? Because grp_goal is -1? > Doesn't its 64-bit boundary limit, and its > memscan, blithely ignore what the reservation prepared? I agree with you that the double check is urgly. But it's necessary:( If there to prevent contention: other file make steal that free block we reserved for this file, in the case filesystem is full of reservation... > It's messy too, > the complement of the memscan being that "i < 7" loop over in > ext2_try_to_allocate. I think this ought to be cleaned up, > in ext2+reservations and ext3 and ext4. > The "i<7" loop there is for non reservation case. Since find_next_usable_block() could find a free byte, it's trying to avoid filesystem holes by shifting the start of the free block for at most 7 times. Thanks! Mingming > fs/ext2/balloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > --- 2.6.19-rc6-mm2/fs/ext2/balloc.c 2006-11-24 08:18:02.000000000 +0000 > +++ linux/fs/ext2/balloc.c 2006-11-27 19:28:41.000000000 +0000 > @@ -570,7 +570,7 @@ find_next_usable_block(int start, struct > here = 0; > > p = ((char *)bh->b_data) + (here >> 3); > - r = memscan(p, 0, (maxblocks - here + 7) >> 3); > + r = memscan(p, 0, ((maxblocks + 7) >> 3) - (here >> 3)); > next = (r - ((char *)bh->b_data)) << 3; > > if (next < maxblocks && next >= here) > - > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html