在 2012年2月10日 上午10:55,Ted Ts'o <tytso@xxxxxxx> 写道: > On Thu, Feb 02, 2012 at 03:48:49PM +0800, Robin Dong wrote: >> >> The reason is ext4_has_free_clusters reporting wrong >> result. Actually, the unit of argument "dirty_clusters" is block, so >> we should tranform the sbi->s_dirtyclusters_counter to block , just >> like "free_clusters". > > I've been looking at the the delayed allocation reservation code, and > I think it's a lot more confused that this. Some places we are > playing with fields as if they are clusters, and some times as if they > are blocks, and I've definitely found places where we're not handling > quota correctly with bigalloc (ext4_map_blocks is calling > ext4_da_update_reserve_space() with a value which is clearly in units > of blocks, but we are treating that vale as if it is clusters, etc.) > > So I think the problem is a lot larger than what you pointed out, and > we shouldn't fix it just by throwing in an EXT4_C2B call. If > s_dirtyclusters_counter should be denominated in blocks, then we need > to rename it. And there there's this comment which made me wince: > > * Note that in case of bigalloc, i_reserved_meta_blocks, > * i_reserved_data_blocks, etc. refer to number of clusters. > > Argh, no. If something is denominated in clusters, then it should be > appropriately named as such. > > Actually, I think the problem is a lot bigger than this, and it's a > conceptual one. When we try writing to a block which does not have a > mapping, and bigalloc is enabled, there are three cases: > > 1) This is the first time the cluster has ever been written to; hence, > even though we are dirtying a block, we need to reserve room for the > entire cluster. > > 2) While the block has not been mapped (and so the data on disk is > uninitialized) the cluster has indeed been allocated, so we don't need > to reserve any additional room for the block. > > 3) One or more blocks in the cluster have already been written to via > delayed allocation, but the cluster itself has not been selected > (allocated) yet. But since the space for the entire cluster was > reserved when the first block in the cluster was dirtied (case #1), we > don't need to reserve any more space. > > We aren't handling these cases correctly today, since we don't have > this logic here. These cases don't matter if you're only using > fallocate() and direct I/O, but if you are using buffered writes and > delayed allocation, they obviously matter a huge amount. > > And I don't think we're handling it correctly. Argh... > > - Ted Hi, Ted and Aditya I am trying to fix this "bigalloc quota reservation in delay-allocation" problem recently, and I feel confusion about these code below in ext4_ext_map_blocks(): if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) { unsigned int reserved_clusters; /* * Check how many clusters we had reserved this allocated range */ reserved_clusters = get_reserved_cluster_alloc(inode, map->m_lblk, allocated); if (map->m_flags & EXT4_MAP_FROM_CLUSTER) { if (reserved_clusters) { ...... ext4_da_update_reserve_space(inode, reserved_clusters, 0); } } else { BUG_ON(allocated_clusters < reserved_clusters); /* We will claim quota for all newly allocated blocks.*/ ext4_da_update_reserve_space(inode, allocated_clusters, 1); if (reserved_clusters < allocated_clusters) { struct ext4_inode_info *ei = EXT4_I(inode); int reservation = allocated_clusters - reserved_clusters; dquot_reserve_block(inode, EXT4_C2B(sbi, reservation)); spin_lock(&ei->i_block_reservation_lock); ei->i_reserved_data_blocks += reservation; spin_unlock(&ei->i_block_reservation_lock); } } } if we have allocated 3 clusters, why should we add "reservation" back to i_reservd_data_blocks ? (Aditya Kali 's long comment dose not make sense to me) Maybe we don't need to do ext4_da_update_reserve_space when the new block is allocated from "implied_cluster"? Why not just change them to: if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) { if ( !(map->m_flags & EXT4_MAP_FROM_CLUSTER) ) /* We will claim quota for all newly allocated blocks.*/ ext4_da_update_reserve_space(inode, allocated_clusters, 1); } -- -- Best Regard Robin Dong -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html