On Fri 16-10-15 11:08:46, Nikolay Borisov wrote: > On 10/13/2015 04:14 PM, Jan Kara wrote: > >> Turns out in my original email the bh->b_size was shown : > >> b_size = 0x400 == 1k. So the filesystem is not 4k but 1k. > > > > OK, then I have a theory. We can manipulate bh->b_state in a non-atomic > > manner in _ext4_get_block(). If we happen to do that on the first buffer in > > a page while IO completes on another buffer in the same page, we could in > > theory mess up and miss clearing of BH_Uptodate_Lock flag. Can you try > > whether the attached patch fixes your problem? > > I just want to make sure I have fully understood the problem. So the way > I understand what you have said is that since the blocksize is 1k this > potentially means we might need up to 4 buffer heads to map everything > in a page. But as you mentioned the locking in ext4_finish_bio is done > only on the first buffer_head in this set of bh. At the same time in > _ext4_get_block bh->b_state is modified non-atomically as you said, and > this can happen in one of those 4bh used to map the whole page? Correct? > > If my reasoning is correct then I see no reason why this corruption > couldn't happen even with blocksize == pagesize since ext4_get_block's > non-atomic modification of the bh->b_state could still race with > ext4_finish_bio's atomic modification (even though ext4_finish_bio would > just work on a single buffer head)? Well, the reason why this race is not possible with 4k blocks is that we map blocks in a page (i.e., call _ext4_get_block()) only under page lock and only when the buffer isn't already mapped. And in such case IO completion cannot happen (for IO to run buffer must be mapped). So it's mostly a pure luck but the race is not possible... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html