Hi Ted, Thank you for comment. :-) Theodore Tso wrote: >> + >> + up_write(&EXT4_I(org_inode)->i_data_sem); >> + ret = a_ops->write_begin(o_filp, mapping, offs, data_len, w_flags, >> + &page, &fsdata); >> + down_write(&EXT4_I(org_inode)->i_data_sem); >> > > This is going to be a problem. Once we release i_data_sem, there is > the possibility that other processes which might be running and > accessing the file at the same time that the defragger is running > could be blocked waiting for i_data_sem to be released. The moment it > gets released, they will grab the lock then start to modify extent > tree --- either allocating new blocks to it, or worse, truncating or > unlinking the target inode. > > This is going to be a mess to fix, since Linux doesn't have recursive > locking primitives. We do take i_mutex, which will protect us against > truncates, but it won't protect against a write() system call. Also, > if there inode has delayed allocation blocks pending, those could get > written out by the page cleaner, and i_mutex won't protect us against > changes to i_data_sem, either. > > As you said, we take i_mutex at the start of ext4_defrag() and hold it until the end of this function, so orig file is protected against truncates and it never be shrunk during defrag. On the other hand, semaphore is released/taken around write_begin() in ext4_defrag_partial() every page, so it does not protect orig file against a write() system call from other process. So that defrag result (fragmentation) might not be best, but data corruption does not occur at least. So I think it is not a serious problem. As above, it is not necessary to lock the whole of ext4_defrag() with semaphore, we should just lock only a necessary point. Therefore defrag V1's lock seems have unneeded lock points. I will change lock point and semaphore type in the next version. Do I overlook something? Regards, Akira Fujita > We could add special-case kludgery to wrap around all of the places > that takes or release the i_data_sem so that we get recursive locking > support --- but that would be very ugly indeed. > > I'm not sure what's the best way to deal with this; maybe if we sleep > on it someone will come up with a better suggestion --- but it's > something that we have to figure out. > > - Ted > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html