Hi, > >>>Original commit code assumes, that when a buffer on BJ_SyncData list is > >>>locked, > >>>it is being written to disk. But this is not true and hence it can lead > >>>to a > >>>potential data loss on crash. Also the code didn't count with the fact > >>>that > >>>journal_dirty_data() can steal buffers from committing transaction and > >>>hence > >>>could write buffers that no longer belong to the committing transaction. > >>>Finally it could possibly happen that we tried writing out one buffer > >>>several > >>>times. > >>> > >>>The patch below tries to solve these problems by a complete rewrite of > >>>the data > >>>commit code. We go through buffers on t_sync_datalist, lock buffers > >>>needing > >>>write out and store them in an array. Buffers are also immediately > >>>refiled to > >>>BJ_Locked list or unfiled (if the write out is completed). When the > >>>array is > >>>full or we have to block on buffer lock, we submit all accumulated > >>>buffers for > >>>IO. > >>> > >>>Signed-off-by: Jan Kara <jack@xxxxxxx> > >>> > >>> > >>> > >>I have been running 4+ hours with this patch and seems to work fine. I > >>haven't hit any > >>assert yet :) > >> > >>I will let it run till tomorrow. I will let you know, how it goes. > >> > > Great, thanks. BTW: Do you have any performance tests handy? The > >changes are big enough to cause some unexpected performance regressions, > >livelocks... If you don't have anything ready, I can setup and run > >something myself. Just that I don't like this testing too much ;). > > > Tests are still running fine. > > I don't have any performance tests handy. We have some automated tests I > can schedule to run to verify the stability aspects. OK. I've run IOZONE rewrite throughput test on my computer with iozone -t 10 -i 0 -s 10M -e 2.6.18-rc6 and the same kernel + my patch seem to give almost the same results. The strange thing was that both in vanilla and patched kernel there were several runs where a write througput (when iozone was creating the file) was suddenly 10% of the usual value (18MB/s vs. 2MB/s). The rewrite numbers were always fine. Maybe that has something to do with block allocation code. Anyway, it is not a regression of my patch so unless your test finds some problem I think the patch should be ready for inclusion into -mm... Honza -- Jan Kara <jack@xxxxxxx> SuSE CR Labs - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html