On Wed 19-02-20 18:42:36, Yang Xu wrote:
> on 2020/02/19 18:09, Yang Xu wrote:
> > > > > 1) If you mount ext4 with barrier=0 mount option, does the
> > > > > problem go away?
> > > > Yes. Use barrier=0, this case doesn't hang,
> > >
> > > OK, so there's some problem with how the block layer is handling flush
> > > bios...
> > >
> > > > > 2) Can you run the test and at the same time run
> > > > > 'blktrace -d /dev/sdc' to gather traces? Once the machine is
> > > > > stuck, abort blktrace, process the resulting files with
> > > > > 'blkparse -i sdc' and send here compressed blkparse output. We
> > > > > should be able to see what was happening with the stuck request
> > > > > in the trace and maybe that will tell us something.
> > > > The log size is too big (58M) and our email limit is 5M.
> > >
> > > OK, can you put the log somewhere for download? Alternatively you could
> > > provide only last say 20s of the trace which should hopefully fit into
> > > the limit...
> > Ok. I will use split command and send you in private to avoid much noise.
> log as attach.

Thanks for the log. So the reason for the hang is clearly visible at the end
of the log:

  8,32   2   104324   164.814457402   995  Q FWS [fsstress]
  8,32   2   104325   164.814458088   995  G FWS [fsstress]
  8,32   2   104326   164.814460957   739  D  FN [kworker/2:1H]

This means the fsstress command has queued a cache flush request (from
blkdev_issue_flush()) and the request has been dispatched to the driver (the
'D' event), but the driver has never completed it (there is no matching 'C'
event), and so blkdev_issue_flush() never returns.

To debug this further, you probably need to start looking into what happens
with the request inside QEMU. There's not much I can help you with at this
point since I'm not an expert there. Do you use an image file as a backing
store or a raw partition?

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
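
In case it helps with a trace this large, below is a minimal sketch (not something that was run on the attached log) of how dispatched-but-never-completed requests could be picked out of blkparse text output. It assumes the default blkparse line layout seen in the excerpt above (device, CPU, sequence, timestamp, PID, action, RWBS, then details) and pairs 'D' with 'C' events by device, RWBS and start sector, which is only an approximation. The function name pending_dispatches is made up for illustration.

```python
from collections import Counter


def pending_dispatches(lines):
    """Return {(device, rwbs, sector): count} for 'D' events with no matching 'C'.

    Assumption: default blkparse output, i.e. whitespace-separated fields
    dev, cpu, seq, time, pid, action, rwbs, details...
    """
    pending = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or summary line, not a trace event
        dev, action, rwbs = fields[0], fields[5], fields[6]
        if action not in ("D", "C"):
            continue  # only dispatch and completion events matter here
        rest = fields[7:]
        # Data requests look like "<sector> + <count> [...]"; a flush with no
        # data has no "<sector> + <count>" part, so it gets sector None.
        sector = rest[0] if len(rest) >= 2 and rest[1] == "+" else None
        pending[(dev, rwbs, sector)] += 1 if action == "D" else -1
    return {key: n for key, n in pending.items() if n > 0}


if __name__ == "__main__":
    # The three events quoted from the end of the log.
    tail = [
        "8,32 2 104324 164.814457402 995 Q FWS [fsstress]",
        "8,32 2 104325 164.814458088 995 G FWS [fsstress]",
        "8,32 2 104326 164.814460957 739 D FN [kworker/2:1H]",
    ]
    print(pending_dispatches(tail))
```

On the three lines above this reports the single 'D FN' event on 8,32 as still outstanding. On a full trace, requests that were simply in flight when blktrace was aborted will also show up, so only the entries near the hang are interesting.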