On 02/05, Christoph Hellwig wrote:
> On Mon, Jan 16, 2017 at 09:32:20AM -0800, Christoph Hellwig wrote:
> > On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> > > Previously, I made f2fs issue discard bios asynchronously, but the problem
> > > I ran into is that this was not enough. When testing an nvme SSD with the
> > > noop IO scheduler, submit_bio() was blocked at every 8 async discard bios,
> > > resulting in a very slow checkpoint process which blocks most other FS
> > > operations.
> >
> > Where does it block? Are you running out of requests? What driver is
> > this on top of?
>
> Ping? I'm currently spending a lot of effort on fs and block discard
> code, and I'd like to make sure we get common infrastructure instead
> of local hacks.

Sorry for the late response due to travel.

When doing fstrim on a fresh f2fs image formatted on an Intel NVMe SSD
(model name SSDPE2MW012T4), I got the following trace.

...
 fstrim-12620 [000] .... 334572.907534: f2fs_issue_discard: dev = (259,1), blkstart = 0x902900, blklen = 0x400
 fstrim-12620 [000] .... 334572.907535: block_bio_remap: 259,0 D 75583488 + 8192 <- (259,1) 75581440
 fstrim-12620 [000] .... 334572.907535: block_bio_queue: 259,0 D 75583488 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.907535: block_getrq: 259,0 D 75583488 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.907536: block_unplug: [fstrim] 1
 fstrim-12620 [000] .... 334572.907536: block_rq_insert: 259,0 D 0 () 75583488 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.907536: block_rq_issue: 259,0 D 0 () 75583488 + 8192 [fstrim]

< repeat 6 times >

 fstrim-12620 [000] .... 334572.907620: f2fs_issue_discard: dev = (259,1), blkstart = 0x904500, blklen = 0x400
 fstrim-12620 [000] .... 334572.907620: block_bio_remap: 259,0 D 75640832 + 8192 <- (259,1) 75638784
 fstrim-12620 [000] .... 334572.907620: block_bio_queue: 259,0 D 75640832 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.907621: block_getrq: 259,0 D 75640832 + 8192 [fstrim]

 <idle>-0     [000] d.h. 334572.907723: block_rq_complete: 259,0 D () 67260416 + 8192 [0]
 <idle>-0     [000] d.h. 334572.907942: block_rq_complete: 259,0 D () 67268608 + 8192 [0]
 <idle>-0     [000] d.h. 334572.908155: block_rq_complete: 259,0 D () 67276800 + 8192 [0]
 <idle>-0     [000] d.h. 334572.908374: block_rq_complete: 259,0 D () 67284992 + 8192 [0]
 <idle>-0     [000] d.h. 334572.908597: block_rq_complete: 259,0 D () 67293184 + 8192 [0]
 <idle>-0     [000] d.h. 334572.908823: block_rq_complete: 259,0 D () 67301376 + 8192 [0]
 <idle>-0     [000] d.h. 334572.909033: block_rq_complete: 259,0 D () 67309568 + 8192 [0]
 <idle>-0     [000] d.h. 334572.909216: block_rq_complete: 259,0 D () 67317760 + 8192 [0]

 fstrim-12620 [000] .... 334572.909222: block_unplug: [fstrim] 1
 fstrim-12620 [000] .... 334572.909223: block_rq_insert: 259,0 D 0 () 75640832 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.909224: block_rq_issue: 259,0 D 0 () 75640832 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.909240: f2fs_issue_discard: dev = (259,1), blkstart = 0x904900, blklen = 0x400
 fstrim-12620 [000] .... 334572.909241: block_bio_remap: 259,0 D 75649024 + 8192 <- (259,1) 75646976
 fstrim-12620 [000] .... 334572.909241: block_bio_queue: 259,0 D 75649024 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.909241: block_getrq: 259,0 D 75649024 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.909242: block_unplug: [fstrim] 1
 fstrim-12620 [000] .... 334572.909242: block_rq_insert: 259,0 D 0 () 75649024 + 8192 [fstrim]
 fstrim-12620 [000] .... 334572.909242: block_rq_issue: 259,0 D 0 () 75649024 + 8192 [fstrim]

< repeat >

So, I investigated in more detail why those block_rq_complete() events show up
in the middle of the submission path. The call path at the root cause looks
like:

 - submit_bio
  - generic_make_request
   - q->make_request_fn
    - blk_mq_make_request
     - blk_mq_map_request
      - blk_mq_alloc_request
       - blk_mq_get_tag
        - __blk_mq_get_tag
         - bt_get
          - blk_mq_run_hw_queue
          - finish_wait            --> this waits for the 8 pending discard bios!

It seems the problem comes from the storage processing discard commands too
slowly compared to normal read/write IOs.

Any thoughts?

Thanks,
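
---
For what it's worth, below is a toy userspace model of the wait described
above -- a sketch under assumptions, not kernel code. The 8-tag budget and
the ~220us per-discard completion time are guesses read off the trace; the
semaphore stands in for the blk-mq tag pool and the mutex for the device
handling one discard at a time. Once all tags are held by in-flight discards,
the submitter can only proceed at the device's discard completion rate, which
is what the finish_wait above amounts to.

/*
 * Toy userspace model (NOT kernel code) of the blocking behaviour above.
 * Assumptions read off the trace: ~8 tags available to the submitter, and
 * the device completing roughly one discard every 220us.
 * Build: gcc -O2 -o tagwait tagwait.c -lpthread   (file name is illustrative)
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NR_TAGS		8	/* assumed: tags the submitter can hold */
#define DISCARD_US	220	/* assumed: per-discard completion time */
#define NR_DISCARDS	32

static sem_t tags;		/* stands in for the blk-mq tag pool */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

static long long now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* models the device finishing one discard and freeing its tag */
static void *complete_discard(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&device_lock);	/* one discard at a time */
	usleep(DISCARD_US);
	pthread_mutex_unlock(&device_lock);
	sem_post(&tags);			/* completion releases the tag */
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_DISCARDS];
	int i;

	sem_init(&tags, 0, NR_TAGS);

	for (i = 0; i < NR_DISCARDS; i++) {
		long long t0 = now_us();

		/* like the bt_get() wait: sleeps once all tags are in flight */
		sem_wait(&tags);
		printf("discard %2d: waited %lld us for a tag\n", i, now_us() - t0);

		pthread_create(&tid[i], NULL, complete_discard, NULL);
	}

	for (i = 0; i < NR_DISCARDS; i++)
		pthread_join(tid[i], NULL);
	sem_destroy(&tags);
	return 0;
}

In this model the first 8 submissions return immediately and every later one
waits roughly 220us for a tag, which is the same shape as the stall seen in
submit_bio() during checkpoint.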