On 10/19/16, 4:58 AM, "Christoph Hellwig" <hch@xxxxxx> wrote:

>On Tue, Oct 18, 2016 at 10:29:08AM +1100, Dave Chinner wrote:
>> > +	if (args.fsbno == NULLFSBLOCK && trydiscard) {
>> > +		trydiscard = false;
>> > +		flush_workqueue(xfs_discard_wq);
>> > +		goto retry;
>> > +	}
>>
>> So this is the new behaviour that triggers flushing of the discard
>> list rather than having it occur from a log force inside
>> xfs_extent_busy_update_extent().
>>
>> However, xfs_extent_busy_update_extent() also has backoff when it
>> finds an extent on the busy list being discarded, which means it
>> could spin waiting for the discard work to complete.
>>
>> Wouldn't it be better to trigger this workqueue flush in
>> xfs_extent_busy_update_extent() in both these cases, so that the
>> behaviour remains the same for userdata allocations hitting
>> uncommitted busy extents, but we can also remove the spinning for
>> allocations where the busy extent is currently being discarded?
>
>So the current xfs_extent_busy_update_extent busy wait is something we
>essentially never hit - it's only reached when an extent under discard
>is reused by an AGFL allocation, which basically does not happen.
>
>I'm not feeling very eager to touch that corner-case code, and would
>rather leave it as-is.
>
>The new flush deals with the case where we weren't able to find any
>space due to the discard list. To be honest, I can hardly trigger it
>anymore since I found the issue fixed in patch 1. It might even be
>possible to drop this retry entirely now.
>
>> This creates one long bio chain with all the regions to discard on
>> it, and then when it all completes we call xlog_discard_endio() to
>> release all the busy extents.
>>
>> Why not pull the busy extent from the list, attach it to each bio
>> returned, submit them individually, and run per-busy-extent
>> completions? That would substantially reduce the latency of discard
>> completions when there are long lists of extents to discard....
>
>Because that would defeat the merging I currently do, which is very
>effective. It would also increase the size of the busy extent
>structure, as it would grow a work_struct, and increase lock contention
>in the completion handler. All in all not that pretty, especially as
>the most common number of discards is in the single digits or low
>double digits. And this is only going to decrease further once I
>finish up my block layer patches to allow multi-range discards by
>merging multiple discard bios into a single request. With that, even
>double-digit numbers of discards are fairly rare.
>
>Now, if we eventually want to split the completions, I think we'll
>need to start merging the extent_busy structures once they are added
>to the CIL. That's quite a bit of effort and I'd like to avoid it
>for now.

Doesn't the block layer already do a reasonable job of merging adjacent
discards? This is about the only bio-level optimization that blk-mq
does, but it should be working.

Also, last I looked the md layer of the software raid stack could
re-dice these into many stripe-sized pieces anyway, and that also
needed to be fixed.

Michael
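
P.S. For anyone following along, here is a rough sketch of the
one-chain, one-endio pattern Dave describes above. It is illustrative
only, not the actual patch: struct my_ctx, struct my_range, and
release_all_busy_extents() are made-up stand-ins. __blkdev_issue_discard()
is the real block layer helper (shown with its circa-4.9 signature);
each call appends another discard bio to the chain passed back through
its last argument, so the end_io set on the final bio runs once, after
the whole chain has completed.

	#include <linux/bio.h>
	#include <linux/blkdev.h>
	#include <linux/list.h>

	struct my_range {
		struct list_head	list;
		sector_t		sector;
		sector_t		nr_sects;
	};

	struct my_ctx {
		struct list_head	ranges;	/* list of my_range */
	};

	/* Made-up stand-in for releasing the tracked busy extents. */
	static void release_all_busy_extents(struct my_ctx *ctx);

	static void my_discard_endio(struct bio *bio)
	{
		struct my_ctx *ctx = bio->bi_private;

		/* Runs once for the whole chain, not once per range. */
		release_all_busy_extents(ctx);
		bio_put(bio);
	}

	static int submit_discard_chain(struct block_device *bdev,
					struct my_ctx *ctx)
	{
		struct bio *bio = NULL;
		struct my_range *r;
		int error = 0;

		list_for_each_entry(r, &ctx->ranges, list) {
			/* Links a new discard bio onto the chain in @bio. */
			error = __blkdev_issue_discard(bdev, r->sector,
					r->nr_sects, GFP_NOFS, 0, &bio);
			if (error)
				break;	/* simplified: earlier bios may be in flight */
		}

		if (bio) {
			bio->bi_end_io = my_discard_endio;
			bio->bi_private = ctx;
			/* end_io fires after every chained bio completes. */
			submit_bio(bio);
		} else {
			release_all_busy_extents(ctx);
		}
		return error;
	}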