On Thu, Feb 20, 2020 at 08:03:52PM -0800, Bart Van Assche wrote:
> On 2020-01-08 22:21, Ming Lei wrote:
> > delete_partition() clears the cached last_lookup partition. However
> > the .last_lookup cache may be overwritten by one IO path after
> > it is cleared from delete_partition(). Then another IO path may
> > use the cached deleting partition after __delete_partition() is
> > called, then use-after-free is triggered on the cached partition.
> >
> > Fixes the issue by the following approach:
> >
> > 1) always get the partition's refcount via hd_struct_try_get() before
> > setting .last_lookup
> >
> > 2) move clearing .last_lookup from delete_partition() to
> > __delete_partition() which is release handle of the partition's
> > percpu-refcount, so that no IO path can overwrite .last_lookup after it
> > is cleared in __delete_partition().
> >
> > It is one candidate approach of Yufen's patch[1] which adds overhead
> > in fast path by indirect lookup which may introduce one extra cacheline
> > in IO path. Also this patch relies on percpu-refcount's protection, and
> > it is easier to understand and verify.
> >
> > [1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t
>
> Hi Ming,
>
> disk_map_sector_rcu() is called from the I/O path only and hence with
> q->q_usage_counter > 0. Has it been considered to freeze disk->queue
> from delete_partition() before deleting a partition and unfreezing
> disk->queue after partition deletion has finished? Would that approach
> allow to eliminate partition reference counting and thereby improve the
> performance of the hot path?

Hi Bart,

I did consider that approach, but it may cause a performance regression:
deleting any partition would then drop IO performance a lot on other,
unrelated partitions.

Thanks,
Ming
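
For readers following the thread, the two steps of the quoted patch
description (take a reference before setting .last_lookup, and clear
.last_lookup from the release handler) can be sketched in userspace C.
This is only an illustration under stated assumptions, not the kernel
code: struct part, struct part_table, part_try_get(), part_put(),
part_release() and cache_partition() are hypothetical names, a plain C11
atomic counter stands in for percpu-refcount, and RCU and the actual
freeing of the partition are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct part_table;

/* Hypothetical stand-in for a partition; not the kernel's hd_struct. */
struct part {
    atomic_int ref;             /* 0 means the last reference is gone */
    struct part_table *tbl;     /* table whose cache may point at us */
    bool released;              /* set by the release handler */
};

/* Hypothetical partition table with a one-entry lookup cache. */
struct part_table {
    _Atomic(struct part *) last_lookup;
};

/* Take a reference unless the count has already dropped to zero. */
static bool part_try_get(struct part *p)
{
    int old = atomic_load(&p->ref);

    while (old > 0) {
        /* On failure, old is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&p->ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Release handler: runs once, when the last reference is dropped. */
static void part_release(struct part *p)
{
    struct part *expected = p;

    /* Clear the cache here, and only if it still points at p. */
    atomic_compare_exchange_strong(&p->tbl->last_lookup, &expected,
                                   (struct part *)NULL);
    p->released = true;
}

/* Drop a reference; the final put invokes the release handler. */
static void part_put(struct part *p)
{
    int old = atomic_fetch_sub(&p->ref, 1);

    assert(old > 0);
    if (old == 1)
        part_release(p);
}

/* Lookup path: cache the partition only while holding a reference. */
static bool cache_partition(struct part_table *t, struct part *p)
{
    if (!part_try_get(p))
        return false;           /* partition is being deleted: do not cache */
    atomic_store(&t->last_lookup, p);
    part_put(p);
    return true;
}
```

In this model the cache can only be written while a reference is held,
so the release handler always runs after any such write and leaves the
cache empty; once the count has reached zero, part_try_get() fails and
the cache is never repopulated with the dying partition.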