On Fri, 29 Apr 2016, Ming Lei wrote: > > If some block device driver in a process context finds out that it needs > > to terminate a bio, it calls bio_endio in a process context. Why would it > > need to trigger an interrupt just to call bio_endio? (and how could it > > trigger an interrupt if that driver perhaps doesn't use interrupts at > > all?) I don't know what are you trying to suggest. > > That should be the failure path, and this patch just aims at the normal bio > complete path which is run at 99.999% time. And irq handler often > borrows current process's stack and cause huge stack uses. For some block devices it is common to call bi_endio from process context, for example dm-crypt or dm-verity do it for all read bios. loop driver calls bi_endio from process context on every bio it processes. > >> It isn't easy to avoid the recursive calling in process context except you > >> can add something 'task_struct' or introduce 'alloca()' in kernel. Or do you > >> have better ideas? > > > > preempt_disable around the bi_endio callback should be sufficient. > > How can a sleepable function be run in preempt disabled context? > > Looks you still don't believe the bi_endio scheduled to process context > can sleep, do you? I don't believe bi_endio function can sleep. bi_endio function doesn't know if it will be called from process or interrupt context - so it can't sleep. If bi_endio function slept, bug would happen if it were called from interrupt context. btrfs is a special case where btrfs calls bio_endio on bios that it created on it own (note that this trick only works when the same driver creates a bio and terminates the bio with bio_endio - if some other driver terminated the bio, it wouldn't work because the other driver could terminate the bio from interrupt context). That could be simply fixed by calling bio->bi_end_io or btrfs_endio_direct_read directly or making a new function btrfs_endio. > >> Not mention wait_for_completion() in both __btrfs_correct_data_nocsum() > >> and __btrfs_subio_endio_read(), which are called by btrfs_subio_endio_read() > >> in btrfs_endio_direct_read(). > > > > btrfs is calling bio_endio on bio that it created. So, bio_endio in btrfs > > can be replaced with: > > if (bio->bi_end_io == btrfs_endio_direct_read) > > btrfs_endio_direct_read(bio); > > else > > bio_endio(bio); > > > > ... or just maybe just with this: > > bio->bi_end_io(bio); > > It is really ugly and should be avoided most of times. > > > > > This could be fixed easily. > > Suppose it might be done in this way, and there may be other .bi_end_io > which can sleep too, you need to make sure there isn't any such usage > first before running it with preempt disabled. > > Anyway looks you worry about the seldom-run failure path, which isn't > addressed by this patch. And you can optimize that in future and that isn't > contradictory with this patch. Worrying about seldom-run failrure paths is nothing wrong. These paths are not any less important than the common code paths. Mikulas > Thanks, > Ming -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html