On Fri, 2019-03-15 at 17:44 +-0800, jianchao.wang wrote: +AD4 On 3/15/19 5:20 PM, Christoph Hellwig wrote: +AD4 +AD4 On Fri, Mar 15, 2019 at 04:57:36PM +-0800, Jianchao Wang wrote: +AD4 +AD4 +AD4 Hi Jens +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 As we know, there is a risk of accesing stale requests when iterate +AD4 +AD4 +AD4 in-flight requests with tags-+AD4-rqs+AFsAXQ and this has been talked in following +AD4 +AD4 +AD4 thread, +AD4 +AD4 +AD4 +AFs-1+AF0 https://urldefense.proofpoint.com/v2/url?u+AD0-https-3A+AF8AXw-marc.info+AF8--3Fl-3Dlinux-2Dscsi-26m-3D154511693912752-26w-3D2+ACY-d+AD0-DwICAg+ACY-c+AD0-RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI+AF8-JnE+ACY-r+AD0-7WdAxUBeiTUTCy8v-7zX +AD4 +AD4 +AD4 yr4qk7sx26ATvfo6QSTvZyQ+ACY-m+AD0-CydqJPTf4FUrfs7ipUc2chm2jGuNuDVn+AF8-onIetKEehM+ACY-s+AD0-ZQ7RfO6-737-t5kQv7SFlXMhIdpwn+AF8-AxJI93d6c-nj0+ACY-e+AD0 +AD4 +AD4 +AD4 +AFs-2+AF0 https://urldefense.proofpoint.com/v2/url?u+AD0-https-3A+AF8AXw-marc.info+AF8--3Fl-3Dlinux-2Dblock-26m-3D154526189023236-26w-3D2+ACY-d+AD0-DwICAg+ACY-c+AD0-RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI+AF8-JnE+ACY-r+AD0-7WdAxUBeiTUTCy8v-7z +AD4 +AD4 +AD4 Xyr4qk7sx26ATvfo6QSTvZyQ+ACY-m+AD0-CydqJPTf4FUrfs7ipUc2chm2jGuNuDVn+AF8-onIetKEehM+ACY-s+AD0-EBV1M5p4mE8jZ5ZD1ecU5kMbJ9EtbpVJoc7Tqolrsc8+ACY-e+AD0 +AD4 +AD4 +AD4 +AD4 I'd rather take one step back and figure out why we are iterating +AD4 +AD4 the busy requests. There really shouldn't be any reason why a driver +AD4 +AD4 is even doings that (vs some error handling helpers in the core +AD4 +AD4 block code that can properly synchronize). +AD4 +AD4 +AD4 +AD4 A typical scene is blk+AF8-mq+AF8-in+AF8-flight, +AD4 +AD4 blk+AF8-mq+AF8-get+AF8-request blk+AF8-mq+AF8-in+AF8-flight +AD4 -+AD4 blk+AF8-mq+AF8-get+AF8-tag -+AD4 blk+AF8-mq+AF8-queue+AF8-tag+AF8-busy+AF8-iter +AD4 -+AD4 bt+AF8-for+AF8-each +AD4 -+AD4 bt+AF8-iter +AD4 -+AD4 rq +AD0 taags-+AD4-rqs+AFsAXQ +AD4 -+AD4 rq-+AD4-q //---+AD4 get a stale request +AD4 -+AD4 blk+AF8-mq+AF8-rq+AF8-ctx+AF8-init +AD4 -+AD4 data-+AD4-hctx-+AD4-tags-+AD4-rqs+AFs-rq-+AD4-tag+AF0 +AD0 rq +AD4 +AD4 This stale request maybe something that has been freed due to io scheduler +AD4 is detached or a q using a shared tagset is gone. +AD4 +AD4 And also the blk+AF8-mq+AF8-timeout+AF8-work could use it to pick up the expired request. +AD4 The driver would also use it to requeue the in-flight requests when the device is dead. +AD4 +AD4 Compared with adding more synchronization, using static+AF8-rqs+AFsAXQ directly maybe simpler :) Hi Jianchao, Although I appreciate your work: I agree with Christoph that we should avoid races like this rather than modifying the block layer to make sure that such races are handled safely. Thanks, Bart.