On 5/14/19 2:34 PM, Coly Li wrote:
> On 2019/5/14 5:19 PM, Coly Li wrote:
>> On 2019/5/14 4:55 PM, Thorsten Knabe wrote:
>>> On 5/13/19 5:36 PM, Coly Li wrote:
>>>> On 2019/5/9 3:43 AM, Coly Li wrote:
>>>>> On 2019/5/8 11:58 PM, Thorsten Knabe wrote:
>>>> [snipped]
>>>>
>>>>>> Hi Coly.
>>>>>>
>>>>>>> I cannot do this, because this is real I/O issued to the backing
>>>>>>> device; if it fails, it means something is really wrong on the
>>>>>>> backing device.
>>>>>>
>>>>>> I have not found a definitive answer or documentation on what the
>>>>>> REQ_RAHEAD flag is actually used for. However, in my understanding,
>>>>>> after reading a lot of kernel source, it is used as an indication
>>>>>> that the bio read request is unimportant for proper operation and
>>>>>> may be failed by the block device driver with BLK_STS_IOERR if it
>>>>>> is too expensive or requires too many additional resources.
>>>>>>
>>>>>> At least the BTRFS and DRBD code do not take I/O errors of bio
>>>>>> requests marked with the REQ_RAHEAD flag into account in their
>>>>>> error counters. Thus it is probably okay if such I/O errors with
>>>>>> the REQ_RAHEAD flag set are not counted as errors by bcache either.
>>>>>>
>>>>>>>
>>>>>>> Hmm, if raid6 could return a different error code in
>>>>>>> bio->bi_status, then we could identify this as a failure caused
>>>>>>> by RAID degradation, not a hardware or link read failure. But
>>>>>>> right now I am not familiar with the raid456 code, so I have no
>>>>>>> idea how to change the md RAID code (I assume you meant md
>>>>>>> raid6)...
>>>>>>
>>>>>> If my assumptions above regarding the REQ_RAHEAD flag are correct,
>>>>>> then the RAID code is correct, because restoring data from the
>>>>>> parity information is a relatively expensive operation for
>>>>>> read-ahead data that is possibly never actually needed.
>>>>>
>>>>>
>>>>> Hi Thorsten,
>>>>>
>>>>> Thank you for the informative hint. I agree with your idea; it
>>>>> seems that ignoring I/O errors of REQ_RAHEAD bios does not hurt.
>>>>> Let me think about how to fix it along the lines of your
>>>>> suggestion.
>>>>>
>>>>
>>>> Hi Thorsten,
>>>>
>>>> Could you please test the attached patch?
>>>> Thanks in advance.
>>>>
>>>
>>> Hi Coly.
>>>
>>> I applied your patch to 3 systems running Linux 5.1.1 yesterday
>>> evening; on one of them I removed a disk from the RAID6 array.
>>>
>>> The patch works as expected. The system with the removed disk has
>>> logged more than 1300 of the messages added by your patch. Most of
>>> them were logged shortly after boot-up, with a few shorter bursts
>>> evenly spread over the runtime of the system.
>>>
>>> It would probably be a good idea to apply some sort of rate limit to
>>> the log message. I could imagine that a different file system or I/O
>>> pattern could produce a lot more of these messages.
>>>
>>
>> Hi Thorsten,
>>
>> Nice suggestion. I will add ratelimiting to the pr_XXX routines in
>> another patch and will post it later for your testing.
>>
>
> Could you please test the attached v2 patch? Thanks in advance.

Hi Coly.

The patch works as expected, but with far fewer log messages (~280)
than before.

Thank you
Thorsten

-- 
___ |  | /        E-Mail: linux@xxxxxxxxxxxxxxxxx
 |horsten |/\nabe  WWW:    http://linux.thorsten-knabe.de
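
For context on the mechanism discussed in the thread, below is a minimal
sketch, in ordinary kernel C, of the kind of completion-handler check
being described: a failed bio carrying REQ_RAHEAD is logged but not
counted as a backing device error, while the failure itself is still
propagated to the caller. This is illustrative only, not Coly's actual
patch; example_backing_endio(), example_io_errors and the use of
bio->bi_private to carry the original bio are made-up details of the
sketch.

	#include <linux/atomic.h>
	#include <linux/bio.h>
	#include <linux/blk_types.h>
	#include <linux/printk.h>

	static atomic_t example_io_errors = ATOMIC_INIT(0); /* stand-in error counter */

	static void example_backing_endio(struct bio *bio)
	{
		struct bio *orig = bio->bi_private; /* original bio, saved at submit time */

		if (bio->bi_status) {
			if (bio->bi_opf & REQ_RAHEAD)
				/*
				 * Optional read-ahead: a lower layer (e.g. a
				 * degraded RAID6) may refuse it with
				 * BLK_STS_IOERR rather than pay for an
				 * expensive parity rebuild, so do not count
				 * the failure against the backing device.
				 */
				pr_warn("read-ahead I/O failed on backing device, ignored\n");
			else
				atomic_inc(&example_io_errors);
			orig->bi_status = bio->bi_status; /* still report the failure */
		}
		bio_put(bio);    /* release the cloned bio */
		bio_endio(orig); /* complete the original request */
	}

The point of the check is that a REQ_RAHEAD failure carries no
information about the health of the backing device when a lower layer
refuses the request deliberately, so it should not push the device
toward its error threshold.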
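
On the ratelimiting that Coly mentions: the kernel already provides
standard helpers for this, and the sketch below shows the two usual
forms. It assumes nothing about the actual v2 patch;
example_log_failure() and example_rs are made-up names.

	#include <linux/printk.h>
	#include <linux/ratelimit.h>

	static void example_log_failure(void)
	{
		static DEFINE_RATELIMIT_STATE(example_rs, 5 * HZ, 3);

		/*
		 * Simplest form: a per-call-site limit with the kernel's
		 * defaults (at most 10 messages every 5 seconds).
		 */
		pr_warn_ratelimited("read-ahead I/O failed on backing device, ignored\n");

		/*
		 * Explicit form: a dedicated ratelimit state, here allowing
		 * at most 3 messages every 5 seconds.
		 */
		if (__ratelimit(&example_rs))
			pr_warn("read-ahead I/O failed on backing device, ignored\n");
	}

Either form collapses bursts, such as the flood shortly after boot-up
that Thorsten observed, while still logging isolated failures.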