On 2019/5/15 3:32 PM, Thorsten Knabe wrote:
> On 5/14/19 2:34 PM, Coly Li wrote:
>> On 2019/5/14 5:19 PM, Coly Li wrote:
>>> On 2019/5/14 4:55 PM, Thorsten Knabe wrote:
>>>> On 5/13/19 5:36 PM, Coly Li wrote:
>>>>> On 2019/5/9 3:43 AM, Coly Li wrote:
>>>>>> On 2019/5/8 11:58 PM, Thorsten Knabe wrote:
>>>>> [snipped]
>>>>>
>>>>>>> Hi Coly.
>>>>>>>
>>>>>>>> I cannot do this. Because this is real I/O issued to the backing
>>>>>>>> device, if it failed, it means something is really wrong on the
>>>>>>>> backing device.
>>>>>>>
>>>>>>> I have not found a definitive answer or documentation on what the
>>>>>>> REQ_RAHEAD flag is actually used for. However, in my understanding,
>>>>>>> after reading a lot of kernel source, it is used as an indication
>>>>>>> that the bio read request is unimportant for proper operation and
>>>>>>> may be failed by the block device driver returning BLK_STS_IOERR
>>>>>>> if it is too expensive or requires too many additional resources.
>>>>>>>
>>>>>>> At least the BTRFS and DRBD code do not take bio I/O errors that
>>>>>>> are marked with the REQ_RAHEAD flag into account in their error
>>>>>>> counters. Thus it is probably okay if such I/O errors with the
>>>>>>> REQ_RAHEAD flag set are not counted as errors by bcache either.
>>>>>>>
>>>>>>>> Hmm, if raid6 returned a different error code in bio->bi_status,
>>>>>>>> then we could identify this as a failure caused by RAID
>>>>>>>> degradation, not a hardware read or link failure. But I am not
>>>>>>>> familiar with the raid456 code, so I have no idea how to change
>>>>>>>> the md raid code (I assume you meant md raid6)...
>>>>>>>
>>>>>>> If my assumptions above regarding the REQ_RAHEAD flag are correct,
>>>>>>> then the RAID code is correct, because restoring data from the
>>>>>>> parity information is a relatively expensive operation for
>>>>>>> read-ahead data that is possibly never actually needed.
>>>>>>
>>>>>> Hi Thorsten,
>>>>>>
>>>>>> Thank you for the informative hint. I agree with your idea; it seems
>>>>>> ignoring I/O errors of REQ_RAHEAD bios does not hurt. Let me think
>>>>>> about how to fix it following your suggestion.
>>>>>
>>>>> Hi Thorsten,
>>>>>
>>>>> Could you please test the attached patch?
>>>>> Thanks in advance.
>>>>
>>>> Hi Coly.
>>>>
>>>> I applied your patch to 3 systems running Linux 5.1.1 yesterday
>>>> evening; on one of them I removed a disk from the RAID6 array.
>>>>
>>>> The patch works as expected. The system with the removed disk has
>>>> logged more than 1300 of the messages added by your patch. Most of
>>>> them were logged shortly after boot-up, with a few shorter bursts
>>>> evenly spread over the runtime of the system.
>>>>
>>>> It would probably be a good idea to apply some sort of rate limit to
>>>> the log message. I could imagine that a different file system or I/O
>>>> pattern could cause a lot more of these messages.
>>>
>>> Hi Thorsten,
>>>
>>> Nice suggestion, I will add ratelimiting to the pr_XXX routines in
>>> another patch and post it later for your testing.
>>
>> Could you please test the attached v2 patch? Thanks in advance.
>
> Hi Coly.
>
> The patch works as expected, but with far fewer log messages (~280)
> than before.

May I add a Tested-by: tag with your name and email address?

-- 
Coly Li
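
[Editor's note: for readers following the discussion above, here is a minimal
sketch of the approach being discussed, i.e. reporting but not counting I/O
errors on REQ_RAHEAD bios, with the log message rate-limited as Thorsten
suggested. This is not the actual patch posted in the thread. REQ_RAHEAD,
BLK_STS_OK, bio->bi_opf, bio->bi_status and pr_warn_ratelimited() are real
kernel symbols; the function count_backing_io_error() and its arguments are
hypothetical placeholders.]

	/*
	 * Sketch only: error accounting that skips read-ahead failures.
	 * A failed REQ_RAHEAD bio usually means the lower layer declined
	 * the optional work (e.g. a degraded RAID6 avoiding a parity
	 * rebuild), not that the backing device is broken.
	 */
	#include <linux/atomic.h>
	#include <linux/bio.h>
	#include <linux/printk.h>

	static void count_backing_io_error(struct bio *bio, atomic_t *io_errors)
	{
		if (bio->bi_status == BLK_STS_OK)
			return;			/* nothing to account for */

		if (bio->bi_opf & REQ_RAHEAD) {
			/* Read-ahead failure: warn (rate-limited) but do not count. */
			pr_warn_ratelimited("ignore read-ahead I/O failure on backing device\n");
			return;
		}

		/* A real read/write failure: count it toward the error threshold. */
		atomic_inc(io_errors);
	}

[The rate-limited printk addresses the ~1300 messages Thorsten observed after
removing a disk; only genuine failures still advance the error counter.]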