On 09/21/2012 02:56 PM, Dmitry Monakhov wrote:
> On Fri, 21 Sep 2012 14:20:12 +0200, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 09/21/2012 02:13 PM, Dmitry Monakhov wrote:
>>> On Fri, 21 Sep 2012 14:00:18 +0200, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> On 09/21/2012 01:42 PM, Dmitry Monakhov wrote:
>>>>> On Fri, 21 Sep 2012 13:25:37 +0200, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>> On 09/21/2012 01:04 PM, Dmitry Monakhov wrote:
>>>>>>> As far as I understand, this is just a typo.
>>>>>>
>>>>>> It's not a typo. By that logic, EILSEQ is fatal too, since it is a
>>>>>> verification failure of read data (so might as well have been an EIO).
>>>>>> Fatal, in this context, means errors that fio can recover from and
>>>>>> continue doing work.
>>>>> Ohh, I meant to say that both errors are fatal, but the function is called
>>>>
>>>> And I'm saying that NEITHER of them are fatal.
>>>>
>>>>> td_NON_fatal_error, and it returns true in case of EIO or EILSEQ.
>>>>> This leaves the continue_on_error logic broken because
>>>>> io_u.c 1440:
>>>>>         if (icd->error && td_non_fatal_error(icd->error) &&
>>>>>            (td->o.continue_on_error & td_error_type(io_u->ddir,
>>>>>             icd->error))) {
>>>>
>>>> Right, so if error and error is non-fatal, we continue on that error
>>>> unless told otherwise. It is logged and we continue on our business.
>>> Please don't get me wrong .... but please take a more careful look
>>>
>>> Original code: ((e) == EIO || (e) == EILSEQ)
>>> True for fatal errors, and false for non-fatal ones
>>> But the function is called td_NON_fatal_error()
>>> And it should return the opposite result
>>>
>>> so my code: (!((e) == EIO || (e) == EILSEQ)) is equivalent to
>>> (err != EIO) && (err != EILSEQ)
>>
>> You keep not reading my point. EIO and EILSEQ are not fatal errors!!
>> These are "expected" in the sense that we know what conditions trigger
>> them.
> Ok, I finally get the point. But I disagree with the terms
> because most filesystems and applications interpret EIO as a fatal
> error.
> Once a device returns EIO to the filesystem, it will fall back to RO mode
> or just panic. I heard about some RAID-oriented HDDs which tend to return
> EIO ASAP so the RAID controller may remap the bio to another drive, but this
> is a very special case and such devices work only with a RAID controller.
> From my point of view, non-fatal errors are: ENOSPC, EBUSY, EAGAIN, ENOMEM

Depends on your point of view. If it's a write workload, ENOSPC probably
means "we are done, don't bother writing again". "Fatal" here is just
whether fio can continue safely or not. Running a job past various EIO
or verify failures is a very valid use case, instead of just terminating
on the first EIO seen.

> Nevertheless, it would be reasonable to make the fatal error list
> configurable. I'll prepare a patch shortly.

That'd be fine indeed.

-- 
Jens Axboe
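
For readers of the archive, the check being argued over can be sketched
roughly as below. This is only an illustration under stated assumptions, not
fio's actual code: the names non_fatal_error(), error_in_list(),
error_class(), should_continue() and the ERR_CLASS_* bits are hypothetical
stand-ins for td_non_fatal_error(), td_error_type() and the
continue_on_error mask quoted above, and the mapping of EILSEQ to a "verify"
class is assumed from the description of EILSEQ as a verification failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error-class bits; a continue-on-error mask selects
 * which classes of error a job is allowed to run past. */
enum err_class {
    ERR_CLASS_NONE   = 0,
    ERR_CLASS_READ   = 1 << 0,
    ERR_CLASS_WRITE  = 1 << 1,
    ERR_CLASS_VERIFY = 1 << 2
};

/* Hypothetical data direction of the failed I/O unit. */
enum io_dir { IO_READ, IO_WRITE };

/* The test as quoted in the thread: EIO and EILSEQ are the errors
 * the job is able to continue past, hence "non-fatal". */
bool non_fatal_error(int err)
{
    return err == EIO || err == EILSEQ;
}

/* Sketch of the configurable variant proposed at the end of the
 * thread: the caller supplies the list of errors to continue past. */
bool error_in_list(int err, const int *list, size_t n)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i] == err)
            return true;
    }
    return false;
}

/* Assumed classification: EILSEQ is a verify failure, anything else
 * is attributed to the direction of the I/O that failed. */
enum err_class error_class(enum io_dir dir, int err)
{
    if (err == EILSEQ)
        return ERR_CLASS_VERIFY;
    return dir == IO_READ ? ERR_CLASS_READ : ERR_CLASS_WRITE;
}

/* Mirrors the shape of the quoted condition: there is an error, it is
 * non-fatal, and its class is enabled in the continue mask. */
bool should_continue(int err, enum io_dir dir, unsigned continue_mask)
{
    return err != 0 &&
           non_fatal_error(err) &&
           (continue_mask & (unsigned)error_class(dir, err)) != 0;
}
```

With these definitions, a read that fails with EIO continues only when the
read class is set in the mask, while ENOSPC never passes the built-in check
unless a caller switches to error_in_list() with a list that includes it.
Inverting non_fatal_error(), as in the original patch, would instead make
the job stop on exactly the errors the mask was meant to tolerate.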