Thanks. I will try your patch.

2016-03-25 3:39 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
> I took a look, and it's a regression introduced by this change in 2012:
>
> commit c04e4661e4da3b6079f415897e4507cf8e610c54
> Author: Daniel Ehrenberg <dehrenberg@xxxxxxxxxx>
> Date:   Fri Mar 16 18:54:15 2012 +0100
>
>     time_based: Avoid restarting main I/O loop
>
> that patch tries to keep us in the main loop and reset while going, which is
> a good thing for short jobs as it keeps the overhead low. But it breaks
> verification for short jobs! I've checked in this fix:
>
> http://git.kernel.dk/cgit/fio/commit/?id=f1a32461c844c7ba9314f66dd28b5a01ca7cb69a
>
> Please try and see if that fixes things for you. You ran into OOM because
> you had async verify enabled, yet we never go to run it. So we just kept
> piling on buffers to verify, but we never did...
>
>
>
> On 03/24/2016 08:06 AM, Jens Axboe wrote:
>>
>> I'll take a look at it. The device is only 128MB? Did you mean GB?
>>
>> What version of fio are you running?
>>
>> On 03/24/2016 06:50 AM, flash yan wrote:
>>>
>>> Another thing is that older version fio don't have this issue.
>>>
>>> 2016-03-24 7:32 GMT+08:00 Jeff Furlong <jeff.furlong@xxxxxxxx>:
>>>>
>>>> I believe only the CRC is buffered in DRAM. So if your IO's size
>>>> (bs=X) is large or small, the buffered CRC is the same size per IO.
>>>> But, as you increase the bs, the IOPs decreases. As you decrease the
>>>> bs, the IOPs increases. The total amount of buffered CRC's in DRAM
>>>> increases with more IOPs (with a fixed runtime). You can calculate
>>>> out how many IO's times your CRC size will fit into DRAM, then set
>>>> your verify_backlog value to be less than that.
>>>>
>>>> Regards,
>>>> Jeff
>>>>
>>>> -----Original Message-----
>>>> From: flash yan [mailto:flashyan83@xxxxxxxxx]
>>>> Sent: Wednesday, March 23, 2016 3:52 PM
>>>> To: Jeff Furlong <jeff.furlong@xxxxxxxx>
>>>> Cc: Jens Axboe <axboe@xxxxxxxxx>; fio@xxxxxxxxxxxxxxx
>>>> Subject: Re: Linux OS killed fio process because fio invoked oom_killer
>>>>
>>>> I will try verify_backlog option.
>>>> I have a question. Why it happened with io_size to 4096 not other
>>>> io_size? Other io_size should have same problem.
>>>>
>>>> 2016-03-24 3:29 GMT+08:00 Jeff Furlong <jeff.furlong@xxxxxxxx>:
>>>>>
>>>>> I believe you are seeing expected behavior. When verify is enabled,
>>>>> the written data is buffered in DRAM until the job is finished, then
>>>>> compared by reading data from the device. If the device capacity is
>>>>> large, or if the device capacity is small but you set the runtime,
>>>>> you will buffer many IO's. So the oom_killer sees the process as
>>>>> hogging most of the DRAM, then kills it. When verify is disabled,
>>>>> no buffering takes place, so no oom_killer.
>>>>>
>>>>> Try the verify_backlog option. If you have a 4KB bs, and you set
>>>>> verify_backlog=1048576, then you'll write out 4GB of data, then read
>>>>> it back and compare with the DRAM buffer, then start again. Just be
>>>>> sure the verify_backlog value is less than your free DRAM.
>>>>>
>>>>> Regards,
>>>>> Jeff
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: fio-owner@xxxxxxxxxxxxxxx [mailto:fio-owner@xxxxxxxxxxxxxxx] On
>>>>> Behalf Of flash yan
>>>>> Sent: Wednesday, March 23, 2016 8:10 AM
>>>>> To: Jens Axboe <axboe@xxxxxxxxx>
>>>>> Cc: fio@xxxxxxxxxxxxxxx
>>>>> Subject: Re: Linux OS killed fio process because fio invoked
>>>>> oom_killer
>>>>>
>>>>> I have run fio without verify and this issue didn't happen. So it
>>>>> should be verify issue.
>>>>> The fio job file is as below:
>>>>>
>>>>> [global]
>>>>> thread=1
>>>>> invalidate=1
>>>>> rw=randwrite
>>>>> time_based=1
>>>>> runtime=3000
>>>>> rwmixread=50
>>>>> ioengine=libaio
>>>>> direct=1
>>>>> bs=4096
>>>>> iodepth=16
>>>>> verify_dump=1
>>>>> verify_async=10
>>>>> do_verify=1
>>>>> verify=meta
>>>>> verify_pattern="meta"
>>>>> [job0]
>>>>> filename=/dev/sda
>>>>> [job1]
>>>>> filename=/dev/sdb
>>>>>
>>>>> I think you can use ram disk(ubuntu have ram disk /dev/ram*) to
>>>>> reproduce this issue.
>>>>> It happened with devices which have high speed.
>>>>>
>>>>> 2016-03-23 8:42 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
>>>>>>
>>>>>> What job did you run? When reporting a potential issue, always
>>>>>> include that. Hard to help or advise otherwise.
>>>>>>
>>>>>>> On Mar 22, 2016, at 5:12 PM, flash yan <flashyan83@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> This issue happened after about 20 minutes. The iscsi device is very
>>>>>>> small, only 128MB.
>>>>>>> As you said, I have enabled verify= options.
>>>>>>> I will try big iscsi device and no verify.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Liang Yan
>>>>>>>
>>>>>>> 2016-03-23 0:30 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
>>>>>>>>>
>>>>>>>>> On 03/22/2016 08:06 AM, flash yan wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have run fio-2.7 to test iscsi device, one unusual issue
>>>>>>>>> happened.
>>>>>>>>> If I set the io_size to 4096, queue_depth to 16 ,rw to randwrite
>>>>>>>>> and run_time to 3000, the fio would invoke oom_killer and the
>>>>>>>>> Linux OS would kill the fio process.
>>>>>>>>> The machine have about 11 GB memory and I have tried the machine
>>>>>>>>> with 23GB, the issue also happened.
>>>>>>>>> I think fio have problem when dealing with 4KB io_size then used
>>>>>>>>> too many memory.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When did this happen - shortly after the job is started, or long
>>>>>>>> after? How big is the iscsi device? Did you have verify= options
>>>>>>>> enabled?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jens Axboe
>>>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe fio" in the
>>>>> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
>>>>> http://vger.kernel.org/majordomo-info.html
>>>>> Western Digital Corporation (and its subsidiaries) E-mail
>>>>> Confidentiality Notice & Disclaimer:
>>>>>
>>>>> This e-mail and any files transmitted with it may contain
>>>>> confidential or legally privileged information of WDC and/or its
>>>>> affiliates, and are intended solely for the use of the individual or
>>>>> entity to which they are addressed. If you are not the intended
>>>>> recipient, any disclosure, copying, distribution or any action taken
>>>>> or omitted to be taken in reliance on it, is prohibited. If you have
>>>>> received this e-mail in error, please notify the sender immediately
>>>>> and delete the e-mail in its entirety from your system.
>>
>>
>>
>
>
> --
> Jens Axboe
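
For anyone sizing verify_backlog as Jeff suggests in the quoted messages, the arithmetic can be sketched roughly as below. This is only an illustration of his two calculations, not anything taken from fio itself: the function names are hypothetical, and the per-IO tracking size is a placeholder you would have to measure or estimate, since the thread does not state it. Note also that Jens traced the OOM in this report to a time_based regression, so sizing the backlog is a mitigation rather than the fix.

```python
# Rough sizing helpers for fio's verify_backlog option.
# All names here are hypothetical; nothing below is part of fio.

def data_written_per_cycle(bs_bytes: int, verify_backlog: int) -> int:
    """Bytes written before fio pauses to verify the backlog.

    verify_backlog counts blocks, so the data volume is bs * backlog.
    """
    return bs_bytes * verify_backlog


def max_backlog_for_budget(mem_budget_bytes: int, per_io_bytes: int) -> int:
    """Largest backlog whose per-IO tracking data fits in the budget.

    per_io_bytes is an ASSUMED size for whatever fio keeps in memory
    per outstanding write; the thread does not give the real figure.
    """
    if per_io_bytes <= 0:
        raise ValueError("per_io_bytes must be positive")
    return mem_budget_bytes // per_io_bytes


if __name__ == "__main__":
    # Jeff's example: bs=4KB and verify_backlog=1048576.
    written = data_written_per_cycle(4096, 1048576)
    print(f"data written per verify cycle: {written / 1024**3:.0f} GiB")

    # Placeholder numbers: 1 GiB budget, 64 bytes tracked per IO.
    print("max verify_backlog:", max_backlog_for_budget(1024**3, 64))
```

With Jeff's numbers the first helper gives 4 GiB written per verify cycle, which matches his "4GB of data" figure. The second helper is only as good as the per-IO size you feed it, so leave generous headroom below the result.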