Re: Linux OS killed fio process because fio invoked oom_killer

Another thing: older versions of fio don't have this issue.

2016-03-24 7:32 GMT+08:00 Jeff Furlong <jeff.furlong@xxxxxxxx>:
> I believe only the CRC is buffered in DRAM.  So whether your I/O size (bs=X) is large or small, the buffered CRC is the same size per I/O.  But as you increase the bs, the IOPS decreases, and as you decrease the bs, the IOPS increases.  The total amount of buffered CRCs in DRAM grows with more IOPS (for a fixed runtime).  You can calculate how many I/Os times your CRC size will fit into DRAM, then set your verify_backlog value to be less than that.
>
> Regards,
> Jeff
>
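The sizing rule in the message above can be sketched as a quick calculation. This is only a rough estimate under stated assumptions: the per-I/O tracking cost (`per_io_bytes`), the memory budget, and the IOPS figure are illustrative values, not numbers taken from fio or from this thread, and both function names are hypothetical. Only the 3000-second runtime comes from the job file in the thread.

```python
def max_verify_backlog(mem_budget_bytes: int, per_io_bytes: int) -> int:
    """Largest number of written-but-unverified I/Os whose per-I/O
    verify state fits in the given memory budget."""
    if per_io_bytes <= 0:
        raise ValueError("per_io_bytes must be positive")
    return mem_budget_bytes // per_io_bytes


def tracked_bytes(iops: int, runtime_s: int, per_io_bytes: int) -> int:
    """Memory used if every I/O issued during the run stays tracked
    until the end (i.e. no verify_backlog limit is set)."""
    return iops * runtime_s * per_io_bytes


# Assumed values: 1 GiB budget for verify state, 64 bytes tracked per I/O.
print(max_verify_backlog(1 * 1024**3, 64))   # 16777216

# Assumed 100,000 IOPS sustained for the 3000 s runtime from the job file.
print(tracked_bytes(100_000, 3000, 64))      # 19200000000
```

With these assumed numbers the unbounded case needs about 19.2 GB, which would exceed both the 11 GB and the 23 GB machines mentioned in the thread; a faster device reaches the limit sooner.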
> -----Original Message-----
> From: flash yan [mailto:flashyan83@xxxxxxxxx]
> Sent: Wednesday, March 23, 2016 3:52 PM
> To: Jeff Furlong <jeff.furlong@xxxxxxxx>
> Cc: Jens Axboe <axboe@xxxxxxxxx>; fio@xxxxxxxxxxxxxxx
> Subject: Re: Linux OS killed fio process because fio invoked oom_killer
>
> I will try the verify_backlog option.
> I have a question: why did it happen with io_size set to 4096 and not with other io_size values? Other io_size values should have the same problem.
>
> 2016-03-24 3:29 GMT+08:00 Jeff Furlong <jeff.furlong@xxxxxxxx>:
>> I believe you are seeing expected behavior.  When verify is enabled, the written data is buffered in DRAM until the job is finished, then compared by reading data from the device.  If the device capacity is large, or if the device capacity is small but you set the runtime, you will buffer many I/Os.  The oom_killer then sees the process as hogging most of the DRAM and kills it.  When verify is disabled, no buffering takes place, so the oom_killer is not invoked.
>>
>> Try the verify_backlog option.  If you have a 4KB bs and you set verify_backlog=1048576, then you'll write out 4GB of data, read it back and compare it with the DRAM buffer, then start again.  Just be sure that what gets buffered for verify_backlog I/Os fits in your free DRAM.
>>
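The 4GB figure above follows directly from the block size times the backlog count. A minimal check, with a hypothetical helper name:

```python
def bytes_per_verify_cycle(bs: int, verify_backlog: int) -> int:
    """Bytes written before fio pauses to read back and verify,
    given a block size and a verify_backlog count of blocks."""
    return bs * verify_backlog


total = bytes_per_verify_cycle(4096, 1048576)
print(total, total / 1024**3)  # 4294967296 4.0
```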
>> Regards,
>> Jeff
>>
>>
>> -----Original Message-----
>> From: fio-owner@xxxxxxxxxxxxxxx [mailto:fio-owner@xxxxxxxxxxxxxxx] On
>> Behalf Of flash yan
>> Sent: Wednesday, March 23, 2016 8:10 AM
>> To: Jens Axboe <axboe@xxxxxxxxx>
>> Cc: fio@xxxxxxxxxxxxxxx
>> Subject: Re: Linux OS killed fio process because fio invoked
>> oom_killer
>>
>> I ran fio without verify and this issue didn't happen, so it should be a verify issue.
>> The fio job file is as below:
>>
>> [global]
>> thread=1
>> invalidate=1
>> rw=randwrite
>> time_based=1
>> runtime=3000
>> rwmixread=50
>> ioengine=libaio
>> direct=1
>> bs=4096
>> iodepth=16
>> verify_dump=1
>> verify_async=10
>> do_verify=1
>> verify=meta
>> verify_pattern="meta"
>> [job0]
>> filename=/dev/sda
>> [job1]
>> filename=/dev/sdb
>>
>> I think you can use a RAM disk (Ubuntu has RAM disks at /dev/ram*) to reproduce this issue.
>> It happens with high-speed devices.
>>
>> 2016-03-23 8:42 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
>>> What job did you run? When reporting a potential issue, always include that. Hard to help or advise otherwise.
>>>
>>>> On Mar 22, 2016, at 5:12 PM, flash yan <flashyan83@xxxxxxxxx> wrote:
>>>>
>>>> This issue happened after about 20 minutes. The iSCSI device is very
>>>> small, only 128MB.
>>>> As you asked: yes, I have verify= options enabled.
>>>> I will try a big iSCSI device and no verify.
>>>>
>>>> Thanks
>>>>
>>>> Liang Yan
>>>>
>>>> 2016-03-23 0:30 GMT+08:00 Jens Axboe <axboe@xxxxxxxxx>:
>>>>>> On 03/22/2016 08:06 AM, flash yan wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have run fio-2.7 to test an iSCSI device, and one unusual issue happened.
>>>>>> If I set the io_size to 4096, queue_depth to 16, rw to randwrite
>>>>>> and run_time to 3000, fio invokes the oom_killer and the
>>>>>> Linux OS kills the fio process.
>>>>>> The machine has about 11 GB of memory. I also tried a machine
>>>>>> with 23 GB, and the issue happened there too.
>>>>>> I think fio has a problem when dealing with a 4KB io_size and then
>>>>>> uses too much memory.
>>>>>
>>>>>
>>>>> When did this happen - shortly after the job is started, or long
>>>>> after? How big is the iscsi device? Did you have verify= options enabled?
>>>>>
>>>>> --
>>>>> Jens Axboe
>>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fio" in the
>> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
>> Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
>>
>> This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.


