Hi Manjunath,

On 12/04/2018 10:49 AM, Manjunath Patil wrote:
> On 12/3/2018 6:16 PM, Boris Ostrovsky wrote:
>
>> On 12/3/18 8:14 PM, Dongli Zhang wrote:
>>> Hi Boris,
>>>
>>> On 12/04/2018 12:07 AM, Boris Ostrovsky wrote:
>>>> On 12/2/18 3:31 PM, Manjunath Patil wrote:
>>>>> On 11/30/2018 2:33 PM, Boris Ostrovsky wrote:
>>>>>
>>>>>> On 11/30/18 4:49 PM, Manjunath Patil wrote:
>>>>>>> Thank you, Boris, for your comments. I removed a faulty email address of mine.
>>>>>>>
>>>>>>> Replies inline.
>>>>>>> On 11/30/2018 12:42 PM, Boris Ostrovsky wrote:
>>>>>>>> On 11/29/18 12:17 AM, Manjunath Patil wrote:
>>>>>>>>> Hi,
>>>>>>>>> Feel free to suggest/comment on this.
>>>>>>>>>
>>>>>>>>> I am trying to do the following at the dst during migration now:
>>>>>>>>> 1. Don't clear the old rinfo in blkif_free(); instead just clean it.
>>>>>>>>> 2. Store the old rinfo and nr_rings into temp variables in
>>>>>>>>> negotiate_mq().
>>>>>>>>> 3. Let nr_rings get re-calculated based on backend data.
>>>>>>>>> 4. Try allocating new memory based on the new nr_rings.
>>>>>>>> Since I suspect the number of rings will likely be the same, why not
>>>>>>>> reuse the rings in the common case?
>>>>>>> I thought attaching devices would be more frequent than migration, hence
>>>>>>> I did not want to add an extra check for
>>>>>>> - whether I am inside the migration code path, and
>>>>>>> - whether the new nr_rings is equal to the old nr_rings.
>>>>>>>
>>>>>>> Sure, adding such a check would avoid the memory allocation
>>>>>>> altogether in the migration path,
>>>>>>> but it would add a little overhead to normal device addition.
>>>>>>>
>>>>>>> Do you think it's worth adding that change?
>>>>>> IMO a couple of extra checks are not going to make much difference.
>>>>> I will add this change.
>>>>>> I wonder though --- have you actually seen the case where you did fail
>>>>>> allocation and the changes provided in this patch made things work? I am
>>>>>> asking because right after negotiate_mq() we will call setup_blkring()
>>>>>> and it will want to allocate a bunch of memory. A failure there is fatal
>>>>>> (to ring setup). So it seems to me that you will survive negotiate_mq()
>>>>>> but then will likely fail soon after.
>>>>> I have noticed the ENOMEM inside negotiate_mq() on a ct machine. When I
>>>>> included my patch, I manually triggered the ENOMEM using a debug flag.
>>>>> The patch works for ENOMEM inside negotiate_mq().
>>>>>
>>>>> As you mentioned, if we really hit the ENOMEM in negotiate_mq(), we
>>>>> might hit it in setup_blkring() as well.
>>>>> We should add a similar change for the blkif_sring struct as well.
>>>> Won't you have a similar issue with other frontends, say, netfront?
>>> I think the kmalloc fails, but not because of OOM.
>>>
>>> In fact, the size of "blkfront_ring_info" is large. When the domU has 4
>>> queues/rings, the size of 4 blkfront_ring_info structures can be about 300+ KB.
>>>
>>> There is a chance that a kmalloc() of 300+ KB would fail.
>>>
>>> About netfront, kmalloc() of 8 'struct netfront_queue' seems to consume < 70 KB?
>> TBH these look like comparable sizes to me. I am not convinced that
>> these changes will make a difference. If the number of rings on source
>> and destination were the same, I'd absolutely agree with this patch, but
>> since you are trying to handle different sizes the code becomes somewhat
>> more complex, and I am not sure it's worth it. (Can you actually give me
>> an example of when we can expect the number of rings to change during
>> migration?)
>>
>> But others may think differently.
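For reference, the "reuse when the ring count is unchanged" idea being discussed
could look roughly like the sketch below. The struct fields (nr_rings, rinfo,
xbdev) and the helpers follow xen-blkfront, but the function itself is only an
untested illustration of the control flow, not the actual patch; it assumes
blkif_free() has cleaned, but not freed, the old rinfo as proposed earlier in
this thread.

static int negotiate_mq_sketch(struct blkfront_info *info)
{
        unsigned int backend_max_queues;
        unsigned int old_nr_rings = info->nr_rings;
        struct blkfront_ring_info *old_rinfo = info->rinfo;

        /* Check how many queues the backend supports. */
        backend_max_queues = xenbus_read_unsigned(info->xbdev->otherend,
                                                  "multi-queue-max-queues", 1);
        info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
        /* We need at least one ring. */
        if (!info->nr_rings)
                info->nr_rings = 1;

        /*
         * Common case on migration: the ring count did not change, so keep
         * the memory that was allocated on the source side.
         */
        if (old_rinfo && old_nr_rings == info->nr_rings) {
                info->rinfo = old_rinfo;
                return 0;
        }

        /* Ring count changed (or first attach): allocate a fresh array. */
        kfree(old_rinfo);
        info->rinfo = kcalloc(info->nr_rings,
                              sizeof(struct blkfront_ring_info),
                              GFP_KERNEL);
        if (!info->rinfo) {
                xenbus_dev_fatal(info->xbdev, -ENOMEM,
                                 "allocating ring_info structure");
                info->nr_rings = 0;
                return -ENOMEM;
        }

        /* Per-ring init (lists, work item, ring lock) elided here. */
        return 0;
}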
> Hi Boris,
> I think allocation of a 300 KB chunk [an order-7 allocation] is more likely to
> fail than a 70 KB one [order 5], especially under memory pressure.
> If it comes to that, I think we should fix this too.
>
> The number of rings in most cases remains 4, thanks to the xen_blkif_max_queues
> module parameter.
> If the src host has allocated fewer than 4 [maybe the vcpus given to this dom0
> were fewer than 4], then we can expect the dst to allocate more than the src
> side, and vice versa.

xen_blkif_max_queues is tunable, so the size passed to kmalloc() would be even
larger when both xen_blkif_max_queues and the dom0 vcpu count are large.

Dongli Zhang
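P.S. A quick back-of-the-envelope check of the allocation orders mentioned
above, using the rough sizes quoted in this thread (plain userspace C, not
kernel code; order_for() just mirrors the kernel's get_order() semantics for
4 KB pages):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static unsigned int order_for(unsigned long size)
{
        unsigned int order = 0;

        while ((PAGE_SIZE << order) < size)
                order++;
        return order;
}

int main(void)
{
        unsigned long blkfront = 300UL * 1024; /* ~300 KB for 4 blkfront_ring_info */
        unsigned long netfront = 70UL * 1024;  /* <70 KB for 8 netfront_queue */

        printf("blkfront: %lu KB -> order %u\n", blkfront / 1024, order_for(blkfront));
        printf("netfront: %lu KB -> order %u\n", netfront / 1024, order_for(netfront));
        return 0;
}

This prints order 7 for ~300 KB and order 5 for ~70 KB, matching the numbers
above: the blkfront allocation needs 128 contiguous pages while the netfront
one needs 32, which is why the former is more likely to fail under memory
pressure.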