Hi team,

Any update on this?

Thanks & Regards
Arihant Jain

On Mon, 27 Nov, 2023, 8:07 am AJ_ sunny, <jains8550@xxxxxxxxx> wrote:

> ++Adding ceph-users-confirm+4555fdc6282a38c849f4d27a40339f1b7e4bde74@xxxxxxx
> ++Adding dev@xxxxxxx
>
> Thanks & Regards
> Arihant Jain
>
> On Mon, 27 Nov, 2023, 7:48 am AJ_ sunny, <jains8550@xxxxxxxxx> wrote:
>
>> Hi team,
>>
>> After making the changes above I am still hitting the issue where the machine keeps shutting down on its own.
>>
>> In the nova-compute logs this is the only footprint I get:
>>
>> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848 316d215042914de190f5f9e1c8466bf0 default default] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with vm_state active and task_state None.
>> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
>> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
>> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the hypervisor when stop is called.
>>
>> In this architecture Ceph is the backend storage for Nova, Glance and Cinder.
>> When a machine goes down on its own and I try to start it again, it goes into error: the VM console shows an I/O ERROR during boot. I first have to rebuild the volume's object map on the Ceph side, and only then can I start the machine:
>>
>> rbd object-map rebuild <volume-id>
>> openstack server start <server-id>
>>
>> So this issue shows two faces, one on the Ceph side and another in the nova-compute log.
>> Can someone please help me fix this issue as soon as possible?
>>
>> Thanks & Regards
>> Arihant Jain
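For reference, the recovery sequence described above written out end to end as a rough shell sketch. The pool/image spec is an assumption, not something stated in the thread (Cinder's RBD driver commonly stores images as volume-<id> in a pool such as "volumes"; check your cinder.conf), and the object-map check step is optional, it only reports inconsistencies before the rebuild.

    # Check and rebuild the RBD object map of the affected volume.
    # Pool and image naming below are placeholders; adjust to your Cinder RBD setup.
    rbd object-map check volumes/volume-<volume-id>
    rbd object-map rebuild volumes/volume-<volume-id>

    # Once the object map is consistent again, start the instance.
    openstack server start <server-id>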
>>
>> On Tue, 24 Oct, 2023, 4:56 pm , <smooney@xxxxxxxxxx> wrote:
>>
>>> On Tue, 2023-10-24 at 10:11 +0530, AJ_ sunny wrote:
>>> > Hi team,
>>> >
>>> > The VM is not being shut off by its owner from inside; it goes to shutdown automatically, i.e. a libvirt lifecycle stop event is triggered.
>>> > In my nova.conf configuration I am using ram_allocation_ratio = 1.5, and I previously tried setting sync_power_state_interval = -1 in nova.conf, but I am still facing the same problem.
>>> > OOM might be causing this issue.
>>> > Can you please give me some idea of how to fix this issue if OOM is the cause?
>>>
>>> The general answer is swap.
>>>
>>> Nova should always be deployed with swap, even if you do not have overcommit enabled. There are a few reasons for this, the first being that Python allocates memory differently if any swap is available; even 1G is enough to keep it from trying to commit all memory, so when swap is available the nova/neutron agents use much less resident memory even without touching any of the swap space.
>>>
>>> We have some docs about this downstream:
>>>
>>> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-the-compute-service_osp#ref_calculating-swap-size_configuring-compute
>>>
>>> If you are being ultra conservative, we recommend allocating (RAM * allocation ratio) as swap, so in your case allocate 1.5 times your RAM as swap. We would expect the actual usage of swap to be a small fraction of that, however, so we also provide a formula:
>>>
>>> overcommit_ratio = NovaRAMAllocationRatio - 1
>>> Minimum swap size (MB) = (total_RAM * overcommit_ratio) + RHEL_min_swap
>>> Recommended swap size (MB) = total_RAM * (overcommit_ratio + percentage_of_RAM_to_use_for_swap)
>>>
>>> So say your host had 64G of RAM with an allocation ratio of 1.5 and a minimum swap percentage of 25%; the conservative swap recommendation would be:
>>>
>>> (64 * (0.5 + 0.25)) + distro_min_swap
>>> (64 * 0.75) + 4G = 52G of recommended swap
>>>
>>> If you are wondering why we add a minimum swap percentage and a distro minimum swap, it is basically to account for the QEMU and host OS memory overhead as well as the memory used by the nova/neutron agents and libvirt/OVS.
>>>
>>> If you are not using memory overcommit, my general recommendation is: if you have less than 64G of RAM, allocate 16G; if you have more than 256G of RAM, allocate 64G; and you should be fine. When you do use memory overcommit you must have at least enough swap to account for the QEMU overhead of all instances plus the overcommitted memory.
>>>
>>> The other common cause of OOM errors is if you are using NUMA affinity and the guests don't request hw:mem_page_size=<something>; without a mem_page_size request we don't do NUMA-aware memory placement, and the kernel OOM system works on a per-NUMA-node basis. NUMA affinity does not support memory overcommit either, so that is likely not your issue; I just said I would mention it to cover all bases.
>>>
>>> regards
>>> sean
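Plugging the numbers from the worked example above into the formula, as a small shell sketch. All inputs (64 GB of RAM, a 1.5 allocation ratio, a 25% swap percentage, a 4 GB distro minimum) are the assumptions from that example, not values taken from the reporter's hosts.

    # Swap sizing per the formula above (values in GB).
    total_ram_gb=64
    ram_allocation_ratio=1.5
    swap_pct=0.25          # percentage_of_RAM_to_use_for_swap
    distro_min_swap_gb=4   # RHEL_min_swap / distro minimum

    overcommit_ratio=$(echo "$ram_allocation_ratio - 1" | bc)                                 # -> 0.5
    min_swap_gb=$(echo "$total_ram_gb * $overcommit_ratio + $distro_min_swap_gb" | bc)        # -> 36
    # The worked example adds the distro minimum on top of the recommended size as well.
    rec_swap_gb=$(echo "$total_ram_gb * ($overcommit_ratio + $swap_pct) + $distro_min_swap_gb" | bc)  # -> 52

    echo "minimum swap:     ${min_swap_gb} GB"
    echo "recommended swap: ${rec_swap_gb} GB"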
>>> >
>>> > Thanks & Regards
>>> > Arihant Jain
>>> >
>>> > On Mon, 23 Oct, 2023, 11:29 pm , <smooney@xxxxxxxxxx> wrote:
>>> >
>>> > > On Mon, 2023-10-23 at 13:19 -0400, Jonathan Proulx wrote:
>>> > > >
>>> > > > I've seen similar log traces with overcommitted memory when the hypervisor runs out of physical memory and the OOM killer gets the VM process.
>>> > > >
>>> > > > This is an unusual configuration (I think), but if the VM owner claims they didn't power down the VM internally you might look at the local hypervisor logs to see if the VM process crashed or was killed for some other reason.
>>> > > Yep, OOM events are one common cause of this.
>>> > >
>>> > > Nova is basically just saying "hey, you said this VM should be active but it's not; I'm going to update the DB to reflect reality." You can turn that off with
>>> > >
>>> > > https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events
>>> > > or
>>> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval
>>> > >
>>> > > i.e. either disable the sync by setting the interval to -1, or disable handling of the virt lifecycle events.
>>> > >
>>> > > I would recommend the sync_power_state_interval approach, but again, if VMs are stopping and you don't know why, you should probably discover why rather than just turning off the update of the Nova DB to reflect the actual state.
>>> > >
>>> > > >
>>> > > > -Jon
>>> > > >
>>> > > > On Mon, Oct 23, 2023 at 02:02:26PM +0100, smooney@xxxxxxxxxx wrote:
>>> > > > :On Mon, 2023-10-23 at 17:45 +0530, AJ_ sunny wrote:
>>> > > > :> Hi team,
>>> > > > :>
>>> > > > :> I am using OpenStack kolla-ansible on the Wallaby version and I am currently facing an issue with a virtual machine: the VM is shut off by itself, and from the log it seems a libvirt lifecycle stop event is triggering again and again.
>>> > > > :>
>>> > > > :> Logs:
>>> > > > :> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848 316d215042914de190f5f9e1c8466bf0 default default] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with vm_state active and task_state None.
>>> > > > :> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>>> > > > :> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
>>> > > > :> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
>>> > > > :> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the hypervisor when stop is called.
>>> > > > :
>>> > > > :That sounds like the guest OS shut down the VM, i.e. something in the guest ran sudo poweroff; Nova then detected that the VM was stopped by the user and updated its DB to match.
>>> > > > :
>>> > > > :That is the expected behaviour when you have the power sync enabled. It is enabled by default.
>>> > > > :>
>>> > > > :> Thanks & Regards
>>> > > > :> Arihant Jain
>>> > > > :> +91 8299719369
>>> > > > :
>>> > > >
>>> > >
>>> >
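For anyone who wants to try the two configuration options Sean links above, a minimal nova.conf sketch follows. The option names and sections come from the linked Nova configuration reference; the suggestion to carry them as a nova.conf override (e.g. /etc/kolla/config/nova.conf) is the usual kolla-ansible convention, not something stated in the thread. As Sean notes, this only stops Nova from reacting to the stopped VM; it does not address why the guest is going down.

    [DEFAULT]
    # Option 1: disable the periodic power-state sync task (-1 disables it).
    sync_power_state_interval = -1

    [workarounds]
    # Option 2: have nova-compute ignore libvirt lifecycle events (default: true).
    handle_virt_lifecycle_events = false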