++Adding ceph-users-confirm+4555fdc6282a38c849f4d27a40339f1b7e4bde74@xxxxxxx
++Adding dev@xxxxxxx

Thanks & Regards
Arihant Jain

On Mon, 27 Nov, 2023, 7:48 am AJ_ sunny <jains8550@xxxxxxxxx> wrote:
> Hi team,
>
> After doing the above changes I am still getting the issue where the machine
> keeps shutting down on its own.
>
> In the nova-compute logs I only get this footprint:
>
> Logs:
> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
> [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848
> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with
> vm_state active and task_state None.
>
> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>
> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB
> power_state (1) does not match the vm_power_state from the hypervisor (4).
> Updating power_state in the DB to match the hypervisor.
>
> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the
> stop API. Current vm_state: active, current task_state: None, original DB
> power_state: 1, current VM power_state: 4
>
> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the
> hypervisor when stop is called.
>
> In this architecture we are using Ceph as the backend storage for Nova,
> Glance and Cinder.
> When the machine goes down on its own and I try to start it again, it goes
> into error: the VM console shows an I/O ERROR during boot, so I first have to
> rebuild the volume's object map on the Ceph side and only then start the machine:
>
>     rbd object-map rebuild <volume-id>
>     openstack server start <server-id>
>
> So this issue is showing two faces, one from the Ceph side and another in the
> nova-compute logs.
> Can someone please help me fix this issue ASAP?
>
> Thanks & Regards
> Arihant Jain
>
> On Tue, 24 Oct, 2023, 4:56 pm, <smooney@xxxxxxxxxx> wrote:
>
>> On Tue, 2023-10-24 at 10:11 +0530, AJ_ sunny wrote:
>> > Hi team,
>> >
>> > The VM is not being shut off by the owner from inside; it goes to shutdown
>> > automatically, i.e. the libvirt lifecycle stop event keeps triggering.
>> > In my nova.conf configuration I am using ram_allocation_ratio = 1.5,
>> > and previously I tried setting sync_power_state_interval = -1 in nova.conf
>> > but I am still facing the same problem.
>> > OOM might be causing this issue.
>> > Can you please give me some idea how to fix this issue if OOM is the cause?
>> The general answer is swap.
>>
>> Nova should always be deployed with swap even if you do not have overcommit
>> enabled. There are a few reasons for this, the first being that Python
>> allocates memory differently if any swap is available; even 1G is enough to
>> have it not try to commit all memory. So when swap is available the
>> nova/neutron agents will use much less resident memory even without using
>> any of the swap space.
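As a concrete illustration of that point, a minimal sketch of adding a small file-backed swap on a compute node could look like the following; the 4G size and the /swapfile path are placeholders, not values recommended in the thread:

    # create and enable a 4G swap file (size is an example only)
    fallocate -l 4G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # make the swap file persistent across reboots
    echo '/swapfile none swap sw 0 0' >> /etc/fstab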
>>
>> We have some docs about this downstream:
>> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-the-compute-service_osp#ref_calculating-swap-size_configuring-compute
>>
>> If you are being ultra conservative we recommend allocating (ram *
>> allocation ratio) in swap, so in your case allocate 1.5 times your RAM as
>> swap. We would expect the actual usage of swap to be a small fraction of
>> that, however, so we also provide a formula:
>>
>>     overcommit_ratio = NovaRAMAllocationRatio - 1
>>     Minimum swap size (MB) = (total_RAM * overcommit_ratio) + RHEL_min_swap
>>     Recommended swap size (MB) = total_RAM * (overcommit_ratio + percentage_of_RAM_to_use_for_swap)
>>
>> So say your host had 64G of RAM with an allocation ratio of 1.5 and a minimum
>> swap percentage of 25%; the conservative swap recommendation would be:
>>
>>     (64 * (0.5 + 0.25)) + distro_min_swap
>>     (64 * 0.75) + 4G = 52G of recommended swap
>>
>> If you are wondering why we add a minimum swap percentage and a distro
>> minimum swap, it is basically to account for the QEMU and host OS memory
>> overhead as well as the memory used by the nova/neutron agents and
>> libvirt/OVS.
>>
>> If you are not using memory overcommit my general recommendation is: if you
>> have less than 64G of RAM allocate 16G, if you have more than 256G of RAM
>> allocate 64G, and you should be fine. When you do use memory overcommit you
>> must have at least enough swap to account for the QEMU overhead of all
>> instances plus the overcommitted memory.
>>
>> The other common cause of OOM errors is if you are using NUMA affinity and
>> the guests don't request hw:mem_page_size=<something>; without setting a
>> mem_page_size request we don't do NUMA-aware memory placement. The kernel
>> OOM system works on a per NUMA node basis. NUMA affinity does not support
>> memory overcommit either, so that is likely not your issue; I just mention
>> it to cover all bases.
>>
>> regards
>> sean
>>
>> >
>> > Thanks & Regards
>> > Arihant Jain
>> >
>> > On Mon, 23 Oct, 2023, 11:29 pm, <smooney@xxxxxxxxxx> wrote:
>> >
>> > > On Mon, 2023-10-23 at 13:19 -0400, Jonathan Proulx wrote:
>> > > >
>> > > > I've seen similar log traces with overcommitted memory when the
>> > > > hypervisor runs out of physical memory and the OOM killer gets the VM
>> > > > process.
>> > > >
>> > > > This is an unusual configuration (I think) but if the VM owner claims
>> > > > they didn't power down the VM internally you might look at the local
>> > > > hypervisor logs to see if the VM process crashed or was killed for
>> > > > some other reason.
>> > > Yep, OOM events are one common cause of this.
>> > >
>> > > Nova is basically just saying "hey, you said this VM should be active
>> > > but it's not, I'm going to update the DB to reflect reality." You can
>> > > turn that off with
>> > > https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events
>> > > or
>> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval
>> > > i.e. either disable the sync by setting the interval to -1, or disable
>> > > handling of the virt lifecycle events.
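To make those two options concrete, a minimal nova.conf sketch on the compute node might look like this (either setting on its own is enough, and nova-compute needs a restart after the change; the values simply illustrate the options named above):

    [DEFAULT]
    # Disable the periodic power-state sync entirely
    sync_power_state_interval = -1

    [workarounds]
    # Or, alternatively, stop acting on libvirt lifecycle events
    handle_virt_lifecycle_events = False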
>> > >
>> > > I would recommend the sync_power_state_interval approach, but again, if
>> > > VMs are stopping and you don't know why, you should probably find out
>> > > why rather than just turning off the update of the Nova DB to reflect
>> > > the actual state.
>> > >
>> > > >
>> > > > -Jon
>> > > >
>> > > > On Mon, Oct 23, 2023 at 02:02:26PM +0100, smooney@xxxxxxxxxx wrote:
>> > > > :On Mon, 2023-10-23 at 17:45 +0530, AJ_ sunny wrote:
>> > > > :> Hi team,
>> > > > :>
>> > > > :> I am using OpenStack kolla-ansible on the Wallaby version and
>> > > > :> currently I am facing an issue with a virtual machine: the VM is
>> > > > :> shut off by itself, and from the log it seems the libvirt lifecycle
>> > > > :> stop event is triggering again and again.
>> > > > :>
>> > > > :> Logs:
>> > > > :> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
>> > > > :> [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848
>> > > > :> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
>> > > > :> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with
>> > > > :> vm_state active and task_state None.
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB
>> > > > :> power_state (1) does not match the vm_power_state from the hypervisor (4).
>> > > > :> Updating power_state in the DB to match the hypervisor.
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the
>> > > > :> stop API. Current vm_state: active, current task_state: None, original DB
>> > > > :> power_state: 1, current VM power_state: 4
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the
>> > > > :> hypervisor when stop is called.
>> > > > :
>> > > > :That sounds like the guest OS shut down the VM, i.e. something in the
>> > > > :guest ran sudo poweroff, and then Nova detected the VM was stopped by
>> > > > :the user and updated its DB to match that.
>> > > > :
>> > > > :That is the expected behavior when you have the power sync enabled;
>> > > > :it is enabled by default.
>> > > > :>
>> > > > :>
>> > > > :> Thanks & Regards
>> > > > :> Arihant Jain
>> > > > :> +91 8299719369
>> > > > :
>> > > >
>> > >
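As a rough sketch of the earlier suggestion to check the local hypervisor logs for an OOM kill or VM crash, the following could be run on the compute node; the paths assume a stock libvirt layout, and under kolla-ansible the libvirt logs may live inside the nova_libvirt container instead:

    # Check the kernel log for OOM-killer activity around the time the VM stopped
    dmesg -T | grep -i -E 'out of memory|oom-killer|killed process'
    journalctl -k --since "2023-10-21" | grep -i oom

    # Check the per-instance QEMU log for an unexpected shutdown reason
    grep -i 'shutting down' /var/log/libvirt/qemu/instance-*.log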