Hi team,

Any update on this?

Thanks & Regards
Arihant Jain

On Mon, 27 Nov, 2023, 8:07 am AJ_ sunny, <jains8550@xxxxxxxxx> wrote:

> ++Adding ceph-users-confirm+4555fdc6282a38c849f4d27a40339f1b7e4bde74@xxxxxxx
> ++Adding dev@xxxxxxx
>
> Thanks & Regards
> Arihant Jain
>
> On Mon, 27 Nov, 2023, 7:48 am AJ_ sunny, <jains8550@xxxxxxxxx> wrote:
>
>> Hi team,
>>
>> After making the changes above I am still hitting the issue where the machine keeps shutting down on its own.
>>
>> In the nova-compute logs this is the only footprint I get:
>>
>> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848 316d215042914de190f5f9e1c8466bf0 default default] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with vm_state active and task_state None.
>> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
>> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
>> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the hypervisor when stop is called.
>>
>> In this architecture Ceph is the backend storage for Nova, Glance and Cinder.
>> When a machine goes down on its own and I try to start it again, it goes into error: the VM console shows an I/O ERROR during boot. I first have to rebuild the volume's object map on the Ceph side, and only then can I start the machine:
>>
>> rbd object-map rebuild <volume-id>
>> openstack server start <server-id>
>>
>> So this issue shows two faces, one on the Ceph side and another in the nova-compute log.
>> Can someone please help me fix this issue as soon as possible?
>>
>> Thanks & Regards
>> Arihant Jain
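For reference, the recovery sequence described above written out end to end as a rough shell sketch. The pool/image spec is an assumption, not something stated in the thread (Cinder's RBD driver commonly stores images as volume-<id> in a pool such as "volumes"; check your cinder.conf), and the object-map check step is optional, it only reports inconsistencies before the rebuild.

    # Check and rebuild the RBD object map of the affected volume.
    # Pool and image naming below are placeholders; adjust to your Cinder RBD setup.
    rbd object-map check volumes/volume-<volume-id>
    rbd object-map rebuild volumes/volume-<volume-id>

    # Once the object map is consistent again, start the instance.
    openstack server start <server-id>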
>>
>> On Tue, 24 Oct, 2023, 4:56 pm , <smooney@xxxxxxxxxx> wrote:
>>
>>> On Tue, 2023-10-24 at 10:11 +0530, AJ_ sunny wrote:
>>> > Hi team,
>>> >
>>> > The VM is not being shut off by its owner from inside; it goes to shutdown automatically, i.e. a libvirt lifecycle stop event is triggered.
>>> > In my nova.conf configuration I am using ram_allocation_ratio = 1.5, and I previously tried setting sync_power_state_interval = -1 in nova.conf, but I am still facing the same problem.
>>> > OOM might be causing this issue.
>>> > Can you please give me some idea of how to fix this issue if OOM is the cause?
>>>
>>> The general answer is swap.
>>>
>>> Nova should always be deployed with swap, even if you do not have overcommit enabled. There are a few reasons for this, the first being that Python allocates memory differently if any swap is available; even 1G is enough to keep it from trying to commit all memory, so when swap is available the nova/neutron agents use much less resident memory even without touching any of the swap space.
>>>
>>> We have some docs about this downstream:
>>>
>>> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-the-compute-service_osp#ref_calculating-swap-size_configuring-compute
>>>
>>> If you are being ultra conservative, we recommend allocating (RAM * allocation ratio) as swap, so in your case allocate 1.5 times your RAM as swap. We would expect the actual usage of swap to be a small fraction of that, however, so we also provide a formula:
>>>
>>> overcommit_ratio = NovaRAMAllocationRatio - 1
>>> Minimum swap size (MB) = (total_RAM * overcommit_ratio) + RHEL_min_swap
>>> Recommended swap size (MB) = total_RAM * (overcommit_ratio + percentage_of_RAM_to_use_for_swap)
>>>
>>> So say your host had 64G of RAM with an allocation ratio of 1.5 and a minimum swap percentage of 25%; the conservative swap recommendation would be:
>>>
>>> (64 * (0.5 + 0.25)) + distro_min_swap
>>> (64 * 0.75) + 4G = 52G of recommended swap
>>>
>>> If you are wondering why we add a minimum swap percentage and a distro minimum swap, it is basically to account for the QEMU and host OS memory overhead as well as the memory used by the nova/neutron agents and libvirt/OVS.
>>>
>>> If you are not using memory overcommit, my general recommendation is: if you have less than 64G of RAM, allocate 16G; if you have more than 256G of RAM, allocate 64G; and you should be fine. When you do use memory overcommit you must have at least enough swap to account for the QEMU overhead of all instances plus the overcommitted memory.
>>>
>>> The other common cause of OOM errors is if you are using NUMA affinity and the guests don't request hw:mem_page_size=<something>; without a mem_page_size request we don't do NUMA-aware memory placement, and the kernel OOM system works on a per-NUMA-node basis. NUMA affinity does not support memory overcommit either, so that is likely not your issue; I just said I would mention it to cover all bases.
>>>
>>> regards
>>> sean
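Plugging the numbers from the worked example above into the formula, as a small shell sketch. All inputs (64 GB of RAM, a 1.5 allocation ratio, a 25% swap percentage, a 4 GB distro minimum) are the assumptions from that example, not values taken from the reporter's hosts.

    # Swap sizing per the formula above (values in GB).
    total_ram_gb=64
    ram_allocation_ratio=1.5
    swap_pct=0.25          # percentage_of_RAM_to_use_for_swap
    distro_min_swap_gb=4   # RHEL_min_swap / distro minimum

    overcommit_ratio=$(echo "$ram_allocation_ratio - 1" | bc)                                 # -> 0.5
    min_swap_gb=$(echo "$total_ram_gb * $overcommit_ratio + $distro_min_swap_gb" | bc)        # -> 36
    # The worked example adds the distro minimum on top of the recommended size as well.
    rec_swap_gb=$(echo "$total_ram_gb * ($overcommit_ratio + $swap_pct) + $distro_min_swap_gb" | bc)  # -> 52

    echo "minimum swap:     ${min_swap_gb} GB"
    echo "recommended swap: ${rec_swap_gb} GB"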
>>> >
>>> > Thanks & Regards
>>> > Arihant Jain
>>> >
>>> > On Mon, 23 Oct, 2023, 11:29 pm , <smooney@xxxxxxxxxx> wrote:
>>> >
>>> > > On Mon, 2023-10-23 at 13:19 -0400, Jonathan Proulx wrote:
>>> > > >
>>> > > > I've seen similar log traces with overcommitted memory when the hypervisor runs out of physical memory and the OOM killer gets the VM process.
>>> > > >
>>> > > > This is an unusual configuration (I think), but if the VM owner claims they didn't power down the VM internally you might look at the local hypervisor logs to see if the VM process crashed or was killed for some other reason.
>>> > > Yep, OOM events are one common cause of this.
>>> > >
>>> > > Nova is basically just saying "hey, you said this VM should be active but it's not; I'm going to update the DB to reflect reality." You can turn that off with
>>> > >
>>> > > https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events
>>> > > or
>>> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval
>>> > >
>>> > > i.e. either disable the sync by setting the interval to -1, or disable handling of the virt lifecycle events.
>>> > >
>>> > > I would recommend the sync_power_state_interval approach, but again, if VMs are stopping and you don't know why, you should probably discover why rather than just turning off the update of the Nova DB to reflect the actual state.
>>> > >
>>> > > >
>>> > > > -Jon
>>> > > >
>>> > > > On Mon, Oct 23, 2023 at 02:02:26PM +0100, smooney@xxxxxxxxxx wrote:
>>> > > > :On Mon, 2023-10-23 at 17:45 +0530, AJ_ sunny wrote:
>>> > > > :> Hi team,
>>> > > > :>
>>> > > > :> I am using OpenStack kolla-ansible on the Wallaby version and I am currently facing an issue with a virtual machine: the VM is shut off by itself, and from the log it seems a libvirt lifecycle stop event is triggering again and again.
>>> > > > :>
>>> > > > :> Logs:
>>> > > > :> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848 316d215042914de190f5f9e1c8466bf0 default default] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with vm_state active and task_state None.
>>> > > > :> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>>> > > > :> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
>>> > > > :> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
>>> > > > :> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the hypervisor when stop is called.
>>> > > > :
>>> > > > :That sounds like the guest OS shut down the VM, i.e. something in the guest ran sudo poweroff; Nova then detected that the VM was stopped by the user and updated its DB to match.
>>> > > > :
>>> > > > :That is the expected behaviour when you have the power sync enabled. It is enabled by default.
>>> > > > :>
>>> > > > :> Thanks & Regards
>>> > > > :> Arihant Jain
>>> > > > :> +91 8299719369
>>> > > > :
>>> > > >
>>> > >
>>> >
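For anyone who wants to try the two configuration options Sean links above, a minimal nova.conf sketch follows. The option names and sections come from the linked Nova configuration reference; the suggestion to carry them as a nova.conf override (e.g. /etc/kolla/config/nova.conf) is the usual kolla-ansible convention, not something stated in the thread. As Sean notes, this only stops Nova from reacting to the stopped VM; it does not address why the guest is going down.

    [DEFAULT]
    # Option 1: disable the periodic power-state sync task (-1 disables it).
    sync_power_state_interval = -1

    [workarounds]
    # Option 2: have nova-compute ignore libvirt lifecycle events (default: true).
    handle_virt_lifecycle_events = false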