++Adding ceph-users-confirm+4555fdc6282a38c849f4d27a40339f1b7e4bde74@xxxxxxx
++Adding dev@xxxxxxx

Thanks & Regards
Arihant Jain

On Mon, 27 Nov, 2023, 7:48 am AJ_ sunny <jains8550@xxxxxxxxx> wrote:
> Hi team,
>
> After doing the above changes I am still getting the issue where the machine
> keeps shutting down on its own.
>
> In the nova-compute logs I only get this footprint:
>
> Logs:
> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
> [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848
> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with
> vm_state active and task_state None.
>
> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>
> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB
> power_state (1) does not match the vm_power_state from the hypervisor (4).
> Updating power_state in the DB to match the hypervisor.
>
> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the
> stop API. Current vm_state: active, current task_state: None, original DB
> power_state: 1, current VM power_state: 4
>
> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager
> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the
> hypervisor when stop is called.
>
> In this architecture we are using Ceph as the backend storage for Nova,
> Glance and Cinder.
> When the machine goes down on its own and I try to start it again, it goes
> into error: the VM console shows an I/O ERROR during boot, so I first have to
> rebuild the volume's object map on the Ceph side and only then start the machine:
>
>     rbd object-map rebuild <volume-id>
>     openstack server start <server-id>
>
> So this issue is showing two faces, one from the Ceph side and another in the
> nova-compute logs.
> Can someone please help me fix this issue ASAP?
>
> Thanks & Regards
> Arihant Jain
>
> On Tue, 24 Oct, 2023, 4:56 pm, <smooney@xxxxxxxxxx> wrote:
>
>> On Tue, 2023-10-24 at 10:11 +0530, AJ_ sunny wrote:
>> > Hi team,
>> >
>> > The VM is not being shut off by the owner from inside; it goes to shutdown
>> > automatically, i.e. the libvirt lifecycle stop event keeps triggering.
>> > In my nova.conf configuration I am using ram_allocation_ratio = 1.5,
>> > and previously I tried setting sync_power_state_interval = -1 in nova.conf
>> > but I am still facing the same problem.
>> > OOM might be causing this issue.
>> > Can you please give me some idea how to fix this issue if OOM is the cause?
>> The general answer is swap.
>>
>> Nova should always be deployed with swap even if you do not have overcommit
>> enabled. There are a few reasons for this, the first being that Python
>> allocates memory differently if any swap is available; even 1G is enough to
>> have it not try to commit all memory. So when swap is available the
>> nova/neutron agents will use much less resident memory even without using
>> any of the swap space.
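As a concrete illustration of that point, a minimal sketch of adding a small file-backed swap on a compute node could look like the following; the 4G size and the /swapfile path are placeholders, not values recommended in the thread:

    # create and enable a 4G swap file (size is an example only)
    fallocate -l 4G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # make the swap file persistent across reboots
    echo '/swapfile none swap sw 0 0' >> /etc/fstab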
>>
>> We have some docs about this downstream:
>> https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-the-compute-service_osp#ref_calculating-swap-size_configuring-compute
>>
>> If you are being ultra conservative we recommend allocating (ram *
>> allocation ratio) in swap, so in your case allocate 1.5 times your RAM as
>> swap. We would expect the actual usage of swap to be a small fraction of
>> that, however, so we also provide a formula:
>>
>>     overcommit_ratio = NovaRAMAllocationRatio - 1
>>     Minimum swap size (MB) = (total_RAM * overcommit_ratio) + RHEL_min_swap
>>     Recommended swap size (MB) = total_RAM * (overcommit_ratio + percentage_of_RAM_to_use_for_swap)
>>
>> So say your host had 64G of RAM with an allocation ratio of 1.5 and a minimum
>> swap percentage of 25%; the conservative swap recommendation would be:
>>
>>     (64 * (0.5 + 0.25)) + distro_min_swap
>>     (64 * 0.75) + 4G = 52G of recommended swap
>>
>> If you are wondering why we add a minimum swap percentage and a distro
>> minimum swap, it is basically to account for the QEMU and host OS memory
>> overhead as well as the memory used by the nova/neutron agents and
>> libvirt/OVS.
>>
>> If you are not using memory overcommit my general recommendation is: if you
>> have less than 64G of RAM allocate 16G, if you have more than 256G of RAM
>> allocate 64G, and you should be fine. When you do use memory overcommit you
>> must have at least enough swap to account for the QEMU overhead of all
>> instances plus the overcommitted memory.
>>
>> The other common cause of OOM errors is if you are using NUMA affinity and
>> the guests don't request hw:mem_page_size=<something>; without setting a
>> mem_page_size request we don't do NUMA-aware memory placement. The kernel
>> OOM system works on a per NUMA node basis. NUMA affinity does not support
>> memory overcommit either, so that is likely not your issue; I just mention
>> it to cover all bases.
>>
>> regards
>> sean
>>
>> >
>> > Thanks & Regards
>> > Arihant Jain
>> >
>> > On Mon, 23 Oct, 2023, 11:29 pm, <smooney@xxxxxxxxxx> wrote:
>> >
>> > > On Mon, 2023-10-23 at 13:19 -0400, Jonathan Proulx wrote:
>> > > >
>> > > > I've seen similar log traces with overcommitted memory when the
>> > > > hypervisor runs out of physical memory and the OOM killer gets the VM
>> > > > process.
>> > > >
>> > > > This is an unusual configuration (I think) but if the VM owner claims
>> > > > they didn't power down the VM internally you might look at the local
>> > > > hypervisor logs to see if the VM process crashed or was killed for
>> > > > some other reason.
>> > > Yep, OOM events are one common cause of this.
>> > >
>> > > Nova is basically just saying "hey, you said this VM should be active
>> > > but it's not, I'm going to update the DB to reflect reality." You can
>> > > turn that off with
>> > > https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events
>> > > or
>> > > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval
>> > > i.e. either disable the sync by setting the interval to -1, or disable
>> > > handling of the virt lifecycle events.
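To make those two options concrete, a minimal nova.conf sketch on the compute node might look like this (either setting on its own is enough, and nova-compute needs a restart after the change; the values simply illustrate the options named above):

    [DEFAULT]
    # Disable the periodic power-state sync entirely
    sync_power_state_interval = -1

    [workarounds]
    # Or, alternatively, stop acting on libvirt lifecycle events
    handle_virt_lifecycle_events = False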
>> > >
>> > > I would recommend the sync_power_state_interval approach, but again, if
>> > > VMs are stopping and you don't know why, you should probably find out
>> > > why rather than just turning off the update of the Nova DB to reflect
>> > > the actual state.
>> > >
>> > > >
>> > > > -Jon
>> > > >
>> > > > On Mon, Oct 23, 2023 at 02:02:26PM +0100, smooney@xxxxxxxxxx wrote:
>> > > > :On Mon, 2023-10-23 at 17:45 +0530, AJ_ sunny wrote:
>> > > > :> Hi team,
>> > > > :>
>> > > > :> I am using OpenStack kolla-ansible on the Wallaby version and
>> > > > :> currently I am facing an issue with a virtual machine: the VM is
>> > > > :> shut off by itself, and from the log it seems the libvirt lifecycle
>> > > > :> stop event is triggering again and again.
>> > > > :>
>> > > > :> Logs:
>> > > > :> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
>> > > > :> [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848
>> > > > :> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
>> > > > :> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with
>> > > > :> vm_state active and task_state None.
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB
>> > > > :> power_state (1) does not match the vm_power_state from the hypervisor (4).
>> > > > :> Updating power_state in the DB to match the hypervisor.
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the
>> > > > :> stop API. Current vm_state: active, current task_state: None, original DB
>> > > > :> power_state: 1, current VM power_state: 4
>> > > > :>
>> > > > :> 2023-10-21 22:42:44.977 7 INFO nova.compute.manager
>> > > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
>> > > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the
>> > > > :> hypervisor when stop is called.
>> > > > :
>> > > > :That sounds like the guest OS shut down the VM, i.e. something in the
>> > > > :guest ran sudo poweroff, and then Nova detected the VM was stopped by
>> > > > :the user and updated its DB to match that.
>> > > > :
>> > > > :That is the expected behavior when you have the power sync enabled;
>> > > > :it is enabled by default.
>> > > > :>
>> > > > :>
>> > > > :> Thanks & Regards
>> > > > :> Arihant Jain
>> > > > :> +91 8299719369
>> > > > :
>> > > >
>> > >
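As a rough sketch of the earlier suggestion to check the local hypervisor logs for an OOM kill or VM crash, the following could be run on the compute node; the paths assume a stock libvirt layout, and under kolla-ansible the libvirt logs may live inside the nova_libvirt container instead:

    # Check the kernel log for OOM-killer activity around the time the VM stopped
    dmesg -T | grep -i -E 'out of memory|oom-killer|killed process'
    journalctl -k --since "2023-10-21" | grep -i oom

    # Check the per-instance QEMU log for an unexpected shutdown reason
    grep -i 'shutting down' /var/log/libvirt/qemu/instance-*.log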