Hello Satish,

On Thu, Feb 9, 2023 at 11:52 AM Satish Patel <satish.txt@xxxxxxxxx> wrote:
>
> Folks,
>
> Any idea what is going on, I am running 3 node quincy version of openstack
> and today suddenly i noticed the following error. I found reference link
> but not sure if that is my issue or not
> https://tracker.ceph.com/issues/51974
>
> root@ceph1:~# ceph -s
>   cluster:
>     id:     cd748128-a3ea-11ed-9e46-c309158fad32
>     health: HEALTH_ERR
>             1 mgr modules have recently crashed
>
>   services:
>     mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 2d)
>     mgr: ceph1.ckfkeb(active, since 6h), standbys: ceph2.aaptny
>     osd: 9 osds: 9 up (since 2d), 9 in (since 2d)
>
>   data:
>     pools:   4 pools, 128 pgs
>     objects: 1.18k objects, 4.7 GiB
>     usage:   17 GiB used, 16 TiB / 16 TiB avail
>     pgs:     128 active+clean
>
> root@ceph1:~# ceph health
> HEALTH_ERR Module 'devicehealth' has failed: disk I/O error; 1 mgr modules
> have recently crashed
>
> root@ceph1:~# ceph crash ls
> ID                                                                 ENTITY            NEW
> 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035  mgr.ceph1.ckfkeb   *
>
> root@ceph1:~# ceph crash info 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
> {
>     "backtrace": [
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373, in serve\n    self.scrape_all()",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425, in scrape_all\n    self.put_device_metrics(device, data)",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500, in put_device_metrics\n    self._create_device(devid)",
>         "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487, in _create_device\n    cursor = self.db.execute(SQL, (devid,))",
>         "sqlite3.OperationalError: disk I/O error"
>     ],
>     "ceph_version": "17.2.5",
>     "crash_id": "2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035",
>     "entity_name": "mgr.ceph1.ckfkeb",
>     "mgr_module": "devicehealth",
>     "mgr_module_caller": "PyModuleRunner::serve",
>     "mgr_python_exception": "OperationalError",
>     "os_id": "centos",
>     "os_name": "CentOS Stream",
>     "os_version": "8",
>     "os_version_id": "8",
>     "process_name": "ceph-mgr",
>     "stack_sig": "7e506cc2729d5a18403f0373447bb825b42aafa2405fb0e5cfffc2896b093ed8",
>     "timestamp": "2023-02-07T00:07:12.739187Z",
>     "utsname_hostname": "ceph1",
>     "utsname_machine": "x86_64",
>     "utsname_release": "5.15.0-58-generic",
>     "utsname_sysname": "Linux",
>     "utsname_version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"

It is probably: https://tracker.ceph.com/issues/55606

It is annoying but not serious. The mgr simply lost its lock on the
sqlite database for the devicehealth module. You can work around it by
restarting the mgr:

    ceph mgr fail

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
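A possible follow-up to the workaround above, assuming the devicehealth
module comes back cleanly on the standby mgr; these are standard Ceph CLI
commands, and the crash ID is the one from the report quoted earlier:

    # fail over to the standby mgr so the devicehealth module restarts
    ceph mgr fail

    # confirm a standby has become active and the module error is gone
    ceph -s

    # the crash report stays flagged as new until it is archived;
    # archiving clears the "1 mgr modules have recently crashed" warning
    ceph crash ls
    ceph crash archive 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
    # (or "ceph crash archive-all" to acknowledge every new crash report)

    # the cluster should then return to HEALTH_OK
    ceph health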