Hi all, which mailing list should I send to, ceph-users@xxxxxxxxxxxxxx or ceph-users@xxxxxxx? I sent the same mail to ceph-users@xxxxxxx yesterday, but I cannot find it in the list mail delivered to me today, so I am sending it again.

We use OpenStack + Ceph (Hammer) in our production environment. There are 22 OSDs on one host, and every 11 OSDs share one SSD for their journals. Unfortunately, one of the SSDs failed, so those 11 OSDs went down. The OSD log shows:

    -1> 2019-09-19 11:35:52.681142 7fcab5354700  1 -- xxxxxxxxxxxx:6831/16460 --> xxxxxxxxxxxx:0/14304 -- osd_ping(ping_reply e6152 stamp 2019-09-19 11:35:52.679939) v2 -- ?+0 0x20af8400 con 0x20a4b340
     0> 2019-09-19 11:35:52.682578 7fcabed3c700 -1 os/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7fcabed3c700 time 2019-09-19 11:35:52.640294
    os/FileJournal.cc: 1426: FAILED assert(0 == "unexpected aio error")
    ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
    1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc8b55]
    2: (FileJournal::write_finish_thread_entry()+0x695) [0xa795c5]
    3: (FileJournal::WriteFinisher::entry()+0xd) [0x91cecd]
    4: (()+0x7dc5) [0x7fcacb81cdc5]
    5: (clone()+0x6d) [0x7fcaca2fd1cd]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this

As shown above, the OSDs went down on 09-19, but the PGs are still degraded and remapped now; recovery seems stuck.

    # ceph -s
        cluster 26cc714c-ed78-4d62-9435-db3e87509c5f
         health HEALTH_WARN
                681 pgs degraded
                681 pgs stuck degraded
                797 pgs stuck unclean
                681 pgs stuck undersized
                681 pgs undersized
                recovery 132155/11239182 objects degraded (1.176%)
                recovery 22056/11239182 objects misplaced (0.196%)
         monmap e1: 3 mons at {ctrl01=xxx.xxx.xxx.xxx:6789/0,ctrl02=xxx.xxx.xxx.xxx:6789/0,ctrl03=xxx.xxx.xxx.xxx:6789/0}
                election epoch 122, quorum 0,1,2 ctrl01,ctrl02,ctrl03
         osdmap e6590: 324 osds: 313 up, 313 in; 116 remapped pgs
          pgmap v40849600: 21504 pgs, 6 pools, 14048 GB data, 3658 kobjects
                41661 GB used, 279 TB / 319 TB avail
                132155/11239182 objects degraded (1.176%)
                22056/11239182 objects misplaced (0.196%)
                   20707 active+clean
                     681 active+undersized+degraded
                     116 active+remapped
      client io 121 MB/s rd, 144 MB/s wr, 1029 op/s

I queried one of the PGs; its recovery_state is "started" [1].

I also found that the PG has no third OSD mapped, as shown below:

    [root@ctrl01 ~]# ceph pg map 4.75f
    osdmap e6590 pg 4.75f (4.75f) -> up [34,106] acting [34,106]

The crushmap is at [2].

How should I get the cluster back to a healthy state? Can someone help me? Many thanks.

[1] https://github.com/rongzhen-zhan/myfile/blob/master/pgquery
[2] https://github.com/rongzhen-zhan/myfile/blob/master/crushmap
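
P.S. In case it helps with diagnosing why no third OSD is chosen for pg 4.75f, below is a minimal sketch of how the crushmap could be checked offline with crushtool. The rule id 0 and --num-rep 3 are my assumptions based on the pool's replicated size, and crushmap.bin is just an illustrative filename:

    # dump the compiled crushmap from the cluster
    ceph osd getcrushmap -o crushmap.bin
    # replay the rule offline; --show-bad-mappings prints the inputs for which
    # CRUSH could not select the full 3 replicas
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings

If that prints mappings with only two OSDs, it would suggest CRUSH itself cannot find a third host for that rule with the remaining up OSDs.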