Hello,

I am experiencing an issue where OSD services fail due to an unexpected aio error. This has happened on two different OSD servers, killing two different OSD daemons. I am running Ceph Hammer on Debian Wheezy with a backported kernel (3.16.0-0.bpo.4-amd64).

Has anyone else experienced this issue, and could you point out some troubleshooting steps? So far all I've found are similar issues on the Ceph bug tracker. I have posted my case there as well. Below is the log from one of the crashes.

2015-08-16 08:11:54.227567 7f13d68de700 0 log_channel(cluster) log [WRN] : 3 slow requests, 3 included below; oldest blocked for > 30.685081 secs
2015-08-16 08:11:54.227579 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.685081 seconds old, received at 2015-08-16 08:11:23.542417: osd_op(client.1109461.0:219374023 rbd_data.10e67e79e2a9e3.000000000001c201 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2592768~4096] 5.89587894 ack+ondisk+write e1804) currently waiting for subops from 1,30
2015-08-16 08:11:54.227587 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.682262 seconds old, received at 2015-08-16 08:11:23.545236: osd_repop(client.1109461.0:219374083 5.c63 d6b85c63/rbd_data.10e67e79e2a9e3.000000000001a800/head//5 v 1804'121436) currently started
2015-08-16 08:11:54.227592 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.641702 seconds old, received at 2015-08-16 08:11:23.585797: osd_repop(client.1935041.0:1302764 5.82a 4219482a/rbd_data.1d685c2eb141f2.0000000000003c5f/head//5 v 1804'265055) currently started
2015-08-16 08:11:55.227784 7f13d68de700 0 log_channel(cluster) log [WRN] : 4 slow requests, 1 included below; oldest blocked for > 31.685317 secs
2015-08-16 08:11:55.227808 7f13d68de700 0 log_channel(cluster) log [WRN] : slow request 30.788521 seconds old, received at 2015-08-16 08:11:24.439213: osd_repop(client.1224667.0:34531998 5.abe 2f457abe/rbd_data.12aacc79e2a9e3.0000000000001d9d/head//5 v 1804'27936) currently started
2015-08-16 08:11:56.075649 7f13d3d89700 -1 journal aio to 7994220544~8192 wrote 18446744073709551611
2015-08-16 08:11:56.091460 7f13d3d89700 -1 os/FileJournal.cc: In function 'void FileJournal::write_finish_thread_entry()' thread 7f13d3d89700 time 2015-08-16 08:11:56.076462
os/FileJournal.cc: 1426: FAILED assert(0 == "unexpected aio error")

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) [0xcdb572]
 2: (FileJournal::write_finish_thread_entry()+0x847) [0xb9a437]
 3: (FileJournal::WriteFinisher::entry()+0xd) [0xa3befd]
 4: (()+0x6b50) [0x7f13de90ab50]
 5: (clone()+0x6d) [0x7f13dd32695d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Pontus Lindgren
System Engineer
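
A note on reading the log: the value in "journal aio to 7994220544~8192 wrote 18446744073709551611" looks like a negative return code printed as an unsigned 64-bit integer rather than a real byte count. The Python sketch below reinterprets it as signed and looks up the matching errno name. The helper name decode_aio_result is made up for illustration and is not part of Ceph, and the assumption that the journal prints the raw aio result this way is inferred from the number itself, not confirmed from the Ceph source.

```python
import errno
import os


def decode_aio_result(raw: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer.

    Hypothetical helper for illustration only; not a Ceph API.
    """
    return raw - (1 << 64) if raw >= (1 << 63) else raw


# Value copied from the "journal aio ... wrote" log line.
result = decode_aio_result(18446744073709551611)
print(result)  # -5

# A negative result is conventionally -errno; look up its symbolic name.
if result < 0:
    print(errno.errorcode[-result], "-", os.strerror(-result))
```

If that reading is right, the write to the journal came back with EIO (errno 5 on Linux), which would suggest the kernel reported an I/O error on the journal device. Checking the kernel log and the health of the journal disk around 08:11:56 might therefore be a reasonable first step.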