> On 31 August 2016 at 22:14, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>
>
> On Tue, Aug 30, 2016 at 2:17 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> > Hello
> >
> > I've got a small cluster of 3 osd servers and 30 osds between them running
> > Jewel 10.2.2 on Ubuntu 16.04 LTS with stock kernel version 4.4.0-34-generic.
> >
> > I am experiencing rather frequent osd crashes, which tend to happen a few
> > times a month on random osds. The latest one gave me the following log
> > message:
> >
> >
> > 2016-08-30 06:26:29.861106 7f8ed54f1700 -1 journal aio to 13085011968~8192
> > wrote 18446744073709551615
> > 2016-08-30 06:26:29.862558 7f8ed54f1700 -1 os/filestore/FileJournal.cc: In
> > function 'void FileJournal::write_finish_thread_entry()' thread 7f8ed54f1700
> > time 2016-08-30 06:26:29.861122
> > os/filestore/FileJournal.cc: 1541: FAILED assert(0 == "unexpected aio
> > error")
>
> As it says, the OSD got back an unexpected AIO error (and so it quit
> rather than trying to continue on a possibly/probably flaky FS/disk).
> Look at dmesg et al and see if there's anything useful; check your
> disk info; etc.

I have seen this happen on Dell systems with a PERC (rebranded LSI)
controller after an upgrade to Jewel.

Those systems were running Ubuntu 14.04 with the 3.13 kernel. After
upgrading the kernel to at least 3.19 (newer is better) the crashes went
away.

Where the kernel upgrade didn't help, I set the queue_depth of the device
to 1 in /sys.

On Ubuntu 16.04 I haven't observed these crashes.

Wido

> -Greg
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
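
For anyone wanting to script the two checks described above (kernel at
least 3.19, and queue_depth set to 1), here is a minimal sketch in Python.
It is not taken from the thread: the function names and the sysfs_root
parameter are hypothetical, and the path /sys/block/<dev>/device/queue_depth
is an assumption based on where SCSI devices usually expose this attribute
(the thread only says "in /sys"). Writing the value needs root and does not
survive a reboot.

```python
import re
from pathlib import Path
from typing import Tuple

# Kernel threshold mentioned in the thread ("at least 3.19").
MIN_KERNEL = (3, 19)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.4.0-34-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """True if the kernel release is at or above the given (major, minor)."""
    return parse_kernel_release(release) >= minimum


def queue_depth_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Assumed sysfs location of the queue depth for a SCSI block device."""
    return Path(sysfs_root) / "block" / device / "device" / "queue_depth"


def get_queue_depth(device: str, sysfs_root: str = "/sys") -> int:
    """Read the current queue depth of the device."""
    return int(queue_depth_path(device, sysfs_root).read_text().strip())


def set_queue_depth(device: str, depth: int, sysfs_root: str = "/sys") -> None:
    """Write a new queue depth (requires root on a real system)."""
    if depth < 1:
        raise ValueError("queue depth must be at least 1")
    queue_depth_path(device, sysfs_root).write_text(f"{depth}\n")
```

On a real host one would pass the output of `uname -r` to
kernel_is_new_enough() and, if the crashes persist, call something like
set_queue_depth("sdb", 1) as root for each journal device ("sdb" is only a
placeholder device name).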