On 27/09/2015 09:15, Lionel Bouton wrote:
> Hi,
>
> we just had a quasi-simultaneous crash on two different OSDs which
> blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9.
>
> The first OSD to go down had this error:
>
> 2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function
> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
> size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27
> 06:30:33.145251
> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
> || got != -5)
>
> The second OSD crash was similar:
>
> 2015-09-27 06:30:57.373841 7f05d92cf700 -1 os/FileStore.cc: In function
> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
> size_t, ceph::bufferlist&, bool)' thread 7f05d92cf700 time 2015-09-27
> 06:30:57.260978
> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
> || got != -5)
>
> I'm familiar with this error: it happened before with a BTRFS read
> error (invalid csum) and I could correct it with a flush-journal,
> deleting the corrupted file, starting the OSD again, and a pg repair.
> This time, though, there isn't any kernel log indicating an invalid
> csum. The kernel is different, though: we use 3.18.9 on these two
> servers while the others run 4.0.5, so maybe BTRFS doesn't log invalid
> checksum errors with this version. I've launched btrfs scrub on the two
> filesystems just in case (still waiting for completion).
>
> The first attempt to restart these OSDs failed: one OSD died 19 seconds
> after start, the other after 21 seconds. Seeing that, I temporarily
> brought min_size down to 1, which allowed the 9 incomplete PGs to
> recover. I verified this by bringing min_size back up to 2 and then
> restarted the two OSDs. They haven't crashed yet.
>
> For reference, the assert failures were still the same when the OSDs
> died shortly after start:
>
> 2015-09-27 08:20:19.332835 7f4467bd0700 -1 os/FileStore.cc: In function
> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
> size_t, ceph::bufferlist&, bool)' thread 7f4467bd0700 time 2015-09-27
> 08:20:19.325126
> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
> || got != -5)
>
> 2015-09-27 08:20:50.626344 7f97f2d95700 -1 os/FileStore.cc: In function
> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
> size_t, ceph::bufferlist&, bool)' thread 7f97f2d95700 time 2015-09-27
> 08:20:50.605234
> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
> || got != -5)
>
> Note that at 2015-09-27 06:30:11 a deep-scrub started on a PG involving
> one (and only one) of these two OSDs. As we evenly space deep-scrubs
> (currently with a 10-minute interval), this might be relevant (or just
> a coincidence).
>
> I made copies of the OSD logs (including the stack trace and the recent
> events) if needed.
>
> Can anyone shed some light on why these OSDs died?

I just had a thought. Could launching a defragmentation on a file in a
BTRFS OSD filestore trigger this problem? We have a process doing just
that: it waits until there has been no recent access before queuing
files for defragmentation, but there is no guarantee that it won't
defragment a file an OSD is about to use. This might explain the nearly
simultaneous crashes, as defragmentation is triggered by write access
patterns, which should be roughly the same on all 3 OSDs hosting a copy
of the file. The defragmentation doesn't run at exactly the same time on
each host because it is queued, which could explain why we got 2 crashes
instead of 3.
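To make the race concrete, here is a rough sketch of the kind of
scheduler I'm describing (illustrative Python only, not our actual tool;
the filestore path and the idle threshold are made up). The idle check
happens at queue time, not at defragment time, so nothing prevents
"btrfs filesystem defragment" from running on a file the OSD has
reopened in the meantime:

    #!/usr/bin/env python
    # Illustrative sketch only -- not the real tool. Files that look idle
    # are queued, then defragmented later, so a file can be back in active
    # use by the time "btrfs filesystem defragment" runs on it.
    import os
    import time
    import subprocess
    from collections import deque

    OSD_DIR = "/var/lib/ceph/osd/ceph-0/current"   # hypothetical filestore path
    IDLE_SECS = 300                                # "no recent access" threshold

    queue = deque()

    def scan_for_idle_files():
        now = time.time()
        for root, _, files in os.walk(OSD_DIR):
            for name in files:
                path = os.path.join(root, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue
                # queue files not touched recently; the OSD is free to
                # reopen them before we get around to them
                if now - max(st.st_atime, st.st_mtime) > IDLE_SECS:
                    queue.append(path)

    def drain_queue():
        while queue:
            path = queue.popleft()
            # the race: the OSD may be reading or writing this file right now
            subprocess.call(["btrfs", "filesystem", "defragment", path])

    if __name__ == "__main__":
        scan_for_idle_files()
        drain_queue()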
I'll probably ask on linux-btrfs, but knowing the possible conditions
leading to this assert failure would help pinpoint the problem, so if
someone knows this code well enough without knowing how BTRFS behaves
while defragmenting, I'll bridge the gap.

I just activated autodefrag on one of the two affected servers for all
its BTRFS filesystems and disabled our own defragmentation process. With
recent tunings we might not need our own defragmentation scheduler
anymore, and we can afford to lose some performance while investigating
this.

Best regards,

Lionel

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com