Hi Wido,

On 18/08/2014 14:11, Wido den Hollander wrote:
> On 08/18/2014 01:57 PM, Loic Dachary wrote:
>> Hi Ceph,
>>
>> In RHEL 6.5, is the following scenario possible :
>>
>> a) an OSD dlopen a shared library for erasure-code,
>> b) the shared library file is replaced while the OSD is running,
>> c) the OSD starts using the new file instead of the old one.
>>
>> It seems unlikely but it would explain a weird stack trace at http://tracker.ceph.com/issues/9153#note-5 so I'm double checking ;-)
>>
>
> Well, it could be that it does so. I'm not 100% sure, but afaik it could happen that when you replace a library certain parts might not be in memory.
>
> See: http://stackoverflow.com/questions/7767325/replacing-shared-object-so-file-while-main-program-is-running

As it turns out, the problem is simpler, but I still have no clue how it can happen. http://tracker.ceph.com/issues/9153 shows

537187718- ceph version 0.80.5-164-gcc4e625 (cc4e6258d67fb16d4a92c25078a0822a9849cd77)
537187795- 1: ceph-osd() [0x9b58c1]
537187821- 2: (()+0xf710) [0x7f06a3e24710]
537187854- 3: (memcpy()+0x15b) [0x7f06a2d4daab]
537187892- 4: (jerasure_matrix_dotprod()+0xc8) [0x7f067fd11618]
537187946- 5: (jerasure_matrix_encode()+0x75) [0x7f067fd11865]
537187999- 6: (ErasureCodeJerasureReedSolomonVandermonde::jerasure_encode(char**, char**, int)+0x21) [0x7f067fd294b1]
537188107- 7: (ErasureCodeJerasure::encode_chunks(std::set<int, std::less<int>, std::allocator<int> > const&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >*)+0x607) [0x7f067fd2a807]

This means a firefly ceph-osd crashed while trying to use a jerasure plugin built from master. The crash itself is no surprise, because the plugin API is incompatible between the two, although the data coding / encoding is compatible.

Cheers

>> Cheers
>>
>
>

-- 
Loïc Dachary, Artisan Logiciel Libre
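
P.S. For the record, on the original question (steps a to c in the quoted mail): whether a running process sees the new file depends on how the file is replaced. Below is a minimal sketch, assuming Linux, that maps a file read-only and then replaces it either by overwriting it in place or by renaming a new file over it. The file name and the helper mapped_view_after_replace are made up for the illustration. It also uses a shared read-only mapping, whereas dlopen uses private mappings, for which mmap(2) leaves visibility of later file changes unspecified, so take it as an approximation of what the dynamic loader does and not as a description of it.

```python
import mmap
import os
import tempfile


def mapped_view_after_replace(in_place: bool) -> bytes:
    """Map a file, replace it on disk, and return what the mapping shows.

    Hypothetical helper for illustration only; it stands in for a process
    that has a shared library mapped while the file is being replaced.
    """
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "libplugin.so")  # made-up name, not a real library
        with open(path, "wb") as f:
            f.write(b"OLD!" * 1024)

        # Read-only shared mapping of the original file (same inode).
        with open(path, "rb") as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        if in_place:
            # Overwrite the existing inode, as "cp new old" would do.
            with open(path, "r+b") as f:
                f.write(b"NEW!" * 1024)
        else:
            # Write a new inode and rename it over the old name,
            # as package managers and "install" typically do.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(b"NEW!" * 1024)
            os.replace(tmp, path)

        seen = m[:4]
        m.close()
        return seen


if __name__ == "__main__":
    print("overwrite in place:", mapped_view_after_replace(in_place=True))
    print("rename over:       ", mapped_view_after_replace(in_place=False))
```

With the in-place overwrite the mapping shows the new bytes, while after the rename it still shows the old ones, because the old inode stays alive for as long as it is mapped. So a package upgrade that renames the new library into place should not change the code of a running OSD; only a plugin that is dlopen'ed after the upgrade would come from the new file.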