My previous email did not go through because of its size. Here is a
new attempt:
Cheers
Goncalo
--- * ---
Hi Patrick, Brad...
Unfortunately, another user application breaks ceph-fuse again (it is
a completely different application than in my previous test).
We have tested it on 4 machines with 4 cores each. The user submits 16
single-core jobs, all writing different output files (one per job) to
a common directory in cephfs. The first 4 jobs run happily and never
break ceph-fuse. But the remaining 12 jobs, running on the remaining 3
machines, trigger a segmentation fault, which is completely different
from the other case.
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x297fe2) [0x7f54402b7fe2]
2: (()+0xf7e0) [0x7f543ecf77e0]
3: (ObjectCacher::bh_write_scattered(std::list<ObjectCacher::BufferHead*, std::allocator<ObjectCacher::BufferHead*> >&)+0x36) [0x7f5440268086]
4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, long*, int*)+0x22c) [0x7f5440268a3c]
5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
8: (()+0x7aa1) [0x7f543ecefaa1]
9: (clone()+0x6d) [0x7f543df6893d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
The full log (with debug client = 20) for a segfault on the client with
IP Y.Y.Y.255 is available here:
https://dl.dropboxusercontent.com/u/2946024/nohup.out.2
(For privacy reasons, I've replaced client IPs with Y.Y.Y.(...) and
ceph infrastructure host IPs with X.X.X.(...).)
Well... any further help is welcome.
Cheers
Goncalo
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com