Re: SIGSEGV in cephfs-java, but probably in Ceph

On Monday, June 4, 2012 at 1:47 PM, Noah Watkins wrote:
> On Mon, Jun 4, 2012 at 1:17 PM, Greg Farnum <greg@xxxxxxxxxxx> wrote:
> 
> > I'm not quite sure what you mean here. Ceph is definitely using pthread threading and mutexes, but I don't see how the use of a different threading library can break pthread mutexes (which are just using the kernel futex stuff, AFAIK).
> > But I admit I'm not real good at handling those sorts of interactions, so maybe I'm missing something?
> 
> 
> 
> The basic idea was that threads in Java did not map 1:1 with kernel
> threads (think co-routines), which would break a lot of stuff,
> especially futex. Looking at some documentation, old JVMs had
> something called Green Threads, but those have since been abandoned in
> favor of native threads. So maybe this theory is now irrelevant, and
> evidence seems to suggest you're right and Java is using native
> threads.

Gotcha, that makes sense.
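
(As an aside, and purely as an illustrative sketch rather than anything from cephfs-java: one quick way to sanity-check the 1:1 mapping on Linux is to compare the JVM's live thread count with the task count the kernel sees for the process.)

// Illustrative check (Linux-only, not part of the bindings): with 1:1 native
// threads, every started Java thread shows up as its own task under
// /proc/self/task; green threads would not.
import java.io.File;

public class NativeThreadCheck {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 4; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try { Thread.sleep(10000); } catch (InterruptedException e) {}
                }
            }).start();
        }
        Thread.sleep(500); // give the threads a moment to start
        int javaThreads = Thread.activeCount();
        int kernelTasks = new File("/proc/self/task").list().length;
        // Expect kernelTasks >= javaThreads (the JVM adds GC/JIT threads of its
        // own); with green threads kernelTasks would stay at ~1.
        System.out.println("java threads: " + javaThreads
                + ", kernel tasks: " + kernelTasks);
    }
}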
 
> 
> > > The RADOS Java wrappers suffered from an interaction between the JVM and RADOS client signal handlers, in which either the JVM or RADOS would replace the handlers for the other (not sure which order). Anyway, the solution was to link in the JVM libjsig.so signal chaining library. This might be the same thing we are seeing here, but I'm betting it is the first theory I mentioned.
> 
> > Hmm. I think that's an issue we've run into but I thought it got fixed for librados. Perhaps I'm mixing that up with libceph, or just pulling past scenarios out of thin air. It never manifested as Mutex count bugs, though!
> 
> I haven't tested the Rados wrappers in a while. I've never had to link
> in the signal chaining library for libcephfs.
> 
> I wonder if the Mutex::lock(bool) being printed out is a red herring... 
Well, it's a SIGSEGV. So my guess is that's just the frame that happens to touch memory outside its allowed bounds, probably because it's the first frame that actually dereferences a bad (probably NULL) pointer. For instance, what if it not only failed to mount the client, but even failed to create the context object?
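
(Also just a sketch, with the caveat that the class/method names below are my assumption of what the cephfs-java bindings expose, not something confirmed in this thread: checking the create/mount steps up front and bailing out would at least keep later JNI calls from walking off a half-initialized handle.)

// Hypothetical sketch, assuming a com.ceph.fs.CephMount with
// conf_read_file()/mount()/unmount(); the point is only to fail fast if the
// client context never came up, instead of letting a later native call
// dereference a bad pointer.
import com.ceph.fs.CephMount;

public class MountGuard {
    public static void main(String[] args) throws Exception {
        CephMount mount;
        try {
            mount = new CephMount("admin");              // create the client context
            mount.conf_read_file("/etc/ceph/ceph.conf"); // load config
            mount.mount("/");                            // mount the root
        } catch (Exception e) {
            System.err.println("ceph mount failed, not issuing further calls: " + e);
            return;
        }
        // ... only call into libcephfs once we know the context exists ...
        mount.unmount();
    }
}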
