Re: rbd map command hangs for 15 minutes during system start up

In addition to triggering a trace when a hang is detected, the test
now also captures the output of "ps -ef".  Not much is running at that
point, but you can have a look:

https://gist.github.com/raw/4330223/2f131ee312ee43cb3d8c307a9bf2f454a7edfe57/rbd-hang-1355851498.txt
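
For reference, the kind of hang detection described above could be
sketched roughly as follows.  This is a minimal Python sketch, not the
actual test harness: the function names, the polling approach, and the
use of /proc/sysrq-trigger to produce the trace are all assumptions
made for illustration.

```python
import subprocess
import sys
import time


def run_with_hang_detection(cmd, hang_after, on_hang, poll_interval=0.1):
    """Run cmd; if it is still running after hang_after seconds, call
    on_hang(proc) once.  The command is not killed here; the handler
    decides what to do.  Returns (hung, returncode)."""
    proc = subprocess.Popen(cmd)
    start = time.monotonic()
    hung = False
    while proc.poll() is None:
        if not hung and time.monotonic() - start >= hang_after:
            hung = True
            on_hang(proc)
        time.sleep(poll_interval)
    return hung, proc.returncode


def collect_diagnostics(proc):
    """Hypothetical handler: print the process list, then ask the
    kernel for a task dump (assumes root and sysrq enabled)."""
    subprocess.run(["ps", "-ef"], check=False)
    try:
        # Writing 't' requests a dump of task states to the kernel log.
        with open("/proc/sysrq-trigger", "w") as f:
            f.write("t")
    except OSError as e:
        print(f"could not trigger sysrq task dump: {e}", file=sys.stderr)
```

It would be called with the command under test and a threshold, e.g.
run_with_hang_detection(["rbd", "map", "<pool>/<image>"], 60,
collect_diagnostics), where the image spec and the 60 second threshold
are placeholders.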

Is it possible that there is some sort of deadlock going on?  We are
doing the rbd maps (and subsequent filesystem mounts) on the same
systems that are running the ceph-osd and ceph-mon processes.  To get
around the 'sync' deadlock problem, we are using a patch from Sage
which ignores system-wide syncs on filesystems mounted with the
'mand' option (and we mount the underlying osd filesystems with
'mand').  However, I am wondering if there is potential for other types
of deadlocks in this environment.
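
Since the sync workaround only applies to filesystems mounted with
'mand', it may be worth confirming that option is actually in effect on
the osd filesystems when a hang occurs.  A minimal Python sketch of
such a check, assuming the usual /proc/mounts field layout (device,
mount point, type, comma-separated options); the function name is made
up for illustration:

```python
def mounts_with_option(mounts_text, option):
    """Return the mount points whose option list contains `option`.

    mounts_text is expected to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, options = fields[1], fields[3]
        if option in options.split(","):
            result.append(mountpoint)
    return result


def mand_mounts(path="/proc/mounts"):
    """Read the live mount table and list mounts using 'mand'."""
    with open(path) as f:
        return mounts_with_option(f.read(), "mand")
```

Any osd mount point missing from the list returned by mand_mounts()
would not be covered by the sync patch.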

Also, we recently saw an rbd hang on a much older setup, running
kernel 3.5.3 with only the sync hack patch, alongside ceph 0.48.1.
It's possible that this issue has been around for some time, and that
the recent patches just made it happen more often (and thus made it
more reproducible) for us.


On Tue, Dec 18, 2012 at 8:09 AM, Alex Elder <elder@xxxxxxxxxxx> wrote:
> On 12/17/2012 11:12 AM, Nick Bartos wrote:
>> Here's a log with the rbd debugging enabled:
>>
>> https://gist.github.com/raw/4319962/d9690fd92c169198efc5eecabf275ef1808929d2/rbd-hang-test-1355763470.log
>>
>> On Fri, Dec 14, 2012 at 10:03 AM, Alex Elder <elder@xxxxxxxxxxx> wrote:
>>> On 12/14/2012 10:53 AM, Nick Bartos wrote:
>>>> Yes I was only enabling debugging for libceph.  I'm adding debugging
>>>> for rbd as well.  I'll do a repro later today when a test cluster
>>>> opens up.
>>>
>>> Excellent, thank you.   -Alex
>
> I looked through these debugging messages.  Looking only at the
> rbd debugging, what I see seems to indicate that rbd is idle at
> the point the "hang" seems to start.  This suggests that the hang
> is not due to rbd itself, but rather whatever it is that might
> be responsible for using the rbd image once it has been mapped.
>
> Is that possible?  I don't know what process you have that is
> mapping the rbd image, and what is supposed to be the next thing
> it does.  (I realize this may not make a lot of sense, given
> a patch in rbd seems to have caused the hang to begin occurring.)
>
> Also note that the debugging information available (i.e., the
> lines in the code that can output debugging information) may
> well be incomplete.  So if you don't find anything it may be
> necessary to provide you with another update which might include
> more debugging.
>
> Anyway, could you provide a little more context about what
> is going on sort of *around* rbd when activity seems to stop?
>
> Thanks a lot.
>
>                                         -Alex