RBD hanging on some volumes of a pool

Hi,

I have been facing issues with some of my RBD volumes since yesterday. Some of them hang completely at some point before eventually resuming IO, anywhere from a few minutes to several hours later.

First and foremost, my setup: I have already detailed it on the mailing list [0][1]. Some changes have been made since: the 3 monitors are now VMs and we are trying kernel 4.4.5 on the clients (the cluster is still on 3.10, CentOS 7).

Using EC pools, I already had some trouble with RBD features not supported by EC [2] and changed min_recency_* to 0 about 2 weeks ago to avoid the hassle. Everything has been working pretty smoothly since.

All my volumes (currently 5) are on an EC pool with a writeback cache. Two of them are perfectly fine. The other 3 are a different story: doing IO is impossible. If I start a simple copy, I get a new file of a few dozen MB (or sometimes 0), then it hangs. Running dd with the direct and sync flags shows the same behaviour.
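
To make the write test repeatable, here is a minimal Python sketch of a timed synchronous write loop. It is not the exact dd invocation used: the function name, block size, block count and stall threshold are all illustrative assumptions, and it only uses O_SYNC (O_DIRECT needs aligned buffers, so it is left out). A write that never completes will block this script as well, so it is mostly useful for spotting blocks that stall and then resume.

```python
import os
import time


def timed_sync_writes(path, block_size=65536, count=16, stall_seconds=5.0):
    """Write `count` zero-filled blocks with O_SYNC and time each one.

    Returns a list of (block_index, seconds), so slow blocks can be
    spotted afterwards. All parameters are illustrative defaults.
    """
    block = b"\0" * block_size
    timings = []
    # With O_SYNC, each write() returns only once the data has been
    # handed to the underlying storage.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        for i in range(count):
            start = time.monotonic()
            written = 0
            # os.write may write fewer bytes than asked; loop until done.
            while written < block_size:
                written += os.write(fd, block[written:])
            elapsed = time.monotonic() - start
            timings.append((i, elapsed))
            if elapsed > stall_seconds:
                print(f"block {i}: stalled for {elapsed:.1f}s")
    finally:
        os.close(fd)
    return timings
```

Pointing `path` at a file on one of the affected filesystems and at one on a healthy volume would give per-block timings to compare.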

I tried switching back to 3.10, with no change. On the client I rebooted, I currently cannot mount the filesystem: mount hangs (the volume seems to be correctly mapped, however).

strace on the cp command freezes in the middle of a read:

11:17:56 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536) = 65536
11:17:56 read(3,
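
For longer traces, a small Python sketch like the one below can pick out the call that never returned. It assumes the timestamped format shown above (as produced by strace -t) and treats a line as completed only if it ends in a numeric return value; the function name is hypothetical.

```python
import re

# Start of a timestamped strace line: "HH:MM:SS name(" plus an optional
# leading numeric argument, which for read/write is the file descriptor.
_CALL = re.compile(r"^\d{2}:\d{2}:\d{2}\s+(\w+)\((\d+)?")
# A completed call ends with ") = <number>".
_DONE = re.compile(r"\)\s+=\s+-?\d+")


def last_unfinished_call(lines):
    """Return (syscall, fd) if the last traced call has no return value.

    Returns None when the trace is empty or its last call completed.
    """
    for line in reversed(lines):
        line = line.strip()
        match = _CALL.match(line)
        if match is None:
            continue  # blank or unrecognised line, keep looking backwards
        if _DONE.search(line):
            return None  # the most recent call did return
        fd = int(match.group(2)) if match.group(2) else None
        return match.group(1), fd
    return None
```

On the trace above this returns ("read", 3), i.e. the process is stuck reading from file descriptor 3.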


I tried to bump up the logging but I don't really know what to look for exactly and didn't see anything obvious.

Any input or lead on how to debug this would be highly appreciated :)

Adrien

[0] http://www.spinics.net/lists/ceph-users/msg23990.html
[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-January/007004.html
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007746.html


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
