Hi Ilya,
We just tried the 3.10.83 kernel with more rbd fixes back-ported from later kernel versions. This time we again ran rbd and 3 OSD daemons on the same node, but rbd I/O still hangs, and the OSD filestore thread still times out and commits suicide when memory runs very low under heavy load. When this happens, enabling the rbd log can even make the system unresponsive, so I haven't collected any logs yet. The OSD log only says the filestore thread was in FileStore::_write before the timeout; I will look into it further.

I know this is not an appropriate way to use Ceph and rbd, but I still want to ask: is there a way or a workaround to make this setup work? Are there any successful cases in the community? (A sketch of the back-ported fix, plus a small debugfs check for stuck requests, is at the bottom of this mail.) Thanks.

David Zhang

From: zhangz.david@xxxxxxxxxxx
To: idryomov@xxxxxxxxx
Date: Fri, 31 Jul 2015 09:21:40 +0800
CC: ceph-users@xxxxxxxxxxxxxx; chaofanyu@xxxxxxxxxxx
Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock

> Date: Thu, 30 Jul 2015 13:11:11 +0300
> Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
> From: idryomov@xxxxxxxxx
> To: zhangz.david@xxxxxxxxxxx
> CC: chaofanyu@xxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>
> On Thu, Jul 30, 2015 at 12:46 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> >
> >> Date: Thu, 30 Jul 2015 11:37:37 +0300
> >> Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
> >> From: idryomov@xxxxxxxxx
> >> To: zhangz.david@xxxxxxxxxxx
> >> CC: chaofanyu@xxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> >>
> >> On Thu, Jul 30, 2015 at 10:29 AM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> >> >
> >> > ________________________________
> >> > Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
> >> > From: chaofanyu@xxxxxxxxxxx
> >> > Date: Thu, 30 Jul 2015 13:16:16 +0800
> >> > CC: idryomov@xxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> >> > To: zhangz.david@xxxxxxxxxxx
> >> >
> >> > On Jul 30, 2015, at 12:48 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> >> >
> >> > We also hit a similar issue from time to time on CentOS with 3.10.x
> >> > kernels. In iostat we can see the kernel rbd device at 100% util with
> >> > no r/w I/O going on, and we can't umount/unmap the rbd device. After
> >> > restarting the OSDs, it becomes normal again.
> >>
> >> 3.10.x is rather vague, what is the exact version you saw this on? Can you
> >> provide syslog logs (I'm interested in dmesg)?
> >
> > The kernel version should be 3.10.0.
> >
> > I don't have syslogs at hand. It is not easily reproduced, and it happened
> > in a very-low-memory situation. We are running DB instances over rbd as
> > storage. The DB instances use a lot of memory under highly concurrent
> > reads and writes, and after running for a long time rbd might hit this
> > problem, but not always. Enabling the rbd log made our system behave
> > strangely during our tests.
> >
> > I back-ported one of your fixes:
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/rbd.c?id=5a60e87603c4c533492c515b7f62578189b03c9c
> >
> > So far the test has looked fine for a few days, but it is still under
> > observation. So I want to know: are there other fixes?
>
> I'd suggest following the 3.10 stable series (currently at 3.10.84). The
> fix you backported is crucial in low memory situations, so I wouldn't
> be surprised if it alone fixed your problem. (It is not in 3.10.84,
> I assume it'll show up in 3.10.85 - for now just apply your backport.)

Cool, looking forward to 3.10.85 to see what else gets brought in. Thanks.
> Thanks,
>
>                 Ilya
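P.S. For reference, the heart of the fix back-ported above is, as I understand the upstream commit, a change of allocation flags on the rbd I/O path. The fragment below is a paraphrase of the idea, not the verbatim upstream diff:

    /* drivers/block/rbd.c, rbd_obj_request_create() -- paraphrased,
     * not the exact upstream change.  This allocation happens while
     * servicing block I/O, so it must not fall into direct reclaim:
     * under memory pressure, reclaim can try to write dirty pages
     * back through the very same rbd device (or through the
     * colocated OSDs) and deadlock.  GFP_NOIO tells the allocator it
     * may not start any I/O to make room.
     */
    obj_request = kmem_cache_zalloc(rbd_obj_request_cache, GFP_NOIO);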
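And for the "util is 100% but no r/w I/O" symptom quoted above, one way to check whether the kernel client is wedged on in-flight OSD requests is to dump the osdc file in debugfs. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the usual /sys/kernel/debug/ceph/<fsid>.client<id>/osdc layout; if osdc stays non-empty long after I/O has stopped, the client is stuck waiting on the OSDs:

    /* dump_osdc.c - print in-flight ceph/rbd OSD requests from debugfs.
     * Build: cc -o dump_osdc dump_osdc.c ; run as root.
     */
    #include <glob.h>
    #include <stdio.h>

    int main(void)
    {
        glob_t g;
        size_t i;

        /* one "osdc" file per kernel ceph client instance */
        if (glob("/sys/kernel/debug/ceph/*/osdc", 0, NULL, &g) != 0) {
            fprintf(stderr, "no ceph debugfs entries found\n");
            return 1;
        }
        for (i = 0; i < g.gl_pathc; i++) {
            FILE *f = fopen(g.gl_pathv[i], "r");
            char buf[4096];
            size_t n;

            if (!f)
                continue;
            printf("== %s ==\n", g.gl_pathv[i]);
            /* each line is one outstanding request; empty means idle */
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                fwrite(buf, 1, n, stdout);
            fclose(f);
        }
        globfree(&g);
        return 0;
    }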