> Date: Thu, 30 Jul 2015 13:11:11 +0300
> Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
> From: idryomov@xxxxxxxxx
> To: zhangz.david@xxxxxxxxxxx
> CC: chaofanyu@xxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>
> On Thu, Jul 30, 2015 at 12:46 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> >
> >> Date: Thu, 30 Jul 2015 11:37:37 +0300
> >> Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
> >> From: idryomov@xxxxxxxxx
> >> To: zhangz.david@xxxxxxxxxxx
> >> CC: chaofanyu@xxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> >>
> >> On Thu, Jul 30, 2015 at 10:29 AM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> >> >
> >> > ________________________________
> >> > Subject: Re: [ceph-users] which kernel version can help avoid kernel client deadlock
> >> > From: chaofanyu@xxxxxxxxxxx
> >> > Date: Thu, 30 Jul 2015 13:16:16 +0800
> >> > CC: idryomov@xxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> >> > To: zhangz.david@xxxxxxxxxxx
> >> >
> >> > On Jul 30, 2015, at 12:48 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> >> >
> >> > We also hit a similar issue from time to time on CentOS with a 3.10.x
> >> > kernel. In iostat, we can see the kernel rbd client's util at 100% but
> >> > no r/w I/O, and we can't umount/unmap this rbd device. After we restart
> >> > the OSDs, it becomes normal again.
> >>
> >> 3.10.x is rather vague; what is the exact version you saw this on? Can you
> >> provide syslog logs (I'm interested in dmesg)?
> >
> > The kernel version should be 3.10.0.
> >
> > I don't have system logs at hand. It is not easily reproduced, and it happened
> > in a very low memory situation. We are running DB instances over rbd as
> > storage. The DB instances use a lot of memory when running highly concurrent
> > r/w, and after running for a long time rbd might hit this problem, but not
> > always. Enabling rbd logging made our system behave strangely during our test.
> >
> > I back-ported one of your fixes:
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/rbd.c?id=5a60e87603c4c533492c515b7f62578189b03c9c
> >
> > So far the test has looked fine for a few days, but it is still under
> > observation, so I'd like to know whether there are any other fixes.
>
> I'd suggest following the 3.10 stable series (currently at 3.10.84). The
> fix you backported is crucial in low memory situations, so I wouldn't
> be surprised if it alone fixed your problem. (It is not in 3.10.84;
> I assume it'll show up in 3.10.85 - for now just apply your backport.)
>
Cool, looking forward to 3.10.85 to see what else will be brought in. Thanks.
> Thanks,
>
> Ilya
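The symptom reported in this thread (iostat showing the rbd device at 100% util with no r/w I/O completing) can also be checked for directly from /proc/diskstats, which is where iostat gets its numbers. What follows is a minimal sketch in Python, assuming a Linux host with the standard /proc/diskstats field layout and mapped devices named rbd0, rbd1, and so on. The function names, the `rbd` name prefix, and the 95% threshold are hypothetical choices for illustration; they are not part of Ceph, the kernel, or iostat. The sketch only flags the symptom and says nothing about the underlying cause.

```python
import time
from dataclasses import dataclass


@dataclass
class DiskSample:
    reads_completed: int   # diskstats field 1
    writes_completed: int  # diskstats field 5
    in_flight: int         # diskstats field 9: I/Os currently in progress
    io_ticks_ms: int       # diskstats field 10: ms spent doing I/O (basis of %util)


def parse_diskstats(text, prefix="rbd"):
    """Hypothetical helper: parse /proc/diskstats text, keeping only devices
    whose name starts with `prefix`. Columns are: major, minor, name, stats..."""
    samples = {}
    for line in text.splitlines():
        parts = line.split()
        # Need at least major/minor/name plus the 11 classic stat fields.
        if len(parts) < 14 or not parts[2].startswith(prefix):
            continue
        samples[parts[2]] = DiskSample(
            reads_completed=int(parts[3]),
            writes_completed=int(parts[7]),
            in_flight=int(parts[11]),
            io_ticks_ms=int(parts[12]),
        )
    return samples


def looks_stuck(before, after, interval_ms, util_threshold=95.0):
    """Hypothetical heuristic: the device was busy for (nearly) the whole
    interval and still has requests in flight, yet no read or write completed
    between the two samples."""
    util = 100.0 * (after.io_ticks_ms - before.io_ticks_ms) / interval_ms
    completed = (after.reads_completed - before.reads_completed) + (
        after.writes_completed - before.writes_completed
    )
    return util >= util_threshold and completed == 0 and after.in_flight > 0


def find_stuck_rbd(interval_s=5.0):
    """Sample /proc/diskstats twice, `interval_s` seconds apart, and return
    the names of rbd devices that match the heuristic above."""
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    return [
        name
        for name, sample in second.items()
        if name in first and looks_stuck(first[name], sample, interval_s * 1000)
    ]
```

On the client host, `find_stuck_rbd()` returns a list of device names such as `rbd0`. The check uses two samples rather than a single snapshot because `io_ticks` and the completion counters are cumulative, so only their deltas over an interval correspond to what iostat reports.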
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com