Re: RBD hard crash on kernel 3.10

On Thu, 09 Apr 2015 02:33:44 +0000 Shawn Edwards wrote:

> On Wed, Apr 8, 2015 at 9:23 PM Christian Balzer <chibi@xxxxxxx> wrote:
> 
> > On Wed, 08 Apr 2015 14:25:36 +0000 Shawn Edwards wrote:
> >
> > > We've been working on a storage repository for xenserver 6.5, which
> > > uses the 3.10 kernel (ug).  I got the xenserver guys to include the
> > > rbd and libceph kernel modules into the 6.5 release, so that's at
> > > least available.
> > >
> > Woah, hold on for a moment here.
> >
> > AFAIK no version of XenServer supports Ceph/RBD, how are you planning
> > to integrate that?
> >
> > Another group in my company uses XenServer and I was hoping to supply
> > them with cheap(er) storage than what they have now, but the local
> > Citrix reseller (we have a huge spendy license) wasn't helpful at all
> > when it came to getting a feature request for upcoming versions into the
> > pipeline.
> >
> > And anything with a NFS or iSCSI head on top of Ceph is another can of
> > worms in terms of complexity and reliability, never mind that cost and
> > performance will also be impacted by such a kludge.
> >
> > So is this some work in progress then?
> >
> 
> Yes, this is work being done at my employer's.  We've been looking at
> this as a possibility for a while, and finally had the resources to
> dedicate to it.
> 
That's very welcome news. Is there anywhere I should look for updates on
that, other than any missives from you on this ML?

> 
> >
> > > Where things go bad is when we have many (>10 or so) VMs on one
> > > host, all using RBD clones for the storage mapped using the rbd
> > > kernel module.  The Xenserver crashes so badly that it doesn't even
> > > get a chance to kernel panic.  The whole box just hangs.
> > >
> > > Has anyone else seen this sort of behavior?
> > >
> > I think that's very much expected with that kernel, you'll probably
> > want 3.18 when using the kernel module, as that is the first to
> > support TRIM as well.
> > Also the host you're mapping things to isn't also a Ceph OSD node,
> > right?
> >
> >
> Correct.  The Xenservers and Ceph OSD are completely separate machines.
> 
So that common cause for Ceph fireworks is ruled out then.
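
Just so we are picturing the same thing: a per-VM RBD clone layout of the
kind you describe above would, on the management side, look roughly like
the sketch below using the librbd Python bindings. This is only a rough,
untested illustration; the pool and image names ('xen-vms', 'golden-image',
'vm-NN-disk') are invented, not anything from your setup.

import rados
import rbd

# Hypothetical names, purely for illustration.
POOL = 'xen-vms'
PARENT = 'golden-image'

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
try:
    # Snapshot and protect the golden image once.
    parent = rbd.Image(ioctx, PARENT)
    try:
        if 'base' not in [s['name'] for s in parent.list_snaps()]:
            parent.create_snap('base')
            parent.protect_snap('base')
    finally:
        parent.close()

    # Fan out one copy-on-write clone per VM.
    rbd_inst = rbd.RBD()
    for n in range(10):
        rbd_inst.clone(ioctx, PARENT, 'base',
                       ioctx, 'vm-%02d-disk' % n,
                       features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()

The clones themselves are created from user space either way (the rbd CLI
goes through librbd); it is only mapping the result into /dev/rbd* that
involves krbd, and that is where your 3.10 kernel apparently falls over.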
 
> 
> > > We have a lot of ways to try to work around this, but none of them
> > > are very pretty:
> > >
> > > * move the code to user space, ditch the kernel driver:  The build
> > > tools for Xenserver are all CentOS5 based, and it is painful to get
> > > all of the deps built to get the ceph user space libs built.
> > >
> > I feel your pain, however experience tells that the user space is
> > better and most importantly faster maintained when it comes to vital
> > features.
> >
> > This is the option I'd go for, everything else being equal.
> >
> >
> Yeah, agreed.  This is the best option.  Grumble.  The recent source for
> Ceph really doesn't like to play well with the older build tools, and the
> dependency list is staggering.
> 
I can only begin to imagine that mess; one would have hoped for something
a little more modern on the XenServer side, given that it was only
released this year.

> 
> > > * backport the ceph and rbd kernel modules to 3.10.  Has proven
> > > painful, as the block device code changed somewhere in the 3.14-3.16
> > > timeframe.
> > >
> > Madness seems to lie down that path.
> >
> >
> Agreed.
> 
Never mind that only starting with 3.14 can the kernel module map more
than 250 RBD devices, a crucial milestone that arrived just in time for a
Ganeti cluster I deployed once that kernel became available.

> 
> > > * forward-port the xen kernel patches from 3.10 to a newer driver
> > > (3.18 preferred) and run that on xenserver.  Painful for the same
> > > reasons as above, but in the opposite direction.
> > >
> > Probably the better of the 2 choices, as you gain many other
> > improvements as well. Including support of newer hardware. ^o^
> >
> >
> Agreed as well.  At this point, it may be the 'best' option, as it would
> require the least rewrite of our most current bits.  Although it would
> make maintaining Xen painful, since we'd need to be extra careful with
> vendor patches.
> 
Yeah, there's certainly no way around a lot of heavy lifting there, while
staying very careful with the vendor patches on the other side.


> 
> > Regards,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> >


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



