Re: osd crash and high server load - ceph-osd crashes with stacktrace

----- Original Message -----
> From: "Jacek Jarosiewicz" <jjarosiewicz@xxxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Sunday, 25 October, 2015 8:48:59 PM
> Subject: Re:  osd crash and high server load - ceph-osd crashes with stacktrace
> 
> We've upgraded Ceph to 0.94.4 and the kernel to 3.16.0-51-generic,
> but the problem still persists. Lately we've been seeing these crashes
> on a daily basis. I'm leaning toward the conclusion that this is a
> software problem - this hardware ran stable before, and we're seeing
> all four nodes crash randomly with the same messages in the logs. I'm
> wondering whether this could be flashcache related.. nothing else
> comes to mind..
> 
> Can anyone take a look at the logs and help?
> 
> ceph-osd log: http://pastebin.com/AGGtvHr2
> kernel log: http://pastebin.com/jVSa8eme

I'd suggest you focus on why the kernel threads are going into D-state
(uninterruptible sleep), since that should probably be addressed first. Ceph
is a userspace application, so it should not be able to "hang" the kernel.
The XFS filesystem code or the underlying storage appears to be implicated
here, but it could be something else. The hung kernel threads are waiting
for something; we need to work out what that is.
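
For example, next time the load spikes you can check which tasks are stuck
and where, with something along these lines (assuming a stock procps and,
for the sysrq dump, kernel.sysrq enabled):

  # list D-state (uninterruptible) tasks and their wait channels
  ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

  # kernel stack of a specific hung task (as root)
  cat /proc/<pid>/stack

  # or dump all blocked tasks to the kernel log
  echo w > /proc/sysrq-trigger

If the stacks consistently end in xfs_* or device-mapper/flashcache
functions, that narrows down where the wait is.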

It is likely Ceph is just triggering this problem.
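
One way to test the flashcache theory is to rebuild a single OSD without
flashcache in the I/O path and see whether that node still hangs. You can
also look at what device-mapper reports for the cached devices (I'm
assuming your flashcache devices are set up as device-mapper targets,
which is the usual arrangement):

  # list the device-mapper tables and their current status
  dmsetup table
  dmsetup status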

Cheers,
Brad

> 
> J
> 
> On 10/09/2015 09:15 AM, Jacek Jarosiewicz wrote:
> > Hi,
> >
> > We've noticed a problem with our cluster setup:
> >
> > 4 x OSD nodes:
> > E5-1630 CPU
> > 32 GB RAM
> > Mellanox MT27520 56Gbps network cards
> > SATA controller LSI Logic SAS3008
> > Storage nodes are connected to two SuperMicro chassis: 847E1C-R1K28JBOD
> > Each node has 2-3 spinning OSDs (6TB drives) and 2 SSD drives (240GB
> > Intel DC S3710) for journal and cache
> > 3 monitors running on OSD nodes
> > ceph hammer 0.94.3
> > Ubuntu 14.04
> > standard replicated pools with size 2 (min_size 1)
> > 40GB journal per OSD on the SSDs, 40GB flashcache per OSD.
> >
> > Everything seems to work fine, but every few days or so one of the nodes
> > (not always the same node - different nodes each time) gets very high
> > load, becomes inaccessible and needs to be rebooted.
> >
> > After a reboot we can start the OSDs and the cluster returns to
> > HEALTH_OK pretty quickly.
> >
> > Looking at the log files, this seems to be related to the ceph-osd
> > processes (links to the logs are at the bottom of this message).
> >
> > The cluster is a test setup - not used in production - and at the time
> > the ceph-osd processes crash, the cluster isn't doing anything.
> >
> > Any help would be appreciated.
> >
> > ceph-osd log: http://pastebin.com/AGGtvHr2
> > kernel log: http://pastebin.com/jVSa8eme
> >
> > J
> >
> 
> 
> --
> Jacek Jarosiewicz
> IT Systems Administrator
> 
> ----------------------------------------------------------------------------------------
> SUPERMEDIA Sp. z o.o., registered office in Warsaw
> ul. Senatorska 13/15, 00-075 Warszawa
> District Court for the Capital City of Warsaw, 12th Commercial Division
> of the National Court Register,
> KRS no. 0000029537; share capital PLN 42,756,000
> NIP (tax ID): 957-05-49-503
> Mailing address: ul. Jubilerska 10, 04-190 Warszawa
> 
> ----------------------------------------------------------------------------------------
> SUPERMEDIA ->   http://www.supermedia.pl
> internet access - hosting - colocation - links - telephony
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com