Re: Help on ext4/xattr linux kernel stability issue / ceph xattr use?

On Mon, 2015-11-09 at 05:24 -0800, Sage Weil wrote:
> The above is all correct.  The mbcache (didn't know that existed!) is 
> definitely not going to be useful here.
> > Also I think it is necessary to warn ceph users to avoid ext4 at all
> > costs until this kernel/ceph issue is sorted out: we went from
> > relatively stable production for more than a year to crashes everywhere
> > all the time since two weeks ago, probably after hitting some magic
> > limit. We migrated our machines to ubuntu trusty, our SSD based
> > filesystem to XFS but our HDD are still mostly on ext4 (60 TB
> > of data to move so not that easy...).
> 
> Was there a ceph upgrade in there somewhere?  The size of the user.ceph._ 
> xattr has increased over time, and (somewhat) recently crossed the 255 
> byte threshold (on average) which also triggered a performance regression 
> on XFS...


Hi Sage,

Thanks for the confirmation.

The history of our cluster is:
- initial cluster on ceph 0.80.7 (September 2014), on Debian with
  ext4, since xfs and btrfs were crashing on debian/ceph
- upgraded to 0.87 (December 2014)
- upgraded to 0.94.2 (June 2015)
- on October 26, 2015 we had two disk failures in one night. We replaced
  the disks, but we started to have random machine freezes during
  and after the recovery. We upgraded to 0.94.5 to be able to restart
  two of our OSDs, due to:
  http://tracker.ceph.com/issues/13594
- after changing various hardware parts and adding a new machine,
  we started to suspect ceph/ext4, so we migrated all
  our machines to Ubuntu trusty and all SSDs to XFS, leaving
  60 TB of data on rotational ext4 (too long to migrate)

During this whole time the cluster and its data kept expanding,
from 4 machines and 2 TB to 11 machines and 60 TB of data now
(~75% full).
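
For anyone wanting to check where their own objects stand relative to
the 255 byte threshold mentioned in the quoted text, here is a minimal
sketch in Python. It is not a ceph tool: the function names
(xattr_size, summarize, scan) are made up for illustration, and it
assumes a Linux host where the OSD object files are directly readable
and carry the user.ceph._ xattr named above.

```python
import os

XATTR_NAME = "user.ceph._"   # xattr named in the quoted message
THRESHOLD = 255              # byte threshold mentioned in the quoted message


def xattr_size(path, name=XATTR_NAME, getxattr=None):
    """Return the size in bytes of one xattr, or None if it cannot be read."""
    getxattr = getxattr or os.getxattr  # os.getxattr is Linux-only
    try:
        return len(getxattr(path, name))
    except OSError:
        # Attribute missing, file vanished, or xattrs unsupported here.
        return None


def summarize(sizes, threshold=THRESHOLD):
    """Summarize a list of xattr sizes, ignoring unreadable entries (None)."""
    known = [s for s in sizes if s is not None]
    if not known:
        return {"count": 0, "mean": 0.0, "max": 0, "over": 0}
    return {
        "count": len(known),
        "mean": sum(known) / len(known),
        "max": max(known),
        "over": sum(1 for s in known if s > threshold),
    }


def scan(root):
    """Walk a directory tree and collect the xattr size of every file found."""
    sizes = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            sizes.append(xattr_size(os.path.join(dirpath, fname)))
    return sizes
```

Calling summarize(scan(path)) on an OSD data directory (the exact path
depends on the deployment) gives the number of objects read, the mean
and maximum xattr size, and how many exceed the threshold. Walking a
large OSD generates a lot of metadata I/O, so it is probably better to
run it on a subdirectory first.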

I have lightly tested a rebuild of the ubuntu trusty 3.19
kernel with the ext4 mbcache code removed, patch here:
https://bugzilla.kernel.org/show_bug.cgi?id=107301#c6

But now we have to decide whether to go live with it.

Sincerely,

Laurent
