On 30/06/2017 at 18:48, Sage Weil wrote:
> On Fri, 30 Jun 2017, Lenz Grimmer wrote:
>> Hi Sage,
>>
>> On 06/30/2017 05:21 AM, Sage Weil wrote:
>>
>>> The easiest thing is to
>>>
>>> 1/ Stop testing filestore+btrfs for luminous onward. We've
>>> recommended against btrfs for a long time and are moving toward
>>> bluestore anyway.
>>
>> Searching the documentation for "btrfs" does not really give a user
>> any clue that the use of Btrfs is discouraged.
>>
>> Where exactly has this been recommended?
>>
>> The documentation currently states:
>>
>> http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/?highlight=btrfs#osds
>>
>> "We recommend using the xfs file system or the btrfs file system when
>> running mkfs."
>>
>> http://docs.ceph.com/docs/master/rados/configuration/filesystem-recommendations/?highlight=btrfs#filesystems
>>
>> "btrfs is still supported and has a comparatively compelling set of
>> features, but be mindful of its stability and support status in your
>> Linux distribution."
>>
>> http://docs.ceph.com/docs/master/start/os-recommendations/?highlight=btrfs#ceph-dependencies
>>
>> "If you use the btrfs file system with Ceph, we recommend using a
>> recent Linux kernel (3.14 or later)."
>>
>> As an end user, none of these statements reads like a recommendation
>> *against* using Btrfs to me.
>>
>> I'm therefore concerned about just disabling the tests related to
>> filestore on Btrfs while still including and shipping it. This has
>> the potential to introduce regressions that won't get caught and
>> fixed.
>
> Ah, crap. This is what happens when devs don't read their own
> documentation. I recommend against btrfs every time it ever comes up,
> the downstream distributions all support only xfs, but yes, it looks
> like the docs never got updated... despite the xfs focus being 5ish
> years old now.
>
> I'll submit a PR to clean this up, but
>
>>> 2/ Leave btrfs in the mix for jewel, and manually tolerate and
>>> filter out the occasional ENOSPC errors we see. (They make the test
>>> runs noisy but are pretty easy to identify.)
>>>
>>> If we don't stop testing filestore on btrfs now, I'm not sure when
>>> we would ever be able to stop, and that's pretty clearly not
>>> sustainable. Does that seem reasonable? (Pretty please?)
>>
>> If you want to get rid of filestore on Btrfs, start a proper
>> deprecation process and inform users that support for it is going to
>> be removed in the near future. The documentation must be updated
>> accordingly and it must be clearly emphasized in the release notes.
>>
>> Simply disabling the tests while keeping the code in the distribution
>> is setting up users who happen to be using Btrfs for failure.
>
> I don't think we can wait *another* cycle (year) to stop testing this.
>
> We can, however,
>
> - prominently feature this in the luminous release notes, and
> - require 'enable experimental unrecoverable data corrupting features
>   = btrfs' in order to use it, so that users are explicitly opting in
>   to luminous+btrfs territory.
>
> The only good(ish) news is that we aren't touching FileStore if we can
> help it, so it is less likely to regress than other things. And we'll
> continue testing filestore+btrfs on jewel for some time.
>
> Is that good enough?

Not sure how we will handle the transition. Is bluestore considered
stable in Jewel? If so, our current clusters (recently migrated from
Firefly to Hammer) will have support for both BTRFS+Filestore and
Bluestore when the next upgrade takes place.
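As far as I can tell, bluestore in Jewel is still gated behind the same
experimental opt-in Sage mentions above. A minimal sketch of what that
looks like in ceph.conf, going by the Jewel documentation; the exact
feature list may differ between point releases:

    [global]
    # bluestore is an experimental backend in Jewel; OSDs refuse to
    # use it unless it is explicitly enabled here
    enable experimental unrecoverable data corrupting features = bluestore rocksdb

If I read Sage's proposal correctly, Luminous would simply add 'btrfs'
to that same list for anyone staying on filestore+btrfs.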
If Bluestore is only considered stable in Luminous, I don't see how we
can manage the transition easily. The only path I see is to:

- migrate to XFS+filestore with Jewel (which will not only take time
  but will be a regression for us: it will cause performance and sizing
  problems on at least one of our clusters, and we will lose the silent
  corruption detection we get from BTRFS checksums),
- then upgrade to Luminous and migrate again to Bluestore.

I was not expecting the transition from Btrfs+Filestore to Bluestore to
be this convoluted (we were planning to add Bluestore OSDs one at a
time and study their performance/stability for months before migrating
the clusters as a whole).

Is there any way to restrict your BTRFS tests to at least one
known-stable configuration? BTRFS is known to have problems with the
high rate of snapshot creation and deletion Ceph generates by default,
for example, which is why we run with 'filestore btrfs snap = false'
(see the sketch after my signature).

Best regards,

Lionel
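P.S. For reference, a sketch of the btrfs-related part of our OSD
configuration. The [osd] section is the usual place for filestore
options; treat this as an illustration rather than a recommendation:

    [osd]
    # do not take a btrfs snapshot at every filestore sync point; the
    # constant snapshot create/delete churn is what btrfs copes with
    # poorly
    filestore btrfs snap = false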