Re: Bluestore: v11.2.0 peering not happening when OSD is down

Hi Greg,

We use EC 4+1 on a 5-node cluster in production deployments with filestore, and it performs recovery and peering when one OSD goes down. After a few minutes, another OSD on the node hosting the faulty OSD temporarily takes over its PGs, and all PGs return to the active+clean state. The cluster also does not go down during this recovery process.

Only with bluestore do we see the cluster go to an error state when one OSD is down.
We are still validating this and will let you know of any additional findings.

Thanks,
Muthu 
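
Greg's point below — that an EC PG will not go active with only "k" shards, because the default min_size for EC pools is k+1 — can be sketched numerically. The helper below is a hypothetical model for illustration only (it is not Ceph code): with k=4, m=1 and min_size = k+1 = 5, losing a single OSD leaves only 4 live shards, so the PG cannot go active, matching the `incomplete` PGs in the status output.

```python
# Hypothetical model of EC PG activation (illustrative only, not Ceph source).
# An EC k+m pool stripes each object into k data shards and m coding shards,
# one shard per OSD. A PG can serve I/O only while the number of live shards
# is at least the pool's min_size, which for EC pools defaults to k + 1.

def can_go_active(k, m, live_shards, min_size=None):
    """Return True if a PG with `live_shards` surviving shards can go active."""
    if min_size is None:
        min_size = k + 1              # assumed EC default
    min_size = min(min_size, k + m)   # cannot require more shards than exist
    return live_shards >= min_size

# EC 4+1 on 5 nodes: one OSD down leaves 4 shards, below min_size 5.
print(can_go_active(4, 1, live_shards=4))  # False: PG stays incomplete
# An EC 4+2 profile would tolerate one failure:
print(can_go_active(4, 2, live_shards=5))  # True
```

In practice the relevant knob is the pool's min_size (`ceph osd pool get <pool> min_size` / `ceph osd pool set <pool> min_size <n>`); lowering it to k lets a 4+1 pool serve I/O with zero redundancy remaining, which is why Greg advises against n+1 profiles.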

On 21 January 2017 at 02:06, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
`ceph pg dump` should show you something like:

 * active+undersized+degraded ... [NONE,3,2,4,1]    3    [NONE,3,2,4,1]

Sam,

Am I wrong? Or is it up to something else?
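
Shinobu's suggestion can be made concrete with a few read-only commands against the cluster (the PG id below is a placeholder):

```shell
# List PGs stuck inactive (the 173 incomplete PGs should appear here).
ceph pg dump_stuck inactive

# Health detail names each affected PG individually.
ceph health detail

# Query one of the listed PGs (1.2f is a placeholder id) to inspect its
# up/acting sets; a NONE entry marks the shard lost with the down OSD.
ceph pg 1.2f query
```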


On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> I'm pretty sure the default configs won't let an EC PG go active with
> only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
> certain). Running an "n+1" EC config is just not a good idea.
> For testing you could probably adjust this with the equivalent of
> min_size for EC pools, but I don't know the parameters off the top of
> my head.
> -Greg
>
> On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
> <muthiah.muthusamy@xxxxxxxxx> wrote:
>> Hi,
>>
>> We are validating kraken 11.2.0 with bluestore  on 5 node cluster with EC
>> 4+1.
>>
>> When an OSD is down, peering does not happen and the ceph health status
>> moves to the ERR state after a few minutes. This was working in previous
>> development releases. Is any additional configuration required in v11.2.0?
>>
>> Following is our ceph configuration:
>>
>> mon_osd_down_out_interval = 30
>> mon_osd_report_timeout = 30
>> mon_osd_down_out_subtree_limit = host
>> mon_osd_reporter_subtree_level = host
>>
>> and the recovery parameters set to default.
>>
>> [root@ca-cn1 ceph]# ceph osd crush show-tunables
>>
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 1,
>>     "chooseleaf_stable": 1,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 54,
>>     "profile": "jewel",
>>     "optimal_tunables": 1,
>>     "legacy_tunables": 0,
>>     "minimum_required_version": "jewel",
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "has_v2_rules": 1,
>>     "require_feature_tunables3": 1,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 0,
>>     "require_feature_tunables5": 1,
>>     "has_v5_rules": 0
>> }
>>
>> ceph status:
>>
>>      health HEALTH_ERR
>>             173 pgs are stuck inactive for more than 300 seconds
>>             173 pgs incomplete
>>             173 pgs stuck inactive
>>             173 pgs stuck unclean
>>      monmap e2: 5 mons at
>> {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>             election epoch 106, quorum 0,1,2,3,4
>> ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>         mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
>>      osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
>>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>       pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
>>             85199 GB used, 238 TB / 322 TB avail
>>                 1868 active+clean
>>                  173 remapped+incomplete
>>                    7 active+clean+scrubbing
>>
>> MON log:
>>
>> 2017-01-20 09:25:54.715684 7f55bcafb700  0 log_channel(cluster) log [INF] :
>> osd.54 out (down for 31.703786)
>> 2017-01-20 09:25:54.725688 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1120
>> crush map has features 288250512065953792, adjusting msgr requires
>> 2017-01-20 09:25:54.729019 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> osdmap e1120: 60 osds: 59 up, 59 in
>> 2017-01-20 09:25:54.735987 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6
>> active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail;
>> 21825 B/s rd, 163 MB/s wr, 2046 op/s
>> 2017-01-20 09:25:55.737749 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1121
>> crush map has features 288250512065953792, adjusting msgr requires
>> 2017-01-20 09:25:55.744338 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> osdmap e1121: 60 osds: 59 up, 59 in
>> 2017-01-20 09:25:55.749616 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144
>> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB /
>> 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
>> 2017-01-20 09:25:56.768721 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126
>> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB /
>> 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
>>
>> Thanks,
>> Muthu
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>

