Hi Nathan
- We built a Ceph cluster with 3 nodes (created by Rook):
  node-3: osd-2, mon-b
  node-4: osd-0, mon-a, mds-myfs-a, mgr
  node-5: osd-1, mon-c, mds-myfs-b
- Observed behavior
After one node goes down unexpectedly (e.g. a hard power-off), mounting a CephFS volume takes more than 40 seconds.
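For illustration, the delay can be measured by timing a kernel-client mount like the one below; the monitor addresses, mount point, and secret file are placeholders, not values from the cluster above:

$ # Hypothetical example: time a CephFS kernel mount (placeholder IPs and paths).
$ time mount -t ceph 10.0.0.3:6789,10.0.0.4:6789,10.0.0.5:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret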
- Ceph cluster status under normal conditions:
$ ceph status
  cluster:
    id:     776b5432-be9c-455f-bb2e-05cbf20d6f6a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 20h)
    mgr: a(active, since 21h)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 20h), 3 in (since 21h)

  data:
    pools:   2 pools, 136 pgs
    objects: 2.59k objects, 330 MiB
    usage:   25 GiB used, 125 GiB / 150 GiB avail
    pgs:     136 active+clean

  io:
    client: 1.5 KiB/s wr, 0 op/s rd, 0 op/s wr
- CephFS status under normal conditions:
$ ceph fs status
myfs - 3 clients
====
+------+--------+--------+---------------+-------+-------+
| Rank | State  |  MDS   |    Activity   |  dns  |  inos |
+------+--------+--------+---------------+-------+-------+
|  0   | active | myfs-a | Reqs:    0 /s | 2250  | 2059  |
+------+--------+--------+---------------+-------+-------+
+---------------+----------+-------+-------+
|      Pool     |   type   |  used | avail |
+---------------+----------+-------+-------+
| myfs-metadata | metadata |  208M | 39.1G |
|   myfs-data0  |   data   |  121M | 39.1G |
+---------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|    myfs-b   |
+-------------+
MDS version: ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
- Are you using replica or EC?
=> No, we are not using EC; the pools are replicated.
- 'min_size' is not smaller than 'size'?
$ ceph osd dump | grep pool
pool 1 'myfs-metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 16 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
pool 2 'myfs-data0' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 141 lfor 0/0/53 flags hashpspool stripe_width 0 application cephfs
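Since both pools are size 3 / min_size 2 and the replicas are spread across the three hosts, losing one host should leave 2 replicas per PG, which still satisfies min_size, so PG I/O itself should continue. A minimal sketch of how these values can be checked (and, if one accepted the risk, changed); lowering min_size to 1 is generally discouraged:

$ # Check replication parameters per pool.
$ ceph osd pool get myfs-metadata size
$ ceph osd pool get myfs-metadata min_size
$ # Lowering min_size would let PGs accept I/O with a single surviving replica,
$ # at a higher risk of data loss/inconsistency (not recommended):
$ # ceph osd pool set myfs-data0 min_size 1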
- What is your crush map?
$ ceph osd crush dump
{
    "devices": [
        {"id": 0, "name": "osd.0", "class": "hdd"},
        {"id": 1, "name": "osd.1", "class": "hdd"},
        {"id": 2, "name": "osd.2", "class": "hdd"}
    ],
    "types": [
        {"type_id": 0, "name": "osd"},
        {"type_id": 1, "name": "host"},
        {"type_id": 2, "name": "chassis"},
        {"type_id": 3, "name": "rack"},
        {"type_id": 4, "name": "row"},
        {"type_id": 5, "name": "pdu"},
        {"type_id": 6, "name": "pod"},
        {"type_id": 7, "name": "room"},
        {"type_id": 8, "name": "datacenter"},
        {"type_id": 9, "name": "zone"},
        {"type_id": 10, "name": "region"},
        {"type_id": 11, "name": "root"}
    ],
    "buckets": [
        {"id": -1, "name": "default", "type_id": 11, "type_name": "root", "weight": 9594, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": -3, "weight": 3198, "pos": 0}, {"id": -5, "weight": 3198, "pos": 1}, {"id": -7, "weight": 3198, "pos": 2}]},
        {"id": -2, "name": "default~hdd", "type_id": 11, "type_name": "root", "weight": 9594, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": -4, "weight": 3198, "pos": 0}, {"id": -6, "weight": 3198, "pos": 1}, {"id": -8, "weight": 3198, "pos": 2}]},
        {"id": -3, "name": "node-4", "type_id": 1, "type_name": "host", "weight": 3198, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": 0, "weight": 3198, "pos": 0}]},
        {"id": -4, "name": "node-4~hdd", "type_id": 1, "type_name": "host", "weight": 3198, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": 0, "weight": 3198, "pos": 0}]},
        {"id": -5, "name": "node-5", "type_id": 1, "type_name": "host", "weight": 3198, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": 1, "weight": 3198, "pos": 0}]},
        {"id": -6, "name": "node-5~hdd", "type_id": 1, "type_name": "host", "weight": 3198, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": 1, "weight": 3198, "pos": 0}]},
        {"id": -7, "name": "node-3", "type_id": 1, "type_name": "host", "weight": 3198, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": 2, "weight": 3198, "pos": 0}]},
        {"id": -8, "name": "node-3~hdd", "type_id": 1, "type_name": "host", "weight": 3198, "alg": "straw2", "hash": "rjenkins1",
         "items": [{"id": 2, "weight": 3198, "pos": 0}]}
    ],
    "rules": [
        {"rule_id": 0, "rule_name": "replicated_rule", "ruleset": 0, "type": 1, "min_size": 1, "max_size": 10,
         "steps": [{"op": "take", "item": -1, "item_name": "default"}, {"op": "chooseleaf_firstn", "num": 0, "type": "host"}, {"op": "emit"}]},
        {"rule_id": 1, "rule_name": "myfs-metadata", "ruleset": 1, "type": 1, "min_size": 1, "max_size": 10,
         "steps": [{"op": "take", "item": -1, "item_name": "default"}, {"op": "chooseleaf_firstn", "num": 0, "type": "host"}, {"op": "emit"}]},
        {"rule_id": 2, "rule_name": "myfs-data0", "ruleset": 2, "type": 1, "min_size": 1, "max_size": 10,
         "steps": [{"op": "take", "item": -1, "item_name": "default"}, {"op": "chooseleaf_firstn", "num": 0, "type": "host"}, {"op": "emit"}]}
    ],
    "tunables": {
        "choose_local_tries": 0,
        "choose_local_fallback_tries": 0,
        "choose_total_tries": 50,
        "chooseleaf_descend_once": 1,
        "chooseleaf_vary_r": 1,
        "chooseleaf_stable": 1,
        "straw_calc_version": 1,
        "allowed_bucket_algs": 54,
        "profile": "jewel",
        "optimal_tunables": 1,
        "legacy_tunables": 0,
        "minimum_required_version": "jewel",
        "require_feature_tunables": 1,
        "require_feature_tunables2": 1,
        "has_v2_rules": 0,
        "require_feature_tunables3": 1,
        "has_v3_rules": 0,
        "has_v4_buckets": 1,
        "require_feature_tunables5": 1,
        "has_v5_rules": 0
    },
    "choose_args": {}
}
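All three rules take the default root and do chooseleaf_firstn across hosts, so each replica lands on a different node. If it is easier to read, the CRUSH map can also be inspected in decompiled text form; the file names below are arbitrary placeholders:

$ # Sketch: dump and decompile the CRUSH map for inspection.
$ ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
$ less crushmap.txt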
- Question
How can we mount a CephFS volume as quickly as possible after a node goes down unexpectedly? Do you have any suggestions for the Ceph cluster (filesystem) configuration? Would using EC help?
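Part of the delay is presumably failure detection and MDS takeover: an OSD is only marked down after its heartbeat grace expires, and a standby MDS only takes over after the failed MDS misses its beacon grace. A sketch of the knobs involved follows; the values are purely illustrative, and lowering them too far can cause spurious failovers under load:

$ # Illustrative values only, not a recommendation.
$ ceph config set global osd_heartbeat_grace 10   # default 20 s before an OSD is reported down
$ ceph config set global mds_beacon_grace 10      # default 15 s before the mon fails the MDS
$ # A standby-replay daemon can shorten MDS takeover for this filesystem:
$ ceph fs set myfs allow_standby_replay true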
Best Regards
hfx@xxxxxxxxxx
Hi Nathan

Is that true? The time it takes to reallocate the primary PG delivers “downtime” by design, right? Seen from a writing client's perspective.

Jesper
Friday, 29 November 2019, 06.24 +0100 from pengbo@xxxxxxxxxxx <pengbo@xxxxxxxxxxx>:

Hi Nathan,

Thanks for the help.
My colleague will provide more details.

BR

On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish <lordcirth@xxxxxxxxx> wrote:

If correctly configured, your cluster should have zero downtime from a
single OSD or node failure. What is your crush map? Are you using
replica or EC? If your 'min_size' is not smaller than 'size', then you
will lose availability.
On Thu, Nov 28, 2019 at 10:50 PM Peng Bo <pengbo@xxxxxxxxxxx> wrote:
>
> Hi all,
>
> We are working on using Ceph to build our HA system; the goal is that the system should keep providing service even when a Ceph node is down or an OSD is lost.
>
> Currently, in our tests, once a node/OSD goes down the Ceph cluster takes about 40 seconds to resynchronize data, and our system cannot provide service during that time.
>
> My questions:
>
> Is there any way we can reduce the data sync time?
> How can we keep Ceph available once a node/OSD is down?
>
>
> BR
>
> --
> The modern Unified Communications provider
>
> https://www.portsip.com
--
The modern Unified Communications provider
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com