Re: after upgrade from 16.2.3 to 16.2.4, and after adding a few HDDs, OSDs started to fail one by one.

On Fri, May 14, 2021 at 10:47 AM Andrius Jurkus
<andrius.jurkus@xxxxxxxxxx> wrote:
>
> Hello, I will try to keep it sad and short :) :(    PS: sorry if this is
> a duplicate; I tried posting it from the web as well.
>
> Today I upgraded from 16.2.3 to 16.2.4 and added a few hosts and OSDs.
> After a few hours of data migration, one SSD failed, then another, and
> another, one by one. Now I have the cluster in pause with 5 failed SSDs.
> The same hosts carry both SSDs and HDDs, but only the SSDs are failing,
> so I think this has to be a rebalancing/backfill bug of some kind, and
> probably not an upgrade bug.
>
> The cluster has been paused for 4 hours and no more OSDs have failed.
>
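
For reference, a minimal sketch of the flags typically used to hold a
cluster still like this (standard commands, not necessarily the exact
ones used here):

  # stop client I/O and all data movement between OSDs
  ceph osd set pause
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set norecover

  # resume later with the matching unset, e.g.:
  ceph osd unset pause
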
> full trace
> https://pastebin.com/UxbfFYpb

This looks very similar to https://tracker.ceph.com/issues/50656.
Adding Igor for more ideas.
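
To compare against that tracker, the crash module can dump the exact
assertion from each failed OSD (a minimal sketch; the crash ID is a
placeholder):

  # list recent daemon crashes, then print the full backtrace of one
  ceph crash ls
  ceph crash info <crash-id>

  # or pull the raw OSD log from the host (cephadm unit naming,
  # using the fsid from the log below):
  journalctl -u ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11@osd.2.service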

Neha

>
> Now I'm googling and learning, but is there a way to easily test, let's
> say, a 15.2.XX version on an OSD without losing anything?
>
> Any help would be appreciated.
>
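
On testing without losing anything: note that downgrading OSDs across
major releases is generally unsupported once they have started on the
newer on-disk format. A read-only BlueStore fsck is one non-destructive
check worth trying first (a sketch, assuming the cephadm deployment and
osd.2 from the log below; stop the daemon before running it):

  # stop the failing OSD, then open a shell with its data dir mounted
  systemctl stop ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11@osd.2
  cephadm shell --name osd.2

  # inside the shell: consistency check only, nothing is written
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
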
> The errors start like this:
>
> May 14 16:58:52 dragon-ball-radar systemd[1]: Starting Ceph osd.2 for
> 4e01640b-951b-4f75-8dca-0bad4faf1b11...
> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14
> 16:58:53.057836433 +0000 UTC m=+0.454352919 container create
> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate,
> GIT_BRANCH=HEAD, maintainer=D
> May 14 16:58:53 dragon-ball-radar systemd[1]: Started libcrun container.
> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14
> 16:58:53.3394116 +0000 UTC m=+0.735928098 container init
> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate,
> maintainer=Dimitri Savineau <dsav
> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14
> 16:58:53.446921192 +0000 UTC m=+0.843437626 container start
> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate,
> GIT_BRANCH=HEAD, org.label-sch
> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14
> 16:58:53.447050119 +0000 UTC m=+0.843566553 container attach
> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate,
> org.label-schema.name=CentOS
> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command:
> /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command:
> /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
> /dev/ceph-45e6ef2e-fbdc-4289-a900-3d1ffc81ee14/osd-block-973cfe73-06c8-4ea0-9aea-1361d063eb25
> --path /var/lib/ceph/osd/ceph-2 --no-mon-config
> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command:
> /usr/bin/ln -snf
> /dev/ceph-45e6ef2e-fbdc-4289-a900-3d1ffc81ee14/osd-block-973cfe73-06c8-4ea0-9aea-1361d063eb25
> /var/lib/ceph/osd/ceph-2/block
> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command:
> /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command:
> /usr/bin/chown -R ceph:ceph /dev/dm-1
> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command:
> /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
> May 14 16:58:53 dragon-ball-radar bash[113558]: --> ceph-volume lvm
> activate successful for osd ID: 2
> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14
> 16:58:53.8147653 +0000 UTC m=+1.211281741 container died
> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate)
> May 14 16:58:55 dragon-ball-radar podman[113650]: 2021-05-14
> 16:58:55.044964534 +0000 UTC m=+2.441480996 container remove
> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate,
> CEPH_POINT_RELEASE=-16.2.4, R
> May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14
> 16:58:55.594265612 +0000 UTC m=+0.369978347 container create
> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, RELEASE=HEAD,
> org.label-schema.build-d
> May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14
> 16:58:55.864589286 +0000 UTC m=+0.640302021 container init
> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2,
> org.label-schema.schema-version=1.0, GIT
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 set uid:gid to 167:167
> (ceph:ceph)
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 ceph version 16.2.4
> (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable), process
> ceph-osd, pid 2
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 pidfile_write: ignore empty
> --pid-file
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800
> /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800
> /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000,
> 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1
> bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size
> 3221225472 meta 0.45 kv 0.45 data 0.06
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00
> /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00
> /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000,
> 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bluefs add_block_device bdev
> 1 path /var/lib/ceph/osd/ceph-2/block size 466 GiB
> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00
> /var/lib/ceph/osd/ceph-2/block) close
> May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14
> 16:58:55.972267166 +0000 UTC m=+0.747979911 container start
> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da
> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949,
> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, ceph=True,
> GIT_REPO=https://github.com/
> May 14 16:58:55 dragon-ball-radar bash[113558]:
> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da
> May 14 16:58:55 dragon-ball-radar systemd[1]: Started Ceph osd.2 for
> 4e01640b-951b-4f75-8dca-0bad4faf1b11.
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.184+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800
> /var/lib/ceph/osd/ceph-2/block) close
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 1 objectstore numa_node 0
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 0 starting osd.2 osd_data
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 -1 unable to find any IPv4
> address in networks '10.0.199.0/24' interfaces ''
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 -1 unable to find any IPv4
> address in networks '172.16.199.0/24' interfaces ''
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.452+0000 7fcf16aa2080 0 load: jerasure load: lrc
> load: isa
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400
> /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400
> /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000,
> 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1
> bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size
> 3221225472 meta 0.45 kv 0.45 data 0.06
> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug
> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400
> /var/lib/ceph/osd/ceph-2/block) close
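
The "unable to find any IPv4 address" lines near the end of the log are
worth a look as well: the OSD is not seeing the configured networks from
inside its container. A quick way to inspect those settings (a sketch;
the subnets are the ones from the log above):

  ceph config get osd public_network
  ceph config get osd cluster_network

  # verify the host has addresses in 10.0.199.0/24 and 172.16.199.0/24
  ip -4 addr show
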
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


