Re: 5 pgs inactive, 5 pgs incomplete

After doing some research, I suspect the problem is that an OSD was
removed while the cluster was backfilling.

The PGs that are inactive and incomplete all list the same (removed)
OSD in their "down_osds_we_would_probe" output, and peering is blocked
by "peering_blocked_by_history_les_bound". We tried setting
"osd_find_best_info_ignore_history_les = true", but without success:
the OSDs stay in a peering loop.
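
For anyone who wants to run the same check across several PGs, here is a minimal sketch of how the two fields above could be pulled out of the JSON printed by "ceph pg <pgid> query". It assumes both fields sit inside the entries of the top-level "recovery_state" list, and that "peering_blocked_by_detail" is a list of objects with a "detail" key; that layout may differ between releases. The function names and the sample data (including OSD id 99) are made up for illustration and are not taken from our cluster.

```python
import json


def peering_blockers(pg_query):
    """Collect peering blockers from one parsed `ceph pg <pgid> query` dict.

    Returns (down_osds, details): the sorted OSD ids found under
    "down_osds_we_would_probe" and the sorted strings found under
    "peering_blocked_by_detail". The JSON layout is an assumption.
    """
    down_osds = set()
    details = set()
    for state in pg_query.get("recovery_state", []):
        down_osds.update(state.get("down_osds_we_would_probe", []))
        for item in state.get("peering_blocked_by_detail", []):
            detail = item.get("detail")
            if detail:
                details.add(detail)
    return sorted(down_osds), sorted(details)


def common_down_osds(queries):
    """Return the OSD ids that every PG in `queries` would still probe.

    `queries` maps a PG id to its parsed query output.
    """
    per_pg = [set(peering_blockers(q)[0]) for q in queries.values()]
    return sorted(set.intersection(*per_pg)) if per_pg else []


# Hypothetical, heavily trimmed sample in the assumed layout (not real output).
SAMPLE_QUERY = json.loads("""
{
  "state": "incomplete",
  "recovery_state": [
    {"name": "Started/Primary/Peering/Incomplete"},
    {"name": "Started/Primary/Peering",
     "down_osds_we_would_probe": [99],
     "peering_blocked_by_detail": [
       {"detail": "peering_blocked_by_history_les_bound"}
     ]}
  ]
}
""")

if __name__ == "__main__":
    print(peering_blockers(SAMPLE_QUERY))
    print(common_down_osds({"22.11a": SAMPLE_QUERY}))
```

To use it on a live cluster, the output of "ceph pg <pgid> query" for each incomplete PG would be parsed with json.loads and passed in. If common_down_osds returns a single id for all affected PGs, that matches the observation above that they are all waiting on the same removed OSD.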

On Mon, Aug 17, 2020 at 9:53 AM Martin Palma <martin@xxxxxxxx> wrote:
>
> Here is the output with all OSDs up and running.
>
> ceph -s: https://pastebin.com/5tMf12Lm
> ceph health detail: https://pastebin.com/avDhcJt0
> ceph osd tree: https://pastebin.com/XEB0eUbk
> ceph osd pool ls detail: https://pastebin.com/ShSdmM5a
>
> On Mon, Aug 17, 2020 at 9:38 AM Martin Palma <martin@xxxxxxxx> wrote:
> >
> > Hi Peter,
> >
> > Over the weekend another host went down due to power problems and
> > has since been restarted, so these outputs also include some "Degraded
> > data redundancy" messages. In addition, one OSD crashed due to a disk error.
> >
> > ceph -s: https://pastebin.com/Tm8QHp52
> > ceph health detail: https://pastebin.com/SrA7Bivj
> > ceph osd tree: https://pastebin.com/nBK8Uafd
> > ceph osd pool ls detail: https://pastebin.com/kYyCb7B2
> >
> > No, the pool with the inactive+incomplete PGs is not an EC pool.
> >
> > ceph osd crush dump | jq '[.rules, .tunables]': https://pastebin.com/gqDTjfat
> >
> > Best,
> > Martin
> >
> > On Sun, Aug 16, 2020 at 3:44 PM Peter Maloney
> > <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Dear Martin,
> > >
> > > Can you provide some details?
> > >
> > > ceph -s
> > > ceph health detail
> > > ceph osd tree
> > > ceph osd pool ls detail
> > >
> > > If it's EC (you implied it's not), also show the crush rules, and you may as well include the tunables (because greatly raising choose_total_tries, e.g. to 200, may be the solution to your problem):
> > > ceph osd crush dump | jq '[.rules, .tunables]'
> > >
> > > Peter
> > >
> > > On 8/16/20 1:18 AM, Martin Palma wrote:
> > > > Yes, but that didn’t help. After some time they have blocked requests again
> > > > and remain inactive and incomplete.
> > > >
> > > > On Sat, 15 Aug 2020 at 16:58, <ceph@xxxxxxxxxx> wrote:
> > > >
> > > > >> Did you try restarting the said OSDs?
> > > > >>
> > > > >> Hth
> > > > >> Mehmet
> > > > >>
> > > > >> On 12 August 2020 21:07:55 CEST, Martin Palma <martin@xxxxxxxx> wrote:
> > > >>
> > > >>>> Are the OSDs online? Or do they refuse to boot?
> > > >>> Yes. They are up and running and not marked as down or out of the
> > > >>> cluster.
> > > >>>> Can you list the data with ceph-objectstore-tool on these OSDs?
> > > > >>> If you mean whether the "list" operation on the PG works: yes, it
> > > > >>> produces output, for example:
> > > > >>> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-63 --pgid 22.11a --op list
> > > > >>> ["22.11a",{"oid":"1001c1ee04f.00000007","key":"","snapid":-2,"hash":3825189146,"max":0,"pool":22,"namespace":"","max":0}]
> > > > >>> ["22.11a",{"oid":"1000448667f.00000000","key":"","snapid":-2,"hash":4294951194,"max":0,"pool":22,"namespace":"","max":0}]
> > > > >>> ...
> > > > >>> If I run "ceph pg ls incomplete", only one PG in the output has
> > > > >>> objects; all the others have 0 objects.
> > > >>> _______________________________________________
> > > >>> ceph-users mailing list -- ceph-users@xxxxxxx
> > > >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > >
> > >
> > > --
> > > --------------------------------------------
> > > Peter Maloney
> > > Brockmann Consult GmbH
> > > www.brockmann-consult.de
> > > Chrysanderstr. 1
> > > D-21029 Hamburg, Germany
> > > Tel: +49 (0)40 69 63 89 - 320
> > > E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
> > > Amtsgericht Hamburg HRB 157689
> > > Geschäftsführer Dr. Carsten Brockmann
> > > --------------------------------------------
> > >