Re: ceph hang on pg list_unfound


 



Restart osd.1 with debugging enabled

debug osd = 20
debug filestore = 20
debug ms = 1
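
[If a restart is inconvenient, a sketch of injecting the same debug levels into the running daemon at runtime; note that a restart with these options in ceph.conf is the more reliable path, and some settings may not fully take effect when injected:]

```shell
# Inject elevated debug levels into the running osd.1 without a restart.
ceph tell osd.1 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
```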

Then, run list_unfound once the pg is back in active+recovering.  If
it still hangs, post osd.1's log to the list along with the output of
ceph osd dump and ceph pg dump.
-Sam

On Wed, May 18, 2016 at 6:20 PM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:
> I am running 10.2.0-0ubuntu0.16.04.1.
> I've run into a problem w/ cephfs metadata pool. Specifically I have a pg w/
> an 'unfound' object.
>
> But I can't figure out which object it is, since when I run:
> ceph pg 12.94 list_unfound
>
> it hangs (as does ceph pg 12.94 query). I know it's in the cephfs metadata
> pool since I run:
> ceph pg ls-by-pool cephfs_metadata |egrep "pg_stat|12\\.94"
>
> and it shows it there:
> pg_stat:          12.94
> objects:          231
> mip:              1
> degr:             1
> misp:             0
> unf:              1
> bytes:            90
> log:              3092
> disklog:          3092
> state:            active+recovering+degraded
> state_stamp:      2016-05-18 23:49:15.718772
> v:                8957'386130
> reported:         9472:367098
> up:               [1,4]
> up_primary:       1
> acting:           [1,4]
> acting_primary:   1
> last_scrub:       8935'385144
> scrub_stamp:      2016-05-18 10:46:46.123526
> last_deep_scrub:  8337'379527
> deep_scrub_stamp: 2016-05-14 22:37:05.974367
>
> OK, so what is hanging, and how can I get it to unhang so I can run
> 'mark_unfound_lost' on it?
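
[For reference, once list_unfound responds again, the usual next step looks like the sketch below. 'revert' rolls the object back to a prior version where one exists; 'delete' forgets it entirely, so revert is the safer first attempt:]

```shell
# Once the pg stops hanging:
ceph pg 12.94 list_unfound              # identify the unfound object(s)
ceph pg 12.94 mark_unfound_lost revert  # or 'delete' to forget the object
```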
>
> pg 12.94 is on osd.0
>
> ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.48996 root default
> -2 0.89999     host nubo-1
>  0 0.89999         osd.0         up  1.00000          1.00000
> -3 0.89999     host nubo-2
>  1 0.89999         osd.1         up  1.00000          1.00000
> -4 0.89999     host nubo-3
>  2 0.89999         osd.2         up  1.00000          1.00000
> -5 0.92999     host nubo-19
>  3 0.92999         osd.3         up  1.00000          1.00000
> -6 0.92999     host nubo-20
>  4 0.92999         osd.4         up  1.00000          1.00000
> -7 0.92999     host nubo-21
>  5 0.92999         osd.5         up  1.00000          1.00000
>
> I cranked the logging on osd.0. I see a lot of messages, but nothing
> interesting.
>
> I've double checked all nodes can ping each other. I've run 'xfs_repair' on
> the underlying xfs storage to check for issues (there were none).
>
> Can anyone suggest how to clear this hang so I can try to repair this
> system?
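
[One hedged diagnostic sketch: the pg listing above shows osd.1 as the acting primary for 12.94, so the OSD admin socket on that node can show what the hung query is actually blocked on:]

```shell
# Run on the node hosting osd.1 (the acting primary for pg 12.94).
ceph daemon osd.1 dump_ops_in_flight   # operations currently stuck in the OSD
ceph daemon osd.1 dump_historic_ops    # recently completed slow ops, for context
```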
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


