Re: OSDs are crashing during PG replication

On Mar 11, 2016 3:12 PM, "Alexander Gubanov" <shtnik@xxxxxxxxx> wrote:
>
> Sorry, I didn't have time to answer.
>
> >First you said 2 OSDs crashed every time. From the log you pasted,
> >it makes sense to do something about osd.3.
>
> The problem is a single PG, 3.2. This PG is on osd.3 and osd.16, and both of these OSDs crashed every time.
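>
> To confirm which OSDs a PG maps to (3.2 in this case), something like
> this works:
>
> # ceph pg map 3.2
>
> which prints the up and acting OSD sets for the PG.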
>
> >> rm -rf
> >> /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>
> >What confuses me now is this.
> >Did osd.4 also crash like osd.3?
>
> I thought that the problem was osd.3 or osd.16. I tried to disable these osds:
> # ceph osd crush reweight osd.3 0 
> # ceph osd crush reweight osd.16 0
> but when I did that, two other OSDs crashed, one of them being osd.4, and PG 3.2 was now on osd.4.
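>
> After a reweight, the remapping can be followed with, for example:
>
> # ceph osd tree
> # ceph pg map 3.2
> # ceph health detail
>
> the first showing the new weights, the latter two showing where PG 3.2
> lands and whether it is degraded or stuck.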
>
> After this I decided to remove the cache pool.
> Now I'm moving all the data to a new, larger SSD and so far everything is all right.
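>
> For reference, the usual teardown sequence for a writeback cache tier is
> roughly the following (the pool names here are placeholders):
>
> # ceph osd tier cache-mode ${cache_pool} forward
> # rados -p ${cache_pool} cache-flush-evict-all
> # ceph osd tier remove-overlay ${storage_pool}
> # ceph osd tier remove ${storage_pool} ${cache_pool}
>
> i.e. stop new writes landing in the cache, flush and evict what is
> already there, then detach the tier from the backing pool.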
>

Thanks for letting me know; that's good to hear.

I hope you are enjoying Ceph again!

> On Fri, Mar 4, 2016 at 10:44 AM, Shinobu Kinjo <shinobu.kj@xxxxxxxxx> wrote:
>>
>> Thank you for your explanation.
>>
>> > Every time, 2 of the 18 OSDs crash. I think it happens during PG replication, because only 2 OSDs crash and they are the same ones every time.
>>
>> First you said 2 OSDs crashed every time. From the log you pasted,
>> it makes sense to do something about osd.3.
>>
>> > rm -rf
>> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>>
>> What confuses me now is this.
>> Did osd.4 also crash like osd.3?
>>
>> >    -1> 2016-02-24 04:51:45.904673 7fd995026700  5 -- op tracker -- , seq: 19231, time: 2016-02-24 04:51:45.904673, event: started, request: osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
>>
>> The crash seems to happen during this process; what I really want to
>> know is what this message implies.
>> Did you check osd.13?
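>>
>> If it happens again, dumping the in-flight and recent slow ops on the
>> admin socket of the OSDs involved might tell us more, e.g. on the node
>> hosting osd.13:
>>
>> # ceph daemon osd.13 dump_ops_in_flight
>> # ceph daemon osd.13 dump_historic_ops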
>>
>> Anyhow, your cluster is fine now... no?
>> That's good news.
>>
>> Cheers,
>> Shinobu
>>
>> On Fri, Mar 4, 2016 at 11:05 AM, Alexander Gubanov <shtnik@xxxxxxxxx> wrote:
>> > I decided to stop using an SSD cache pool and to create just 2 pools: the
>> > 1st only of SSDs, for fast storage, and the 2nd only of HDDs, for slow
>> > storage.
>> > As for this file, honestly, I don't know why it is created. As I said, I
>> > flush the journal for the fallen OSD, remove this file, and then start the
>> > OSD daemon:
>> >
>> > ceph-osd -i 3 --flush-journal
>> > rm -rf
>> > /var/lib/ceph/osd/ceph-4/current/3.2_head/rb.0.19f2e.238e1f29.000000000728__head_813E90A3__3
>> > service ceph start osd.3
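>> >
>> > (Before removing an object by hand, its placement can be verified with
>> > "ceph osd map <pool> rb.0.19f2e.238e1f29.000000000728", where <pool> is
>> > whatever pool id 3 is named; that prints the PG and the up/acting OSDs
>> > for the object.)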
>> >
>> > But if I turn the cache pool off, the file isn't created:
>> >
>> > ceph osd tier cache-mode ${cache_pool} forward
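>> >
>> > Forward mode only redirects new requests to the backing pool; if the
>> > goal is to empty the cache as well, something like
>> > "rados -p ${cache_pool} cache-flush-evict-all" should drain it.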
>> >
>>
>>
>>
>> --
>> Email:
>> shinobu@xxxxxxxxx
>> GitHub:
>> shinobu-x
>> Blog:
>> Life with Distributed Computational System based on OpenSource
>
>
>
>
> --
> Alexander Gubanov
>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
