Misdirected clients due to kernel bug?

Hello everyone,

Last week, while deploying new disks in our cluster, we bumped into
what we believe is a kernel bug. Everything is working fine now, but we
wanted to share our experience and see whether other people have run
into similar behaviour.

The steps we followed were:

1) First, we removed DNE OSDs (which had previously been removed from
the cluster) in order to reuse their IDs.

ceph osd crush remove osd.6
ceph auth del osd.6
ceph osd rm 6
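
As a sanity check, something along these lines should confirm the ID is
really free before reuse (just a sketch, assuming the standard ceph CLI):

# osd.6 should no longer show up in the CRUSH map or the OSD map
ceph osd tree | grep -w osd.6 || echo "osd.6 gone from CRUSH map"
ceph osd ls | grep -wx 6 || echo "id 6 is free for reuse"
# and its auth entry should be gone as well
ceph auth get osd.6 || echo "no auth entry for osd.6"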

2) Then we deployed new disks with ceph-deploy

ceph-deploy --overwrite-conf osd create ds1-ceph01:sda

We have two different pools in the cluster, so we set the option

osd crush update on start = false

so that we could later add the OSDs to the desired pool manually with

ceph osd crush add osd.6 0.9 host=ds1-ceph01
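
For completeness, a minimal sketch of what we mean (assuming the option
sits in the [osd] section of ceph.conf; adjust to your setup):

[osd]
# keep ceph-osd from placing itself in the CRUSH map on startup
osd crush update on start = false

Afterwards the placement can be checked with:

# osd.6 should now appear under host ds1-ceph01 with weight 0.9
ceph osd tree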


We added two disks. The first one looked fine; however, after adding
the second disk, ceph -s started to show odd information, such as some
PGs in backfill_toofull. The odd thing was that the OSD supposedly too
full was only 81% full, while the ratios are full_ratio 0.95 and
nearfull_ratio 0.88.
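
A quick way to cross-check the reported fullness against the actual
utilisation (ceph osd df should be available on Hammer, as far as we
know):

# per-OSD utilisation, weight and variance
ceph osd df
# health detail names the OSDs/PGs considered (near)full
ceph health detail | grep -i full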

Also, the monitor logs were getting flooded with messages like:

misdirected client.708156.1:1609543462 pg 2.1eff89a7 to osd.83 not
[1,83,93] in e154784/154784
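
To compare the cluster's idea of the mapping with what the client was
using, something like this helps:

# prints the PG an object maps to, plus the current up/acting sets
# (pool and object names here are made up for illustration)
ceph osd map rbd rb.0.112233.238e1f29.000000000000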

On the clients we got write errors:

[20882274.721623] rbd: rbd28: result -6 xferred 2000
[20882274.773296] rbd: rbd28: write 2000 at aef404000 (4000)
[20882274.773304] rbd: rbd28: result -6 xferred 2000
[20882274.826057] rbd: rbd28: write 2000 at aef404000 (4000)
[20882274.826064] rbd: rbd28: result -6 xferred 2000
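
For what it's worth, result -6 is errno 6, i.e. ENXIO ("No such device
or address"); easy to double-check on any client (header path may
differ per distro):

# errno 6 -> ENXIO, "No such device or address"
grep -w 6 /usr/include/asm-generic/errno-base.h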

Most of the OSDs were running:
ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
and a few of them (including the new ones) were running:
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
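
A convenient way to list which version each OSD is actually running
(this should work on Hammer, as far as we know):

# ask every OSD daemon for its running version
ceph tell osd.* version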

The clients were running kernel 4.1.1. Once we rebooted them with
kernel 4.1.13, the errors disappeared.

The misdirected messages made us think that the clients had
incorrect/outdated copies of the cluster map.
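
One way to confirm that suspicion would be to compare the cluster's
current osdmap epoch with the epoch cached by the kernel client; a
rough sketch (the debugfs path assumes debugfs is mounted and may vary
by kernel version):

# epoch according to the cluster (run on a node with admin access)
ceph osd dump | head -1
# epoch according to the kernel rbd client (run on the client)
cat /sys/kernel/debug/ceph/*/osdmap | head -1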

Any insights would be very welcome.

Regards,
Simon Engelsman

Greenhost - sustainable hosting & digital security
https://greenhost.nl


