Re: How to stop using (unmount) a failed OSD with BlueStore ?

Alejandro-

Those are kernel messages indicating that an error was encountered when data was sent to the storage device; they are not directly related to the operation of Ceph. The messages you sent also appear to be from four days ago, on Friday. If they have subsided, it probably means nothing further has tried to read from or write to the disk, but the messages will remain in dmesg until the kernel ring buffer is overwritten or the system is restarted.
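If you want to verify that no new errors are being logged, a quick check (using the sdx device name from your messages) is:

    # show messages mentioning sdx with human-readable timestamps
    dmesg -T | grep sdx

    # optionally, clear the ring buffer so only new errors appear from here on
    dmesg -C

Both flags are standard util-linux dmesg options; -C needs root.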

-Jamie


On Tue, Oct 17, 2017 at 6:47 PM, Alejandro Comisario <alejandro@xxxxxxxxxxx> wrote:
Jamie, thanks for replying. The info is as follows:

1)

[Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 Sense Key : Medium Error [current]
[Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 Add. Sense: No additional sense information
[Fri Oct 13 10:21:24 2017] sd 0:2:23:0: [sdx] tag#0 CDB: Read(10) 28 00 00 00 09 10 00 00 f0 00
[Fri Oct 13 10:21:24 2017] blk_update_request: I/O error, dev sdx, sector 2320
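
A medium error like this should also show up in the drive's SMART data. Assuming smartmontools is installed, something like the following can confirm the failure at the drive level (sdx as above):

    # dump SMART health status, attributes, and the drive's error log
    smartctl -a /dev/sdx

A failing overall-health assessment or growing pending/reallocated sector counts would confirm the disk itself is bad.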

2)

ndc-cl-mon1:~# ceph status
  cluster:
    id:     48158350-ba8a-420b-9c09-68da57205924
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ndc-cl-mon1,ndc-cl-mon2,ndc-cl-mon3
    mgr: ndc-cl-mon1(active), standbys: ndc-cl-mon3, ndc-cl-mon2
    osd: 161 osds: 160 up, 160 in

  data:
    pools:   4 pools, 12288 pgs
    objects: 663k objects, 2650 GB
    usage:   9695 GB used, 258 TB / 267 TB avail
    pgs:     12288 active+clean

  io:
    client:   0 B/s rd, 1248 kB/s wr, 49 op/s rd, 106 op/s wr

3) 



On Tue, Oct 17, 2017 at 5:59 PM, Jamie Fargen <jfargen@xxxxxxxxxx> wrote:
Alejandro-
Please provide the following information:
1) Include an example of an actual message you are seeing in dmesg.
2) Provide the output of # ceph status
3) Provide the output of # ceph osd tree

Regards,
Jamie Fargen



On Tue, Oct 17, 2017 at 4:34 PM, Alejandro Comisario <alejandro@xxxxxxxxxxx> wrote:
Hi guys, any tips or help?

On Mon, Oct 16, 2017 at 1:50 PM, Alejandro Comisario <alejandro@xxxxxxxxxxx> wrote:
Hi all, I have to hot-swap a failed OSD on a Luminous cluster with BlueStore (the disk is SATA; the WAL and DB are on NVMe).

I've issued the following (a fuller sketch with example ids is just below):
* ceph osd crush reweight osd_id 0
* systemctl stop (the osd id daemon)
* umount /var/lib/ceph/osd/osd_id
* ceph osd destroy osd_id
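
To be concrete, a minimal sketch of that sequence, assuming the failed OSD is osd.23 (the id and paths are examples, not my actual values):

    # zero the CRUSH weight so data is rebalanced off the failing disk
    ceph osd crush reweight osd.23 0

    # stop the daemon and unmount its data directory
    systemctl stop ceph-osd@23
    umount /var/lib/ceph/osd/ceph-23

    # mark the OSD as destroyed so its id can be reused by the replacement
    ceph osd destroy 23 --yes-i-really-mean-it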

Everything seems OK, but if I leave everything as is (while I wait for the replacement disk), I can see that dmesg errors about writes to the device keep appearing.

The OSD is of course down and out of the crushmap.
Am I missing something, like a step to execute or something else?

Hoping to get help.
Best.

alejandrito



--
Alejandro Comisario
CTO | NUBELIU
E-mail: alejandro@xxxxxxxxxxx
Cell: +54911 3770 1857





--
Jamie Fargen
Consultant



--
Alejandro Comisario
CTO | NUBELIU
E-mail: alejandro@xxxxxxxxxxx
Cell: +54911 3770 1857



--
Jamie Fargen
Consultant
jfargen@xxxxxxxxxx
813-817-4430
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
