OK, so it looks like it's ceph crushmap behavior:
http://docs.ceph.com/docs/master/rados/operations/crush-map/
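If I'm reading that page right, the relevant bit is that by default the OSD daemons update their own CRUSH location on startup (osd crush update on start = true), which moves them back under a bucket named after the real hostname. A minimal ceph.conf sketch of the two ways around that, using the bucket names from my osd tree below -- an illustration, not a tested config:

  [osd]
  # keep hand-edited CRUSH placement; OSDs no longer move themselves on start
  osd crush update on start = false

  # ...or leave the default on and pin the location per OSD instead, e.g.:
  # [osd.60]
  # osd crush location = host=OSD1-nvme rack=rack1-nvme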
--
Deepak

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Deepak Naidu

OK, I fixed the issue. It is very weird, but I will list the steps so it's easy for others to check when they hit a similar issue.

1) I had created a rack-aware osd tree.

2) I have SATA OSD's and NVME OSD's.

3) I created a rack-aware policy for both the SATA and NVME OSD's.

4) The NVME OSD's were used for the CephFS metadata pool.

5) Recently, when I tried a reboot of an OSD node, my journal volumes, which are on NVME, didn't start up because of the udev rules, and I had to create a startup script to fix them (a rough sketch of that workaround is at the end of this mail, below the osd tree).

6) With that, I rebooted all the OSD nodes one by one, monitoring the ceph status.

7) I was at the 3rd-last node when I noticed the pg stuck warning. Not sure when or what happened, but I started getting this PG stuck issue (which is listed in my original email).

8) I wasted time looking at the issue/error, but then I found the pool 100% used issue.

9) When I then ran ceph osd tree, it looked like my NVME OSD's had gone back under the host-level buckets rather than the newly created/mapped NVME rack level, i.e. no OSD's under the nvme-host names. This was the issue.
10) Luckily, I had created a backup of the compiled crushmap. I imported it back into the cluster and the pool status is now OK (rough commands just below).

But my question is: how did ceph re-map the CRUSH rule? I had to create a "new host entry" for NVME in the crushmap, i.e.

host OSD1-nvme  -- this is just a dummy entry in the crushmap; it doesn't resolve to any hostname
host OSD1       -- this is the actual hostname; it resolves to an IP

Is that the issue?
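For anyone hitting the same thing, the backup/edit/restore cycle looks roughly like this (filenames are placeholders; in my case I simply re-injected the backed-up compiled map):

  # grab and decompile the current crushmap
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

  # in crushmap.txt, re-create the dummy nvme host bucket and move the nvme
  # osd back under it, e.g. something like:
  #   host OSD1-nvme {
  #       id -18
  #       alg straw
  #       hash 0
  #       item osd.60 weight 0.691
  #   }

  # recompile and inject it back
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new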
Current status:

    health HEALTH_OK
    osdmap e5108: 610 osds: 610 up, 610 in
           flags sortbitwise,require_jewel_osds
    pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
          1155 GB used, 5462 TB / 5463 TB avail
                15450 active+clean

Pools:

    NAME        ID   USED     %USED   MAX AVAIL   OBJECTS
    Pool1       15   233M     0       1820T       3737
    Pool2       16   0        0       1820T       0
    Pool Meta   17   34928k   0       2357G       28

Partial list of my osd tree:

-15 2.76392 rack rack1-nvme
-18 0.69098 host OSD1-nvme
60 0.69098 osd.60 up 1.00000 1.00000
-21 0.69098 host OSD2-nvme
243 0.69098 osd.243 up 1.00000 1.00000
-24 0.69098 host OSD3-NGN1-nvme
426 0.69098 osd.426 up 1.00000 1.00000
-1 5456.27734 root default
-12 2182.51099 rack rack1-sata
-2 545.62775 host OSD1
0 9.09380 osd.0 up 1.00000 1.00000
1 9.09380 osd.1 up 1.00000 1.00000
2 9.09380 osd.2 up 1.00000 1.00000
3 9.09380 osd.3 up 1.00000 1.00000
-2 545.62775 host OSD2
0 9.09380 osd.0 up 1.00000 1.00000
1 9.09380 osd.1 up 1.00000 1.00000
2 9.09380 osd.2 up 1.00000 1.00000
3 9.09380 osd.3 up 1.00000 1.00000
-2 545.62775 host OSD2
0 9.09380 osd.0 up 1.00000 1.00000
1 9.09380 osd.1 up 1.00000 1.00000
2 9.09380 osd.2 up 1.00000 1.00000
3 9.09380 osd.3 up 1.00000 1.00000
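The startup workaround from 5) is along these lines -- the stock udev rules weren't setting ownership on the nvme journal partitions, so ownership gets fixed before the OSDs start. Device names are examples only; a udev rule matching the nvme partitions would do the same job:

  #!/bin/bash
  # one-shot at boot, before ceph-osd starts: give the nvme journal
  # partitions back to the ceph user so the OSDs can open them
  for part in /dev/nvme0n1p*; do
      chown ceph:ceph "$part"
  done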
--
Deepak

From: David Turner [mailto:drakonstein@xxxxxxxxx]

ceph status

Is your meta pool on ssds instead of the same root and osds as the rest of the cluster?
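In case it helps, a quick way to check which rule/root a pool maps to (pool name is a placeholder; crush_ruleset is the jewel-era name, later releases call it crush_rule):

  ceph osd pool get <metadata-pool> crush_ruleset
  ceph osd crush rule dump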
On Fri, Jun 30, 2017, 9:29 PM Deepak Naidu <dnaidu@xxxxxxxxxx> wrote: