Re: Weird issues related to (large/small) weights in mixed nvme/hdd pool

Yes, this did turn out to be our main issue. We also had a smaller issue, but this was the one that caused parts of our pools to go offline for a short time. Or rather, the 'cause' was us adding some new NVMe drives that were much larger than the ones we already had, so too many PGs got mapped to them; we just didn't realize at first that this was the problem. Taking those OSDs down again allowed us to recover quickly though.
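For anyone hitting the same thing, the overloaded OSDs are easy to spot once you know what to look for. A minimal sketch (output abbreviated, the numbers are hypothetical, not from our cluster):

  $ ceph osd df tree          # the PGS column shows how many PGs each OSD currently holds
  ID CLASS WEIGHT  ...  PGS TYPE NAME
   6 nvme  3.63869 ...  612     osd.6    <-- far above the rest
  $ ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd   # run on a monitor host; the per-OSD PG limit added in Luminous

As I understand it, once an OSD is mapped more PGs than the configured limit allows, new PGs on it can get stuck in 'activating', which matches what we saw.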

It was a little hard to figure out, mostly because we had two separate problems at the same time. Some kind of explicit warning message would have been nice (we couldn't find anything in the logs), and perhaps the PGs could be allowed to activate anyway and the cluster put into HEALTH_WARN?

My colleague built a virtualized lab copy of our environment, and we used that to recreate and then fix our issues.

We are also working on installing more OSDs, as was our original plan, so the number of PGs per OSD will decrease over time. At the time we decided to aim for about 300 PGs per OSD, which I now realize was probably not a great idea; something like 150 would have been better.
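As a rough worked example (the numbers are made up, not our actual figures): 4 pools with 1024 PGs each at replica size 3 gives 4 x 1024 x 3 = 12288 PG copies; spread over 40 OSDs that is roughly 307 PG copies per OSD, and doubling to 80 OSDs brings it down to about 154.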

/Peter

On 2018-01-31 at 13:42, Thomas Bennett wrote:
Hi Peter,

Looking at your problem again, you might want to keep an eye on this issue: http://tracker.ceph.com/issues/22440

Regards,
Tom

On Wed, Jan 31, 2018 at 11:37 AM, Thomas Bennett <thomas@xxxxxxxxx> wrote:
Hi Peter,

From your reply, I see that:
  1. pg 3.12c is part of pool 3. 
  2. The osds in the "up" set for pg 3.12c are: 6, 0, 12.

To check on this 'activating' issue, I suggest the following (example commands below the list):
  1. Which rule is pool 3 supposed to follow: 'hybrid', 'nvme' or 'hdd'? (Use the ceph osd pool ls detail command and look at pool 3's crush rule.)
  2. Then check whether osds 6, 0 and 12 are backed by nvme's or hdd's. (Use the ceph osd tree | grep nvme command to find your nvme-backed osds.)
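Something along these lines (the pool name, rule id and output here are just placeholders):

  $ ceph osd pool ls detail | grep "^pool 3 "
  pool 3 'data' replicated size 3 min_size 2 crush_rule 1 ...
  $ ceph osd crush rule dump          # find the rule with the matching rule_id and see which device class / root it selects
  $ ceph osd tree | grep nvme         # lists the nvme-backed osds; check whether 6, 0 or 12 show up here
   6   nvme   3.63869         osd.6       up  1.00000 1.00000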

If your problem is similar to mine, you will have nvme-backed osds in a pool that should only be backed by hdds, which in my case caused a pg to go into the 'activating' state and stay there.
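(For reference: since Luminous a pool can be pinned to a single device class with a class-aware crush rule. A minimal sketch, with placeholder rule and pool names:

  $ ceph osd crush rule create-replicated replicated_hdd default host hdd
  $ ceph osd pool set <pool name> crush_rule replicated_hdd

A rule created like that will only ever select hdd-backed osds for the pool.)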

Cheers,
Tom


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
