On Fri, Apr 6, 2012 at 12:45, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
> Let's go wild and say you have hundreds of machines, summing up to
> thousands of disks, all already migrated/moved to other machines/...,
> and it reports that OSD 536 is offline. How will you find which disk
> is failing/corrupt/... in which machine? Will you keep track of which
> OSD ran on which node last?

That's a good question, and I don't have a good enough answer for you yet. Rest assured that's a valid concern.

It seems we're still approaching this from different angles. You want to have an inventory of disks, known by uuid, and you want to track where they are and plan their moves. I want to know I have N servers with K hdd slots each, and I want each one to be fully populated with healthy disks. I don't care which disk is where, and I don't think it's realistic for me to maintain a manual inventory.

A failed disk means: unplug that disk. An empty slot means: plug in a disk from the dedicated pile of spares. A chassis needing maintenance is shut down, its disks unplugged and plugged in elsewhere; I don't care where. A lost disk needs to have its osd deleted at some point (or just let them pile up; not a realistic problem for a decade or so).

Any inventory of disks is only realistic from the discovery angle: just report what's plugged in right now. I consider individual disks just about as uninteresting as power supplies.

Does that make sense? Details pending..
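A minimal sketch of the "discovery angle" described above, assuming a Linux host where /dev/disk/by-uuid holds one symlink per filesystem uuid: rather than maintaining a manual inventory, just enumerate what's plugged in right now. The function name and the parameterized directory are illustrative, not part of any Ceph tooling.

```python
import os

def discover_disks(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {uuid: resolved device path} for whatever is plugged in now.

    No persistent state: the result reflects only the current moment,
    matching the "just report what's plugged in" approach.
    """
    inventory = {}
    if not os.path.isdir(by_uuid_dir):
        return inventory  # nothing discoverable (no disks, or not Linux)
    for uuid in os.listdir(by_uuid_dir):
        link = os.path.join(by_uuid_dir, uuid)
        # Resolve the symlink to the actual block device, e.g. /dev/sdb1.
        inventory[uuid] = os.path.realpath(link)
    return inventory

if __name__ == "__main__":
    for uuid, dev in sorted(discover_disks().items()):
        print(uuid, "->", dev)
```

Run periodically (or from a udev hook), this gives the "what is here now" view without anyone ever recording which disk moved where.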