On Mon, 9 Apr 2012, Tommi Virtanen wrote:
> On Fri, Apr 6, 2012 at 12:45, Bernard Grymonpon <bernard@xxxxxxxxxxxx> wrote:
> > Let's go wild and say you have hundreds of machines, summing up to
> > thousands of disks, all already migrated/moved to other machines, and
> > it reports that OSD 536 is offline. How will you find which disk is
> > failing or corrupt, and in which machine? Will you keep track of
> > which OSD ran on which node last?
>
> That's a good question and I don't have a good enough answer for you
> yet. Rest assured that's a valid concern.
>
> It seems we're still approaching this from different angles. You want
> to have an inventory of disks, known by uuid, and you want to track
> where they are and plan their moves.
>
> I want to know I have N servers with K hdd slots each, and I want each
> one to be fully populated with healthy disks. I don't care what disk
> is where, and I don't think it's realistic for me to maintain a manual
> inventory. A failed disk means: unplug that disk. An empty slot means:
> plug in a disk from the dedicated pile of spares. A chassis needing
> maintenance is shut down, and its disks are unplugged and plugged in
> elsewhere; I don't care where. A lost disk needs to have its osd
> deleted at some point (or just let them pile up; that's not a
> realistic problem for a decade or so). Any inventory of disks is only
> realistic from the discovery angle: just report what's plugged in
> right now.
>
> I consider individual disks just about as uninteresting as power
> supplies. Does that make sense?

One thing we need to keep in mind here is that the individual disks are
placed in the CRUSH hierarchy based on their host/rack/etc. location in
the datacenter. Moving disks around arbitrarily will break the placement
constraints if that position isn't also updated.

sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
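[Editor's illustration] Sage's point about CRUSH placement can be sketched with a toy model (plain Python, not Ceph's actual CRUSH code; the host names and OSD ids below are made up): the cluster map records each OSD's host, and placement relies on that map to keep replicas in separate failure domains, so physically moving a disk to another chassis without updating its recorded location silently violates the constraint.

```python
# Toy model of the CRUSH-location problem described above.
# The cluster map says which host each OSD lives on; placement
# rules use it to keep replicas on distinct hosts.
crush_location = {"osd.0": "hostA", "osd.1": "hostB", "osd.2": "hostC"}

def replicas_on_distinct_hosts(osds, location):
    """True if every replica in `osds` sits on a different host."""
    hosts = [location[o] for o in osds]
    return len(hosts) == len(set(hosts))

# Placement chosen under the recorded map: osd.0 and osd.1 look safe.
assert replicas_on_distinct_hosts(["osd.0", "osd.1"], crush_location)

# Now the disk backing osd.1 is physically unplugged and re-plugged
# into hostA, but nobody updates the map.
actual_location = dict(crush_location, **{"osd.1": "hostA"})

# The map still claims the replicas are separated...
assert replicas_on_distinct_hosts(["osd.0", "osd.1"], crush_location)
# ...but in reality both copies now sit in the same chassis, so a
# single host failure loses both replicas.
assert not replicas_on_distinct_hosts(["osd.0", "osd.1"], actual_location)
```

This is why "plug the disk in anywhere" only works if the OSD's CRUSH position is updated to match where it actually landed.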