Hi Daniel,

When I encounter an OSD that I can start but that then stops on its own after running for a while, the root cause has generally been sectors pending reallocation on the hard drive the OSD is using. The OSD runs fine until it attempts to read from the bad sectors; then it hits a read error and drops offline.

You can check the disk using smartmontools. If there are sectors pending reallocation, remove the OSD from the cluster, use dd to write zeros over the drive (this will cause the drive to reallocate spare sectors to replace the bad sectors), then re-add the OSD to the cluster.

-Steve

On 02/05/2015 08:19 AM, Daniel Takatori Ohara wrote:
-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma310@xxxxxxxxxx
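
For reference, the disk check described above could be scripted roughly as follows. This is only a sketch under some assumptions: it expects smartctl (from smartmontools) to be installed, it assumes the drive reports SMART attribute 197 under the name Current_Pending_Sector with a plain integer raw value in the last column, and the helper names and the /dev/sdX path are placeholders made up for illustration. The dd command is only built, never executed, because running it destroys all data on the target drive; the bs=1M block size is likewise just an assumed value.

```python
import subprocess


def parse_pending_sectors(smartctl_output):
    """Return the raw Current_Pending_Sector count from `smartctl -A` text.

    Returns None if the attribute is absent or its raw value is not a
    plain integer (some drives format the attribute table differently).
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: ID, name, flag, value,
        # worst, thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def pending_sectors(device):
    """Run `smartctl -A <device>` (usually needs root) and parse the result."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses a bitmask exit status; nonzero is common
    )
    return parse_pending_sectors(result.stdout)


def zero_fill_command(device):
    """Build (but do not run) a dd command that overwrites the drive with zeros.

    DESTRUCTIVE if executed: only run it after the OSD has been removed
    from the cluster. bs=1M is an assumed block size, not a requirement.
    """
    return ["dd", "if=/dev/zero", "of=" + device, "bs=1M"]


if __name__ == "__main__":
    dev = "/dev/sdX"  # placeholder: substitute the OSD's actual data disk
    count = pending_sectors(dev)
    if count is None:
        print("Could not read Current_Pending_Sector for", dev)
    elif count > 0:
        print(dev, "has", count, "sectors pending reallocation")
        print("After removing the OSD, one could run:",
              " ".join(zero_fill_command(dev)))
    else:
        print(dev, "reports no sectors pending reallocation")
```

A nonzero count would point to the failure mode described above; the remove, zero, and re-add steps are left as manual operations on purpose.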
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com