Hi,
On 11/16/2017 01:36 PM, Jogi Hofmüller wrote:
Dear all,
for about a month now we have been experiencing something strange in our
small cluster. Let me first describe what happened along the way.
On Oct 4th smartmontools warned us that the journal SSD in one of our two
Ceph nodes was about to fail. Since getting replacements took far longer
than expected, we decided to place the journals on a spare HDD rather than
have the SSD fail and leave us in an uncertain state.
On Oct 17th we finally got the replacement SSDs. First we replaced the
broken (or soon-to-be-broken) SSD and moved the journals from the
temporarily used HDD to the new SSD. Then we also replaced the journal SSD
on the other Ceph node, since it would probably have failed sooner rather
than later.
We performed all operations by setting noout first, then taking down
the OSDs, flushing journals, replacing disks, creating new journals and
starting OSDs again. We waited until the cluster was back in HEALTH_OK
state before we proceeded to the next node.
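In shell terms, the per-OSD sequence looked roughly like this (a rough
sketch only; the OSD id 0, the partition path and the systemd unit name
are placeholders, not our actual values):

    ceph osd set noout                    # keep the cluster from rebalancing
    systemctl stop ceph-osd@0             # take the OSD down
    ceph-osd -i 0 --flush-journal         # write pending journal entries to the data disk
    # ... replace the SSD, partition it, then repoint the journal link ...
    ln -sf /dev/disk/by-partuuid/<new-part-uuid> /var/lib/ceph/osd/ceph-0/journal
    ceph-osd -i 0 --mkjournal             # create the journal on the new partition
    systemctl start ceph-osd@0
    ceph osd unset noout                  # once the cluster is HEALTH_OK again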
AFAIR, mkjournal crashed once on the second node, so we ran the command
again and the journals were created.
*snipsnap*
What remains is the continuing growth of used data in the cluster.
I put background information about our cluster and some graphs of
different metrics on a wiki page:
https://wiki.mur.at/Dokumentation/CephCluster
Basically we need to reduce the growth in the cluster, but since we are
not sure what is causing it, we don't know where to start.
Just a wild guess (wiki page is not accessible yet):
Are you sure that the journals were created on the new SSDs? If the
journals were created as files in the OSD directory, their size might be
accounted for in the cluster size report (assuming OSDs report their free
space, not a sum of all object sizes).
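A quick way to verify is to look at the journal entry in each OSD's data
directory; with a journal on a separate device it should be a symlink to
the SSD partition, not a regular file. A sketch (paths assume the default
FileStore layout under /var/lib/ceph/osd; run it on each OSD host):

    for osd in /var/lib/ceph/osd/ceph-*; do
        if [ -L "$osd/journal" ]; then
            # symlink: journal lives on a separate device
            echo "$osd -> $(readlink -f "$osd/journal")"
        else
            # regular file: journal sits inside the OSD data dir and
            # eats into the space the OSD reports as used
            echo "$osd: journal is a FILE in the OSD directory"
        fi
    done

Comparing the global numbers from ceph df with the per-pool usage should
also tell you whether the growth is actual object data or per-OSD
overhead.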
Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com