Aw: Re: [PATCH] mark rbd requiring stable pages

I did some testing over the weekend on the same test machine I used the
week before, but was not able to reproduce any corruption this time.
 
I have been rethinking my tests, why the results were different this time, and
what might also be different in your own test environment.  The difference
might be the recovery state of my Ceph installation and its effect on the
available memory and perhaps other characteristics of the machine.
 
Back when the corruptions occurred on my systems, I would run into several
problems because of the connections closed due to the bad CRCs, with OSDs
either being restarted by me or being marked out of and back into the cluster,
because the other OSDs had not received the necessary ping from them for too long.
 
The result was a noticeable increase in RAM usage. After some peering and
recovery the machines would use roughly 500-800 MB of swap, where they had
used none before.
 
This might also explain why my own attempts to reproduce the corruptions on
purpose were never successful: those tests usually happened shortly after a
fresh start of the cluster.
 
The same situation applies to my test machine.  Before I ran into a corruption on
this machine two weeks ago, it had already gone through some recovery and backfill
on the OSDs hosted on it, which resulted in roughly 900 MB of swap being used.
 
This weekend the machine had the same number of OSDs running, but not much
recovery was necessary, so no swap was used and enough RAM was available for
roughly 2 GB of VFS caches.
 
Whatever the exact cause, the recovery/peering process might be the important part
that brings the machine into the state necessary for the corruptions to occur.  The
memory increase should be easy to reproduce by sending a "ceph osd down x" to some
of the OSDs; in my installation this has always had the described result.
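 
In case it helps with reproducing this elsewhere, below is a minimal sketch from my
side (not part of the patch or this thread) of how one could automate that: mark a
few OSDs down and sample swap usage while the cluster peers and recovers. It assumes
Linux (/proc/meminfo), a "ceph" CLI in PATH, and the OSD ids are just placeholders.

#!/usr/bin/env python3
# Sketch: mark some OSDs down and watch swap usage during peering/recovery.
# The ids, interval, and sample count are assumptions; adjust to the local cluster.

import subprocess
import time

OSD_IDS = [0, 1, 2]       # placeholder OSD ids
SAMPLE_INTERVAL = 30      # seconds between swap samples
SAMPLES = 20              # how long to observe after marking the OSDs down


def swap_used_mb():
    """Return used swap in MB, read from /proc/meminfo (values are in kB)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.strip().split()[0])
    return (info["SwapTotal"] - info["SwapFree"]) // 1024


def main():
    print("swap used before: %d MB" % swap_used_mb())
    for osd in OSD_IDS:
        # same command as described above: force the OSD down so the cluster
        # has to peer/recover when it comes back
        subprocess.run(["ceph", "osd", "down", str(osd)], check=True)
    for _ in range(SAMPLES):
        time.sleep(SAMPLE_INTERVAL)
        print("swap used: %d MB" % swap_used_mb())


if __name__ == "__main__":
    main()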
 
I will check whether this actually works on my side as well, but that will take some time.
 
 


