On Mon, Sep 21, 2015 at 11:43 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> I'm starting to wonder if this has to do with some OSDs getting full
> or the 0.94.3 code. Earlier this afternoon, I cleared out my test
> cluster so there were no pools. I created a new rbd pool and started
> filling it with 6 x 1 TB fio jobs, replication 3, with 6 spindles over
> six servers. It was running 0.94.2 at the time. After several hours of
> writes, we had the new patched 0.94.3 binaries ready for testing, so I
> rolled the update onto the test cluster while the fio jobs were running.
> There were a few blocked I/Os as the services were restarted (nothing
> I'm concerned about). Now that the OSDs are about 60% full, the
> blocked I/O is becoming very frequent even with the backports. The
> write bandwidth was consistently 200 MB/s until this point; now it
> fluctuates between 200 MB/s and 75 MB/s, mostly around 100 MB/s. Our
> production cluster uses XFS on the OSDs; this test cluster uses EXT4.
>
> I'll see if I can go back to 0.94.2 and fill the cluster up again...
> Going back to 0.94.2 and 0.94.0 still shows the issue (although I
> didn't refill the cluster; I didn't delete what was already there). I'm
> building the latest of hammer-backports now to see if it resolves the
> issue.

You're probably running into FileStore collection splitting, and that's
what is slowing things down in that testing.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
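[For readers hitting the same symptom: FileStore keeps each PG's objects in a
directory tree on the OSD's filesystem, and once a directory holds roughly
filestore_merge_threshold * filestore_split_multiple * 16 objects it is split
into subdirectories. Each split stalls writes to that PG, which matches the
"blocked I/O once the cluster fills up" pattern above. A hedged ceph.conf
sketch of the relevant knobs; the values are illustrative, not tuned
recommendations:]

```
# ceph.conf fragment (sketch) -- push the FileStore split point higher so
# splitting happens later during a large fill. Values are examples only.
[osd]
# A leaf directory splits at about:
#   filestore_merge_threshold * filestore_split_multiple * 16 objects
filestore merge threshold = 40
filestore split multiple  = 8
# With the values above: 40 * 8 * 16 = 5120 objects per directory
# before a split is triggered.
```

[Alternatively, hinting the expected object count at pool creation time lets
FileStore pre-split the directories up front instead of splitting under
load; whether that option is available depends on the Ceph release in use.]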