Re: Giant + nfs over cephfs hang tasks

Ilya,

I see. My server has 24GB of RAM + 3GB of swap. While running the tests, I noticed that the server had 14GB of RAM shown as cached and only 2MB of swap in use. Not sure if this is helpful to your debugging.
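(In case it helps with the debugging: the thread doesn't say which tool
produced the figures above, but a minimal way to check them on a stock
Linux box would be something like the below. The commands are standard;
the exact /proc/meminfo field layout varies slightly between kernels.)

    # overall memory and swap usage, in MB
    free -m

    # the same figures straight from the kernel
    grep -E 'MemTotal|^Cached:|SwapTotal|SwapFree' /proc/meminfo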

Andrei

--
Andrei Mikhailovsky
Director
Arhont Information Security

Web: http://www.arhont.com
http://www.wi-foo.com
Tel: +44 (0)870 4431337
Fax: +44 (0)208 429 3111
PGP: Key ID - 0x2B3438DE
PGP: Server - keyserver.pgp.com

DISCLAIMER

The information contained in this email is intended only for the use of the person(s) to whom it is addressed and may be confidential or contain legally privileged information. If you are not the intended recipient you are hereby notified that any perusal, use, distribution, copying or disclosure is strictly prohibited. If you have received this email in error please immediately advise us by return email at andrei@xxxxxxxxxx and delete and purge the email and any attachments without making a copy.



From: "Ilya Dryomov" <ilya.dryomov@xxxxxxxxxxx>
To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Gregory Farnum" <greg@xxxxxxxxxxx>
Sent: Monday, 1 December, 2014 11:06:37 AM
Subject: Re: Giant + nfs over cephfs hang tasks

On Mon, Dec 1, 2014 at 1:39 PM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Ilya,
>
> I will try doing that once again tonight, as this is a production cluster
> and when the dds trigger that dmesg error the cluster's I/O becomes very
> bad and I have to reboot the server to get things back on track. Most of
> my VMs sit at 70-90% iowait until that server is rebooted.

That's easily explained: those splats in dmesg indicate severe memory
pressure.
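If you want to confirm that while it's happening, the dirty/writeback
counters are the usual tell. A minimal check (these are standard
/proc/meminfo fields and sysctls; the right threshold values are
workload-dependent and nothing in this thread establishes them):

    # dirty data the kernel is holding, and what it's actively writing back
    grep -E '^(Dirty|Writeback):' /proc/meminfo

    # writeback thresholds; lowering these makes the kernel flush earlier
    sysctl vm.dirty_background_ratio vm.dirty_ratio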

>
> I've actually checked what you asked the last time I ran the test.
>
> When I run 4 dds concurrently, nothing appears in the dmesg output. No
> messages at all.
>
> The kern.log file I sent last time is what I got about a minute after I
> started 8 dds; I pasted the full output. The 8 dds did actually complete,
> but it took a rather long time. I was getting about 6MB/s per dd process,
> compared to around 70MB/s per dd process when 4 dds were running. Do you
> still want me to run this again, or is the information I've provided
> enough?

No, no need if it's a production cluster.

Thanks,

                Ilya
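
(For anyone wanting to reproduce the test described above: the thread
doesn't give the exact command line, so the mount point, block size and
file size below are assumptions; only the 4-vs-8 concurrency figures come
from the report.)

    #!/bin/sh
    # Run N dd writers in parallel against the NFS-over-CephFS export.
    N=8                         # 4 completed cleanly; 8 triggered the splats
    for i in $(seq 1 "$N"); do
        dd if=/dev/zero of=/mnt/nfs/ddtest.$i bs=1M count=4096 &
    done
    wait                        # once all writers finish, check dmesg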

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
