Ilya,
I think I spoke too soon in my last message. I've now given it more load (running 8 concurrent dds with bs=4M), and about a minute after starting I saw problems in the dmesg output. I am attaching the kern.log file for your reference.
Please check starting with the following line: Nov 29 12:07:38 arh-ibstorage1-ib kernel: [ 3831.906510]. This is when I started the 8 concurrent dds.
The command that caused this is:
time dd if=/dev/zero of=4G00 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G11 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G22 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G33 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G44 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G55 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G66 bs=4M count=5K oflag=direct & time dd if=/dev/zero of=4G77 bs=4M count=5K oflag=direct &
I ran the same test about 10 times with only 4 concurrent dds, and that didn't cause the issue.
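For repeat runs, the same load can be written as a small loop so the number of writers is easy to change. This is only a sketch: run_dds and its arguments are made-up names, it assumes the same 4G00, 4G11, ... file naming as the command above, and it leaves out the per-dd "time" in favour of timing the whole batch.

```shell
# Sketch: start N concurrent dd writers and wait for all of them.
# run_dds and its arguments are hypothetical names, not part of the original test.
# Usage: run_dds [N] [COUNT] [BS] [OFLAG]
run_dds() {
    n="${1:-8}"          # number of concurrent dd processes
    count="${2:-5K}"     # blocks per file, as in the command above
    bs="${3:-4M}"        # block size, as in the command above
    oflag="${4:-direct}" # dd output flag, as in the command above
    pids=""
    rc=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Output files are named 4G00, 4G11, ... to match the manual command.
        dd if=/dev/zero of="4G${i}${i}" bs="$bs" count="$count" oflag="$oflag" &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Wait for every writer and remember whether any of them failed.
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

With this, "time run_dds 8" would correspond to the run that produced the dmesg errors, and "time run_dds 4" to the runs that did not.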
Should I try the 3.18 kernel again to see whether 8 dds produce similar output?
Andrei
From: "Ilya Dryomov" <ilya.dryomov@xxxxxxxxxxx>
To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Saturday, 29 November, 2014 10:40:48 AM
Subject: Re: Giant + nfs over cephfs hang tasks
On Sat, Nov 29, 2014 at 2:33 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Ilya,
>
> not sure if dmesg output in the previous is related to the cephfs, but from
> what I can see it looks good with your kernel. I would have seen hang tasks
> by now, but not anymore. I've ran a bunch of concurrent dd tests and also
> the file touch tests and there are no more delays.
>
> So, it looks like you have nailed the bug!
Great, good to have another data point.
>
> Do you plan to backport the fix to the 3.16 or 3.17 branches?
That's the tricky part. Can you try
http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/for-andrei-1/linux-image-3.17.4-ceph-00638-g0f25ebb_3.17.4-ceph-00638-g0f25ebb-1_amd64.deb
?
Thanks,
Ilya
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com