Re: Direct IO on CephFS for blocks larger than 8MB

Yeah. In fact I found the bug report via your blog. Thanks for sharing. :)

Sent from my iPhone

On 2013-3-16, at 8:05, "Sebastien Han" <sebastien.han@xxxxxxxxxxxx> wrote:

Hi guys,

I imagine we'll see some applications doing large direct IO writes like this in the HPC space.

I already ran into this bug and actually reported it with Josh about a year ago. The problem showed up with CephFS used to store the disks of running KVM VMs. Somehow KVM with the 'no cache' option issues writes larger than 8MB.
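
For reference, a guest started roughly like this (the image path and sizes are only illustrative) opens its disk with O_DIRECT, so the guest's large writes reach CephFS as large direct IO:

    qemu-kvm -m 2048 \
        -drive file=/mnt/cephfs/vms/guest.img,if=virtio,cache=none \
        -vnc :1

That's how we tripped over the 8MB limit without ever running dd ourselves.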

Cheers.

Sébastien Han
Cloud Engineer

"Always give 100%. Unless you're giving blood."








PHONE : +33 (0)1 49 70 99 72 - MOBILE : +33 (0)6 52 84 44 70
EMAIL : sebastien.han@xxxxxxxxxxxx - SKYPE : han.sbastien
ADDRESS : 10, rue de la Victoire - 75009 Paris
WEB : www.enovance.com - TWITTER : @enovance

On Mar 15, 2013, at 5:33 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:

On Fri, 15 Mar 2013, Huang, Xiwei wrote:
OK, thanks. Is there any documentation on the CephFS client architecture I
could use as a reference if I'd like to look into this?

Not really.

The code you are interested in is fs/ceph/file.c.  The direct io path
should be pretty clear; it'll build up a list of pages and pass it
directly to the code in net/ceph/osd_client.c to go out over the wire.
Enabling debugging (echo module ceph +p >
/sys/kernel/debug/dynamic_debug/control, same for module libceph) and
doing a single large direct-io write should show you where things are
going wrong.
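
For example, a session along these lines (assuming debugfs is mounted at
/sys/kernel/debug and CephFS at /mnt/cephfs) should surface where the write
fails:

    # enable dynamic debug output for both kernel modules
    echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

    # do a single large direct-io write, then check the kernel log
    dd if=/dev/zero of=/mnt/cephfs/foo bs=16M count=1 oflag=direct
    dmesg | tail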

sage



Sent from my iPhone

On 2013-3-14, at 23:23, "Sage Weil" <sage@xxxxxxxxxxx> wrote:

On Thu, 14 Mar 2013, Huang, Xiwei wrote:
Hi, all,
   I noticed that CephFS fails to support Direct IO for blocks larger than 8MB, say:
          sudo dd if=/dev/zero of=mnt/cephfs/foo bs=16M count=1 oflag=direct
          dd: writing `mnt/cephfs/foo': Bad address
          1+0 records in
          0+0 records out
          0 bytes (0 B) copied, 0.213948 s, 0.0 kB/s
  My Ceph version is 0.56.1.
  I also found that the bug has already been reported as Bug #2657.
  Is this fixed in the new 0.58 version?

I'm pretty sure this is a problem on the kernel client side of things, not
the server side (which by default handles writes up to ~100MB or so).  I
suspect it isn't terribly difficult to fix, but hasn't been prioritized...
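
(For reference, I believe the relevant server-side knob is 'osd max write
size', which defaults to 90 MB; if you need a different cap it can be set in
ceph.conf, e.g.:

    [osd]
        # largest single write an OSD will accept, in MB (default 90)
        osd max write size = 90

but the kernel client limit above is what you're hitting here.)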

sage



