Yeah. In fact I found the bug report via your blog. Thanks for sharing. :)

Sent from my iPhone

On 2013-3-16, at 8:05, "Sebastien Han" <sebastien.han@xxxxxxxxxxxx> wrote:

> Hi guys,
>
> I imagine we'll see some applications doing large direct IO writes like
> this in the HPC space.
>
> I have already run into this bug and reported it with Josh about a year
> ago. The problem showed up when CephFS was used to store the disks of
> running KVM VMs. Somehow KVM with the 'no cache' option issues writes
> larger than 8MB.
>
> Cheers.
> --
> Sébastien Han
> Cloud Engineer
> "Always give 100%. Unless you're giving blood."
>
> PHONE : +33 (0)1 49 70 99 72 | MOBILE : +33 (0)6 52 84 44 70
> EMAIL : sebastien.han@xxxxxxxxxxxx | SKYPE : han.sbastien
> ADDRESS : 10, rue de la Victoire | 75009 Paris
> WEB : www.enovance.com | TWITTER : @enovance
>
> On Mar 15, 2013, at 5:33 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>
>> On Fri, 15 Mar 2013, Huang, Xiwei wrote:
>>> ok. Thanks. Is there any documentation for ceph fs client arch as a
>>> reference if I'd like to look into this?
>>
>> Not really. The code you are interested in is fs/ceph/file.c. The
>> direct io path should be pretty clear; it'll build up a list of pages
>> and pass it directly to the code in net/ceph/osd_client.c to go out
>> over the wire. Enabling debugging (echo module ceph +p >
>> /sys/kernel/debug/dynamic_debug/control, same for module libceph) and
>> doing a single large direct-io write should show you where things are
>> going wrong.
>>
>> sage
>>
>>> Sent from my iPhone
>>>
>>> On 2013-3-14, at 23:23, "Sage Weil" <sage@xxxxxxxxxxx> wrote:
>>>
>>>> On Thu, 14 Mar 2013, Huang, Xiwei wrote:
>>>>> Hi, all,
>>>>>
>>>>> I noticed that CephFS fails to support Direct IO for blocks larger
>>>>> than 8MB, say:
>>>>>
>>>>>   sudo dd if=/dev/zero of=mnt/cephfs/foo bs=16M count=1 oflag=direct
>>>>>   dd: writing `mnt/cephfs/foo': Bad address
>>>>>   1+0 records in
>>>>>   0+0 records out
>>>>>   0 bytes (0 B) copied, 0.213948 s, 0.0 kB/s
>>>>>
>>>>> My Ceph version is 0.56.1.
>>>>>
>>>>> I also found that the bug has already been reported as Bug #2657.
>>>>> Is this fixed in the new 0.58 version?
>>>>
>>>> I'm pretty sure this is a problem on the kernel client side of
>>>> things, not the server side (which by default handles writes up to
>>>> ~100MB or so). I suspect it isn't terribly difficult to fix, but
>>>> hasn't been prioritized...
>>>>
>>>> sage
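
For anyone hitting the same failure before the kernel client is fixed, here is a minimal sketch of one way to sidestep it from userspace: issue the direct IO in pieces no larger than the 8MB threshold reported above. It assumes the limit applies per write call, which the thread does not confirm. The names chunk_ranges and direct_write_zeros, and the MAX_DIRECT_CHUNK constant, are made up for this illustration, and none of it has been tested against CephFS.

```python
import mmap
import os

# 8 MiB: the threshold reported in this thread (an assumption, not a documented limit).
MAX_DIRECT_CHUNK = 8 * 1024 * 1024


def chunk_ranges(total, max_chunk=MAX_DIRECT_CHUNK):
    """Return (offset, length) pairs covering `total` bytes, each at most `max_chunk`."""
    if total < 0 or max_chunk <= 0:
        raise ValueError("total must be >= 0 and max_chunk must be > 0")
    return [(off, min(max_chunk, total - off)) for off in range(0, total, max_chunk)]


def direct_write_zeros(path, total, max_chunk=MAX_DIRECT_CHUNK):
    """Write `total` zero bytes to `path` with O_DIRECT, one chunk per write call.

    Linux only. O_DIRECT generally requires aligned buffers and lengths, so the
    data comes from an anonymous mmap (page-aligned, zero-filled) and `total`
    must be a multiple of the page size.
    """
    if total % mmap.PAGESIZE:
        raise ValueError("total must be a multiple of the page size")
    buf = mmap.mmap(-1, max_chunk)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        # The memoryview must be released before the mmap can be closed.
        with memoryview(buf) as view:
            for _offset, length in chunk_ranges(total, max_chunk):
                written = os.write(fd, view[:length])
                if written != length:
                    raise OSError("short direct write: %d of %d bytes" % (written, length))
    finally:
        os.close(fd)
        buf.close()
```

Under those assumptions, direct_write_zeros("mnt/cephfs/foo", 16 * 1024 * 1024) would be the chunked counterpart of the failing dd command, issuing two 8 MiB direct writes instead of one 16 MiB write.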