On Wed, 31 Jul 2013, majianpeng wrote: > >On Wed, 31 Jul 2013, majianpeng wrote: > >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: > [snip] > > > >For ceph_osdc_readpages(), > > > >> A: ret = ENOENT > > > From the original code, for this case we should zero the area. > Why? If an object is missing it means that range of the file was never written to and it is a hole (if we are before EOF). For example, open, seek(128MB), write 1 byte, close will write 1 byte to one object and everything before that, where the objects do not exist, is defined to be zeros.. sage > > Thanks! > Jianpeng Ma > >The object does not exist. > > > >> B: ret = 0 > > > >The object exists but we read 0 bytes, which means we are past EOF or the > >object has size 0 bytes. Either way, we are either in a hole or past EOF. > > > >sage > > > >> > >> Only we knowed this, we can handle exactly. > >> Sage, can you explain those meaning in detail? > >> > >> Thanks! > >> Jianpeng Ma > >> > >> >Regards > >> >Yan, Zheng > >> > > >> > > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. > >> >>>> > >> >>> > >> >>> i think 'was_short' is equal to 'hit_hole' > >> >>> > >> >[snip] > >> Thanks! > >> Jianpeng Ma > >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: > >> >>>> > >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes > >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc > >> >>>>>dd if=file_with_holes bs=8M >/dev/null > >> >>>>> > >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c > >> >>>> index 2ddf061..22a98e5 100644 > >> >>>> --- a/fs/ceph/file.c > >> >>>> +++ b/fs/ceph/file.c > >> >>>> @@ -349,17 +349,17 @@ more: > >> >>>> dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read, > >> >>>> ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : ""); > >> >>>> > >> >>>> - if (ret > 0) { > >> >>>> - int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT; > >> >>>> + if (ret >= 0) { > >> >>>> + int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT; > >> >>>> > >> >>>> - if (read < pos - off) { > >> >>>> - dout(" zero gap %llu to %llu\n", off + read, pos); > >> >>>> - ceph_zero_page_vector_range(page_align + read, > >> >>>> - pos - off - read, pages); > >> >>>> + if (was_short) { > >> >>>> + dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len); > >> >>>> + ceph_zero_page_vector_range(page_align + ret, > >> >>>> + this_len - ret, pages); > >> >>>> } > >> >>>> - pos += ret; > >> >>>> + pos += this_len; > >> >>>> read = pos - off; > >> >>>> - left -= ret; > >> >>>> + left -= this_len; > >> >>>> page_pos += didpages; > >> >>>> pages_left -= didpages; > >> >>>> > >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0". > >> >>> > >> >>>maybe we should add a i_size check. stop reading next strip object > >> >>>when 'pos > i_size' > >> >>> > >> >> I think we can't do this because i_size may smaller than real size in ceph. > >> >> > >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it > >> >handles the case. > >> > > >> >We must do i_size check in striped_read(), otherwise user program always gets as > >> >much data as it requests. For example > >> > > >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes > >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null > >> > > >> >Regards > >> >Yan, Zheng > >> > > >> > > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. > >> >>>> > >> >>> > >> >>> i think 'was_short' is equal to 'hit_hole' > >> >>> > >> >[snip] > Thanks! > Jianpeng Ma > >On Wed, 31 Jul 2013, majianpeng wrote: > >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: > >> >>>> > >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes > >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc > >> >>>>>dd if=file_with_holes bs=8M >/dev/null > >> >>>>> > >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c > >> >>>> index 2ddf061..22a98e5 100644 > >> >>>> --- a/fs/ceph/file.c > >> >>>> +++ b/fs/ceph/file.c > >> >>>> @@ -349,17 +349,17 @@ more: > >> >>>> dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read, > >> >>>> ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : ""); > >> >>>> > >> >>>> - if (ret > 0) { > >> >>>> - int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT; > >> >>>> + if (ret >= 0) { > >> >>>> + int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT; > >> >>>> > >> >>>> - if (read < pos - off) { > >> >>>> - dout(" zero gap %llu to %llu\n", off + read, pos); > >> >>>> - ceph_zero_page_vector_range(page_align + read, > >> >>>> - pos - off - read, pages); > >> >>>> + if (was_short) { > >> >>>> + dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len); > >> >>>> + ceph_zero_page_vector_range(page_align + ret, > >> >>>> + this_len - ret, pages); > >> >>>> } > >> >>>> - pos += ret; > >> >>>> + pos += this_len; > >> >>>> read = pos - off; > >> >>>> - left -= ret; > >> >>>> + left -= this_len; > >> >>>> page_pos += didpages; > >> >>>> pages_left -= didpages; > >> >>>> > >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0". > >> >>> > >> >>>maybe we should add a i_size check. stop reading next strip object > >> >>>when 'pos > i_size' > >> >>> > >> >> I think we can't do this because i_size may smaller than real size in ceph. > >> >> > >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it > >> >handles the case. > >> > > >> >We must do i_size check in striped_read(), otherwise user program always gets as > >> >much data as it requests. For example > >> > > >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes > >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null > >> > > >> Before doing that, we must know the meaning of return value. > > > >For ceph_osdc_readpages(), > > > >> A: ret = ENOENT > > > >The object does not exist. > > > >> B: ret = 0 > > > >The object exists but we read 0 bytes, which means we are past EOF or the > >object has size 0 bytes. Either way, we are either in a hole or past EOF. > > > >sage > > > >> > >> Only we knowed this, we can handle exactly. > >> Sage, can you explain those meaning in detail? > >> > >> Thanks! > >> Jianpeng Ma > >> > >> >Regards > >> >Yan, Zheng > >> > > >> > > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. > >> >>>> > >> >>> > >> >>> i think 'was_short' is equal to 'hit_hole' > >> >>> > >> >[snip] > >> Thanks! > >> Jianpeng Ma > >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: > >> >>>> > >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes > >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc > >> >>>>>dd if=file_with_holes bs=8M >/dev/null > >> >>>>> > >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c > >> >>>> index 2ddf061..22a98e5 100644 > >> >>>> --- a/fs/ceph/file.c > >> >>>> +++ b/fs/ceph/file.c > >> >>>> @@ -349,17 +349,17 @@ more: > >> >>>> dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read, > >> >>>> ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : ""); > >> >>>> > >> >>>> - if (ret > 0) { > >> >>>> - int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT; > >> >>>> + if (ret >= 0) { > >> >>>> + int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT; > >> >>>> > >> >>>> - if (read < pos - off) { > >> >>>> - dout(" zero gap %llu to %llu\n", off + read, pos); > >> >>>> - ceph_zero_page_vector_range(page_align + read, > >> >>>> - pos - off - read, pages); > >> >>>> + if (was_short) { > >> >>>> + dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len); > >> >>>> + ceph_zero_page_vector_range(page_align + ret, > >> >>>> + this_len - ret, pages); > >> >>>> } > >> >>>> - pos += ret; > >> >>>> + pos += this_len; > >> >>>> read = pos - off; > >> >>>> - left -= ret; > >> >>>> + left -= this_len; > >> >>>> page_pos += didpages; > >> >>>> pages_left -= didpages; > >> >>>> > >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0". > >> >>> > >> >>>maybe we should add a i_size check. stop reading next strip object > >> >>>when 'pos > i_size' > >> >>> > >> >> I think we can't do this because i_size may smaller than real size in ceph. > >> >> > >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it > >> >handles the case. > >> > > >> >We must do i_size check in striped_read(), otherwise user program always gets as > >> >much data as it requests. For example > >> > > >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes > >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null > >> > > >> >Regards > >> >Yan, Zheng > >> > > >> > > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. > >> >>>> > >> >>> > >> >>> i think 'was_short' is equal to 'hit_hole' > >> >>> > >> >[snip]N????y????b?????v?????{.n??????z??ay????????j???f????????????????:+v??????????zZ+??????"?!? -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html