>On Wed, 31 Jul 2013, majianpeng wrote: >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: [snip] > >For ceph_osdc_readpages(), > >> A: ret = ENOENT > >From the original code, for this case we should zero the area. Why? Thanks! Jianpeng Ma >The object does not exist. > >> B: ret = 0 > >The object exists but we read 0 bytes, which means we are past EOF or the >object has size 0 bytes. Either way, we are either in a hole or past EOF. > >sage > >> >> Only we knowed this, we can handle exactly. >> Sage, can you explain those meaning in detail? >> >> Thanks! >> Jianpeng Ma >> >> >Regards >> >Yan, Zheng >> > >> > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. >> >>>> >> >>> >> >>> i think 'was_short' is equal to 'hit_hole' >> >>> >> >[snip] >> Thanks! >> Jianpeng Ma >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: >> >>>> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc >> >>>>>dd if=file_with_holes bs=8M >/dev/null >> >>>>> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> >>>> index 2ddf061..22a98e5 100644 >> >>>> --- a/fs/ceph/file.c >> >>>> +++ b/fs/ceph/file.c >> >>>> @@ -349,17 +349,17 @@ more: >> >>>> dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read, >> >>>> ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : ""); >> >>>> >> >>>> - if (ret > 0) { >> >>>> - int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT; >> >>>> + if (ret >= 0) { >> >>>> + int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT; >> >>>> >> >>>> - if (read < pos - off) { >> >>>> - dout(" zero gap %llu to %llu\n", off + read, pos); >> >>>> - ceph_zero_page_vector_range(page_align + read, >> >>>> - pos - off - read, pages); >> >>>> + if (was_short) { >> >>>> + dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len); >> >>>> + ceph_zero_page_vector_range(page_align + ret, >> >>>> + this_len - ret, pages); >> >>>> } >> >>>> - pos += ret; >> >>>> + pos += this_len; >> >>>> read = pos - off; >> >>>> - left -= ret; >> >>>> + left -= this_len; >> >>>> page_pos += didpages; >> >>>> pages_left -= didpages; >> >>>> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0". >> >>> >> >>>maybe we should add a i_size check. stop reading next strip object >> >>>when 'pos > i_size' >> >>> >> >> I think we can't do this because i_size may smaller than real size in ceph. >> >> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it >> >handles the case. >> > >> >We must do i_size check in striped_read(), otherwise user program always gets as >> >much data as it requests. For example >> > >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null >> > >> >Regards >> >Yan, Zheng >> > >> > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. >> >>>> >> >>> >> >>> i think 'was_short' is equal to 'hit_hole' >> >>> >> >[snip] Thanks! Jianpeng Ma >On Wed, 31 Jul 2013, majianpeng wrote: >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: >> >>>> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc >> >>>>>dd if=file_with_holes bs=8M >/dev/null >> >>>>> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> >>>> index 2ddf061..22a98e5 100644 >> >>>> --- a/fs/ceph/file.c >> >>>> +++ b/fs/ceph/file.c >> >>>> @@ -349,17 +349,17 @@ more: >> >>>> dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read, >> >>>> ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : ""); >> >>>> >> >>>> - if (ret > 0) { >> >>>> - int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT; >> >>>> + if (ret >= 0) { >> >>>> + int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT; >> >>>> >> >>>> - if (read < pos - off) { >> >>>> - dout(" zero gap %llu to %llu\n", off + read, pos); >> >>>> - ceph_zero_page_vector_range(page_align + read, >> >>>> - pos - off - read, pages); >> >>>> + if (was_short) { >> >>>> + dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len); >> >>>> + ceph_zero_page_vector_range(page_align + ret, >> >>>> + this_len - ret, pages); >> >>>> } >> >>>> - pos += ret; >> >>>> + pos += this_len; >> >>>> read = pos - off; >> >>>> - left -= ret; >> >>>> + left -= this_len; >> >>>> page_pos += didpages; >> >>>> pages_left -= didpages; >> >>>> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0". >> >>> >> >>>maybe we should add a i_size check. stop reading next strip object >> >>>when 'pos > i_size' >> >>> >> >> I think we can't do this because i_size may smaller than real size in ceph. >> >> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it >> >handles the case. >> > >> >We must do i_size check in striped_read(), otherwise user program always gets as >> >much data as it requests. For example >> > >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null >> > >> Before doing that, we must know the meaning of return value. > >For ceph_osdc_readpages(), > >> A: ret = ENOENT > >The object does not exist. > >> B: ret = 0 > >The object exists but we read 0 bytes, which means we are past EOF or the >object has size 0 bytes. Either way, we are either in a hole or past EOF. > >sage > >> >> Only we knowed this, we can handle exactly. >> Sage, can you explain those meaning in detail? >> >> Thanks! >> Jianpeng Ma >> >> >Regards >> >Yan, Zheng >> > >> > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. >> >>>> >> >>> >> >>> i think 'was_short' is equal to 'hit_hole' >> >>> >> >[snip] >> Thanks! >> Jianpeng Ma >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote: >> >>>> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc >> >>>>>dd if=file_with_holes bs=8M >/dev/null >> >>>>> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >> >>>> index 2ddf061..22a98e5 100644 >> >>>> --- a/fs/ceph/file.c >> >>>> +++ b/fs/ceph/file.c >> >>>> @@ -349,17 +349,17 @@ more: >> >>>> dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read, >> >>>> ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : ""); >> >>>> >> >>>> - if (ret > 0) { >> >>>> - int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT; >> >>>> + if (ret >= 0) { >> >>>> + int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT; >> >>>> >> >>>> - if (read < pos - off) { >> >>>> - dout(" zero gap %llu to %llu\n", off + read, pos); >> >>>> - ceph_zero_page_vector_range(page_align + read, >> >>>> - pos - off - read, pages); >> >>>> + if (was_short) { >> >>>> + dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len); >> >>>> + ceph_zero_page_vector_range(page_align + ret, >> >>>> + this_len - ret, pages); >> >>>> } >> >>>> - pos += ret; >> >>>> + pos += this_len; >> >>>> read = pos - off; >> >>>> - left -= ret; >> >>>> + left -= this_len; >> >>>> page_pos += didpages; >> >>>> pages_left -= didpages; >> >>>> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0". >> >>> >> >>>maybe we should add a i_size check. stop reading next strip object >> >>>when 'pos > i_size' >> >>> >> >> I think we can't do this because i_size may smaller than real size in ceph. >> >> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it >> >handles the case. >> > >> >We must do i_size check in striped_read(), otherwise user program always gets as >> >much data as it requests. For example >> > >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null >> > >> >Regards >> >Yan, Zheng >> > >> > >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand. >> >>>> >> >>> >> >>> i think 'was_short' is equal to 'hit_hole' >> >>> >> >[snip]?韬{.n?????%??檩??w?{.n????u朕?Ф?塄}?财??j:+v??????2??璀??摺?囤??z夸z罐?+?????w棹f