Re: Re: question about striped_read

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 31 Jul 2013, majianpeng wrote:
> >On Wed, 31 Jul 2013, majianpeng wrote:
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote:
> [snip]
> >
> >For ceph_osdc_readpages(),
> >
> >> A: ret = ENOENT
> >
> From the original code, for this case we should zero the area.
> Why?

If an object is missing it means that range of the file was never written 
to and it is a hole (if we are before EOF).

For example, open, seek(128MB), write 1 byte, close will write 1 byte to 
one object and everything before that, where the objects do not exist, is 
defined to be zeros..

sage



> 
> Thanks!
> Jianpeng Ma
> >The object does not exist.
> >
> >> B: ret = 0
> >
> >The object exists but we read 0 bytes, which means we are past EOF or the 
> >object has size 0 bytes.  Either way, we are either in a hole or past EOF.
> >
> >sage
> >
> >> 
> >> Only we knowed this, we can handle exactly.
> >> Sage, can you explain those meaning in detail?
> >> 
> >> Thanks!
> >> Jianpeng Ma
> >> 
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]
> >> Thanks!
> >> Jianpeng Ma
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote:
> >> >>>>
> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >> >>>>>
> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> >>>> index 2ddf061..22a98e5 100644
> >> >>>> --- a/fs/ceph/file.c
> >> >>>> +++ b/fs/ceph/file.c
> >> >>>> @@ -349,17 +349,17 @@ more:
> >> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >> >>>>
> >> >>>> -       if (ret > 0) {
> >> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >> >>>> +       if (ret >= 0) {
> >> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >> >>>>
> >> >>>> -               if (read < pos - off) {
> >> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >> >>>> -                                                   pos - off - read, pages);
> >> >>>> +               if (was_short) {
> >> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >> >>>> +                                                   this_len - ret, pages);
> >> >>>>                 }
> >> >>>> -               pos += ret;
> >> >>>> +               pos += this_len;
> >> >>>>                 read = pos - off;
> >> >>>> -               left -= ret;
> >> >>>> +               left -= this_len;
> >> >>>>                 page_pos += didpages;
> >> >>>>                 pages_left -= didpages;
> >> >>>>
> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >> >>>
> >> >>>maybe we should add a  i_size check. stop reading next strip object
> >> >>>when 'pos > i_size'
> >> >>>
> >> >> I think we can't do this because i_size may smaller than real size in ceph.
> >> >>
> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >> >handles the case.
> >> >
> >> >We must do i_size check in striped_read(), otherwise user program always gets as
> >> >much data as it requests. For example
> >> >
> >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >> >
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]
> Thanks!
> Jianpeng Ma
> >On Wed, 31 Jul 2013, majianpeng wrote:
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote:
> >> >>>>
> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >> >>>>>
> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> >>>> index 2ddf061..22a98e5 100644
> >> >>>> --- a/fs/ceph/file.c
> >> >>>> +++ b/fs/ceph/file.c
> >> >>>> @@ -349,17 +349,17 @@ more:
> >> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >> >>>>
> >> >>>> -       if (ret > 0) {
> >> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >> >>>> +       if (ret >= 0) {
> >> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >> >>>>
> >> >>>> -               if (read < pos - off) {
> >> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >> >>>> -                                                   pos - off - read, pages);
> >> >>>> +               if (was_short) {
> >> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >> >>>> +                                                   this_len - ret, pages);
> >> >>>>                 }
> >> >>>> -               pos += ret;
> >> >>>> +               pos += this_len;
> >> >>>>                 read = pos - off;
> >> >>>> -               left -= ret;
> >> >>>> +               left -= this_len;
> >> >>>>                 page_pos += didpages;
> >> >>>>                 pages_left -= didpages;
> >> >>>>
> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >> >>>
> >> >>>maybe we should add a  i_size check. stop reading next strip object
> >> >>>when 'pos > i_size'
> >> >>>
> >> >> I think we can't do this because i_size may smaller than real size in ceph.
> >> >>
> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >> >handles the case.
> >> >
> >> >We must do i_size check in striped_read(), otherwise user program always gets as
> >> >much data as it requests. For example
> >> >
> >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >> >
> >> Before doing that, we must know the meaning of return value.
> >
> >For ceph_osdc_readpages(),
> >
> >> A: ret = ENOENT
> >
> >The object does not exist.
> >
> >> B: ret = 0
> >
> >The object exists but we read 0 bytes, which means we are past EOF or the 
> >object has size 0 bytes.  Either way, we are either in a hole or past EOF.
> >
> >sage
> >
> >> 
> >> Only we knowed this, we can handle exactly.
> >> Sage, can you explain those meaning in detail?
> >> 
> >> Thanks!
> >> Jianpeng Ma
> >> 
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]
> >> Thanks!
> >> Jianpeng Ma
> >> >On Tue, Jul 30, 2013 at 7:41 PM, majianpeng <majianpeng@xxxxxxxxx> wrote:
> >> >>>>
> >> >>>>>dd if=/dev/urandom bs=1M count=2 of=file_with_holes
> >> >>>>>dd if=/dev/urandom bs=1M count=2 seek=4 of=file_with_holes conv=notrunc
> >> >>>>>dd if=file_with_holes bs=8M >/dev/null
> >> >>>>>
> >> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> >> >>>> index 2ddf061..22a98e5 100644
> >> >>>> --- a/fs/ceph/file.c
> >> >>>> +++ b/fs/ceph/file.c
> >> >>>> @@ -349,17 +349,17 @@ more:
> >> >>>>         dout("striped_read %llu~%u (read %u) got %d%s%s\n", pos, left, read,
> >> >>>>              ret, hit_stripe ? " HITSTRIPE" : "", was_short ? " SHORT" : "");
> >> >>>>
> >> >>>> -       if (ret > 0) {
> >> >>>> -               int didpages = (page_align + ret) >> PAGE_CACHE_SHIFT;
> >> >>>> +       if (ret >= 0) {
> >> >>>> +               int didpages = (page_align + this_len) >> PAGE_CACHE_SHIFT;
> >> >>>>
> >> >>>> -               if (read < pos - off) {
> >> >>>> -                       dout(" zero gap %llu to %llu\n", off + read, pos);
> >> >>>> -                       ceph_zero_page_vector_range(page_align + read,
> >> >>>> -                                                   pos - off - read, pages);
> >> >>>> +               if (was_short) {
> >> >>>> +                       dout(" zero gap %llu to %llu\n", pos + ret, pos + this_len);
> >> >>>> +                       ceph_zero_page_vector_range(page_align + ret,
> >> >>>> +                                                   this_len - ret, pages);
> >> >>>>                 }
> >> >>>> -               pos += ret;
> >> >>>> +               pos += this_len;
> >> >>>>                 read = pos - off;
> >> >>>> -               left -= ret;
> >> >>>> +               left -= this_len;
> >> >>>>                 page_pos += didpages;
> >> >>>>                 pages_left -= didpages;
> >> >>>>
> >> >>>> This patch can do those case. It only add ret== 0 in judgement 'ret > 0".
> >> >>>
> >> >>>maybe we should add a  i_size check. stop reading next strip object
> >> >>>when 'pos > i_size'
> >> >>>
> >> >> I think we can't do this because i_size may smaller than real size in ceph.
> >> >>
> >> >ceph_aio_read() calls ceph_do_getattr() when 'checkeof = true', it
> >> >handles the case.
> >> >
> >> >We must do i_size check in striped_read(), otherwise user program always gets as
> >> >much data as it requests. For example
> >> >
> >> >dd if=/dev/urandom bs=1M count=1 of=file_with_holes
> >> >dd if=file_with_holes bs=64M iflag=direct of=/dev/null
> >> >
> >> >Regards
> >> >Yan, Zheng
> >> >
> >> >
> >> >>>> But i think i will add a parameter about hit_hole. It will make the code easy to understand.
> >> >>>>
> >> >>>
> >> >>> i think 'was_short' is equal to 'hit_hole'
> >> >>>
> >> >[snip]N????y????b?????v?????{.n??????z??ay????????j???f????????????????:+v??????????zZ+??????"?!?
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux