Re: Patch for "Striped" read from AFR volumes

Hi Csibra,

Thank you for the patch; the contribution is really appreciated. I have
not verified the correctness of the code, but I can see that you are
doing round-robin of readv() calls across the children.
Making read()s round-robin will, in theory, decrease performance,
because we won't be taking advantage of the kernel's read-ahead
algorithm. A better approach would be to always read a given file from
the same child (even on the next open), while reading different files
from different children. A good way of deciding which child to read from
is (inode_number % child_count); this change is in the TLA repository.
Could you test how your patch performs against the TLA source?
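
To illustrate the (inode_number % child_count) idea, here is a minimal
sketch. It is not the code from the TLA repository: the function name
pick_read_child and the child_up array are hypothetical, and the
fallback to the next available child when the preferred one is down is
my own assumption about how one might handle that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the AFR source.
 * Chooses the child a file should be read from. The preferred child is
 * ino % child_count, so the same file always maps to the same child
 * while different files spread across the children.
 * child_up[i] is non-zero when child i is available (assumed layout).
 * Returns the child index, or -1 if no child is up. */
static int pick_read_child(uint64_t ino, int child_count, const int *child_up)
{
    assert(child_count > 0);

    int preferred = (int)(ino % (uint64_t)child_count);

    /* Probe the preferred child first, then the following ones,
     * wrapping around at the end of the array. */
    for (int n = 0; n < child_count; n++) {
        int idx = (preferred + n) % child_count;
        if (child_up[idx])
            return idx;
    }
    return -1;
}
```

Because the choice depends only on the inode number, reads of one file
stay sequential on a single child, which is what lets read-ahead help.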

See doc/translator-option.txt for the AFR options (option read-subvolume).

A better way to define striped reads would be: if a read request comes in
for 1 MB, get 0.5 MB from the first child and 0.5 MB from the second child,
then combine the two results. However, we are not sure about the
performance gain of this approach either.
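
As a rough sketch of that splitting step only (issuing the reads and
combining the replies is not shown), something like the following could
work. The struct read_chunk and the function split_read are
hypothetical names and do not exist in AFR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one sub-read sent to one child. */
struct read_chunk {
    int      child;   /* index of the child to read from */
    uint64_t offset;  /* file offset where this chunk starts */
    size_t   size;    /* number of bytes to read */
};

/* Hypothetical helper, not taken from the AFR source.
 * Splits the request [offset, offset + size) into contiguous chunks,
 * one per child, as evenly as possible. Any remainder bytes go to the
 * first children. Children that would get 0 bytes are skipped.
 * out must have room for child_count entries.
 * Returns the number of chunks written to out. */
static int split_read(uint64_t offset, size_t size, int child_count,
                      struct read_chunk *out)
{
    assert(child_count > 0);

    size_t base  = size / (size_t)child_count;
    size_t extra = size % (size_t)child_count;
    int n = 0;

    for (int i = 0; i < child_count; i++) {
        size_t len = base + ((size_t)i < extra ? 1 : 0);
        if (len == 0)
            continue;
        out[n].child  = i;
        out[n].offset = offset;
        out[n].size   = len;
        offset += len;
        n++;
    }
    return n;
}
```

For a 1 MB request and two children this yields two 512 KB chunks, the
second one starting 512 KB after the first.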

Thanks
Krishna

On Dec 31, 2007 9:44 PM, Csibra Gergo <gergo@xxxxxxxxx> wrote:
> Hi,
>
> Apply the following patch to read AFR volumes like RAID0 volumes. The
> current implementation of AFR reads every block from the first child
> if it is available. This simple patch cycles through all available
> children instead. This means every afr_readv call reads from the child
> after the one the previous call read from. So if you have 4 children,
> the first block is read from the 1st, the next from the 2nd, the next
> from the 3rd, the next from the 4th, and then it wraps around, so the
> following block is read from the 1st again.
>
> To apply this patch:
> cd xlators/cluster/afr/src
> patch -p0 <afr_striped_read_1.3.7.diff
> make
> make install
>
> The patch is also available here:
> http://www.csibra.hu/glusterfs/afr_striped_read_1.3.7.diff
>
> As you can see, this patch is against version 1.3.7.
>
> Here's the patch:
> >>>>CUT HERE<<<<
> *** /root/afr.c 2007-10-17 17:40:37.000000000 +0200
> --- afr.c       2007-12-31 16:51:38.000000000 +0100
> ***************
> *** 2448,2453 ****
> --- 2448,2469 ----
>         if (afrfdp->fdstate[i])
>           break;
>         }
> +       if(i == pvt->child_count) {
> +         // if we reached the last child, test if maybe there're unreaded child
> +         data_t *fr = dict_get(local->fd->ctx, "first_read");
> +       if(fr) {
> +         int32_t frd = data_to_int32(fr);
> +         // frd contains the first child what readed
> +         if(frd > 0) {
> +           // if first readed child was not the first physical child, start child search again
> +           i = 0;
> +           for (; i < pvt->child_count; i++) {
> +             if (afrfdp->fdstate[i])
> +               break;
> +           }
> +         }
> +       }
> +       }
>         if (i < pvt->child_count) {
>                 STACK_WIND (frame,
>                     afr_readv_cbk,
> ***************
> *** 2492,2501 ****
>     local->size = size;
>     local->fd = fd;
>
> !   for (i = 0; i < child_count; i++) {
>       if (afrfdp->fdstate[i] && pvt->state[i])
>         break;
>     }
>     if (i == child_count) {
>       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
>     } else {
> --- 2508,2548 ----
>     local->size = size;
>     local->fd = fd;
>
> !   int32_t next_child, first_read = 0;
> !   data_t *nxtc = dict_get(fd->ctx, "next_child");
> !   if(nxtc) {
> !     next_child = data_to_int32(nxtc);
> !   } else {
> !     next_child = -1;
> !     first_read = 1;
> !   }
> !   next_child++;
> !   if(next_child == child_count) {
> !     next_child = 0;
> !   }
> !
> !   for (i = next_child; i < child_count; i++) {
>       if (afrfdp->fdstate[i] && pvt->state[i])
>         break;
>     }
> +
> +   if(i == child_count) {
> +     i = 0;
> +     for (i = 0; i < child_count; i++) {
> +       if (afrfdp->fdstate[i] && pvt->state[i])
> +       break;
> +     }
> +     if(i == child_count) {
> +       next_child = 0;
> +     } else {
> +       next_child = i;
> +     }
> +   }
> +   dict_set(fd->ctx, "next_child", data_from_int32(next_child));
> +   if(first_read) {
> +       dict_set(fd->ctx, "first_read", data_from_int32(i));
> +   }
> +
>     if (i == child_count) {
>       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
>     } else {
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>



