Hi Csibra,

The patch contribution is really appreciated. I have not verified the
correctness of the code, but I can see that you are doing round-robin (RR)
of readv() calls.

However, making read()s round-robin will (theoretically) decrease
performance, because we won't be taking advantage of the kernel's
read-ahead algorithm. A better approach would be to have a given file read
from the same child every time (even on the next open), while different
files are read from different children. A good way of choosing the child to
read from is (inode_number % child_count); this change is in the TLA
repository.

Could you test how your patch performs against the TLA source? Check
doc/translator-option.txt for the AFR options (option read-subvolume).

A better way to define striped reads would be: if a read request comes in
for 1 MB, get 0.5 MB from the first child and 0.5 MB from the second child,
and combine the reads. Even this way, however, we are not sure about the
performance gain.

Thanks
Krishna

On Dec 31, 2007 9:44 PM, Csibra Gergo <gergo@xxxxxxxxx> wrote:
> Hi,
>
> Apply the following patch to read AFR volumes like RAID0 volumes. The
> current implementation of AFR reads every block from the first child
> if it is available. This simple patch cycles through all available
> children. This means every afr_readv call reads from the child after
> the one used by the previous call. So if you have 4 children, the first
> block will be read from the 1st, the next from the 2nd, the next from
> the 3rd, the next from the 4th, and then it starts over, so the next is
> read from the 1st.
>
> To apply this patch:
> cd xlators/cluster/afr/src
> patch -p0 <afr_striped_read_1.3.7.diff
> make
> make install
>
> The patch is also available here:
> http://www.csibra.hu/glusterfs/afr_striped_read_1.3.7.diff
>
> As you can see, this patch is against version 1.3.7.
>
> here's the patch:
>
> >>>>CUT HERE<<<<
> *** /root/afr.c 2007-10-17 17:40:37.000000000 +0200
> --- afr.c 2007-12-31 16:51:38.000000000 +0100
> ***************
> *** 2448,2453 ****
> --- 2448,2469 ----
>         if (afrfdp->fdstate[i])
>           break;
>       }
> +     if(i == pvt->child_count) {
> +       // if we reached the last child, test if maybe there're unreaded child
> +       data_t *fr = dict_get(local->fd->ctx, "first_read");
> +       if(fr) {
> +         int32_t frd = data_to_int32(fr);
> +         // frd contains the first child what readed
> +         if(frd > 0) {
> +           // if first readed child was not the first physical child, start child search again
> +           i = 0;
> +           for (; i < pvt->child_count; i++) {
> +             if (afrfdp->fdstate[i])
> +               break;
> +           }
> +         }
> +       }
> +     }
>       if (i < pvt->child_count) {
>         STACK_WIND (frame,
>                     afr_readv_cbk,
> ***************
> *** 2492,2501 ****
>     local->size = size;
>     local->fd = fd;
>
> !   for (i = 0; i < child_count; i++) {
>       if (afrfdp->fdstate[i] && pvt->state[i])
>         break;
>     }
>     if (i == child_count) {
>       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
>     } else {
> --- 2508,2548 ----
>     local->size = size;
>     local->fd = fd;
>
> !   int32_t next_child, first_read = 0;
> !   data_t *nxtc = dict_get(fd->ctx, "next_child");
> !   if(nxtc) {
> !     next_child = data_to_int32(nxtc);
> !   } else {
> !     next_child = -1;
> !     first_read = 1;
> !   }
> !   next_child++;
> !   if(next_child == child_count) {
> !     next_child = 0;
> !   }
> !
> !   for (i = next_child; i < child_count; i++) {
>       if (afrfdp->fdstate[i] && pvt->state[i])
>         break;
>     }
> +
> +   if(i == child_count) {
> +     i = 0;
> +     for (i = 0; i < child_count; i++) {
> +       if (afrfdp->fdstate[i] && pvt->state[i])
> +         break;
> +     }
> +     if(i == child_count) {
> +       next_child = 0;
> +     } else {
> +       next_child = i;
> +     }
> +   }
> +   dict_set(fd->ctx, "next_child", data_from_int32(next_child));
> +   if(first_read) {
> +     dict_set(fd->ctx, "first_read", data_from_int32(i));
> +   }
> +
>     if (i == child_count) {
>       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
>     } else {
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
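
For illustration, the two child-selection policies discussed in this thread
(a fixed child per file chosen by inode_number % child_count, versus
per-call round-robin) can be sketched as standalone C. This is only a
minimal sketch and not GlusterFS code: the names first_available,
pick_child_by_inode and pick_child_round_robin, and the "up" availability
array, are hypothetical, and falling forward to the next available child
when the preferred one is down is an assumption, not confirmed AFR
behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the first available child at or after
 * `start`, wrapping around once. Returns -1 if no child is available.
 * up[i] != 0 means child i can serve reads. */
int first_available(const int *up, int child_count, int start)
{
    assert(child_count > 0);
    for (int n = 0; n < child_count; n++) {
        int i = (start + n) % child_count;
        if (up[i])
            return i;
    }
    return -1;
}

/* Per-file policy: the same inode always prefers the same child, so
 * sequential reads of one file keep hitting one child (and that child's
 * read-ahead), while different files spread across children. */
int pick_child_by_inode(uint64_t ino, const int *up, int child_count)
{
    assert(child_count > 0);
    int preferred = (int)(ino % (uint64_t)child_count);
    return first_available(up, child_count, preferred);
}

/* Per-call round-robin policy: *last holds the child used by the previous
 * read on this fd (-1 before the first read). Each call moves on to the
 * next available child and records it in *last. */
int pick_child_round_robin(int *last, const int *up, int child_count)
{
    assert(child_count > 0);
    int i = first_available(up, child_count, (*last + 1) % child_count);
    if (i >= 0)
        *last = i;
    return i;
}
```

With four available children, pick_child_by_inode(10, up, 4) keeps
returning child 2 on every call, whereas pick_child_round_robin() starting
from last = -1 returns 0, 1, 2, 3, 0, and so on. The first behaviour is the
one that lets a sequential reader benefit from read-ahead on a single child.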