Hi,

If I understand what you're saying, you effectively want to tie a specific file to a specific node?

The proposed patch (at the very least in concept) might work well for large files, but if you tie files to nodes, a large file would not gain any of the benefit of these striped reads?

----- Original Message -----
From: "Krishna Srinivas" <krishna@xxxxxxxxxxxxx>
To: "Csibra Gergo" <gergo@xxxxxxxxx>
Cc: gluster-devel@xxxxxxxxxx
Sent: Monday, December 31, 2007 6:14:24 PM (GMT) Europe/London
Subject: Re: Patch for "Striped" read from AFR volumes

Hi Csibra,

The patch contribution is really appreciated. I did not verify the correctness of the code, but I can make out that you are doing round-robin (RR) of readv(). However, making read()s round-robin will (theoretically) decrease performance, as we won't be taking advantage of the kernel's read-ahead algorithm. A better approach would be to have a file read from the same child every time (even on the next open), but have different files read from different children. A good way of deciding which child to read from is (inode_number % child_count); this change is in the TLA repository. Could you test how your patch performs against the TLA source? Check doc/translator-option.txt for the options of AFR (option read-subvolume).

A better way to define striped reads would be: if a read request comes for 1 MB, get 0.5 MB from the first child and 0.5 MB from the second child, and combine the reads. However, with this approach too we are not sure about the performance gain.

Thanks
Krishna

On Dec 31, 2007 9:44 PM, Csibra Gergo <gergo@xxxxxxxxx> wrote:
> Hi,
>
> Apply the following patch to read AFR volumes like RAID0 volumes. The
> current implementation of AFR reads every block from the first child
> if it is available. This simple patch cycles through all available
> children: every afr_readv call reads from the child after the one
> used by the previous call. So if you have 4 children, the first block
> is read from the 1st, the next from the 2nd, the next from the 3rd,
> the next from the 4th, and then it starts over from the 1st.
>
> To apply this patch:
> cd xlators/cluster/afr/src
> patch -p0 <afr_striped_read_1.3.7.diff
> make
> make install
>
> The patch is also available here:
> http://www.csibra.hu/glusterfs/afr_striped_read_1.3.7.diff
>
> As you can see, this patch is against version 1.3.7.
>
> Here's the patch:
>
> >>>>CUT HERE<<<<
> *** /root/afr.c 2007-10-17 17:40:37.000000000 +0200
> --- afr.c 2007-12-31 16:51:38.000000000 +0100
> ***************
> *** 2448,2453 ****
> --- 2448,2469 ----
>       if (afrfdp->fdstate[i])
>         break;
>     }
> +   if(i == pvt->child_count) {
> +     // if we reached the last child, check whether there are still unread children
> +     data_t *fr = dict_get(local->fd->ctx, "first_read");
> +     if(fr) {
> +       int32_t frd = data_to_int32(fr);
> +       // frd contains the first child that was read
> +       if(frd > 0) {
> +         // if the first child read was not the first physical child, start the child search again
> +         i = 0;
> +         for (; i < pvt->child_count; i++) {
> +           if (afrfdp->fdstate[i])
> +             break;
> +         }
> +       }
> +     }
> +   }
>     if (i < pvt->child_count) {
>       STACK_WIND (frame,
>                   afr_readv_cbk,
> ***************
> *** 2492,2501 ****
>     local->size = size;
>     local->fd = fd;
>
> !   for (i = 0; i < child_count; i++) {
>       if (afrfdp->fdstate[i] && pvt->state[i])
>         break;
>     }
>     if (i == child_count) {
>       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
>     } else {
> --- 2508,2548 ----
>     local->size = size;
>     local->fd = fd;
>
> !   int32_t next_child, first_read = 0;
> !   data_t *nxtc = dict_get(fd->ctx, "next_child");
> !   if(nxtc) {
> !     next_child = data_to_int32(nxtc);
> !   } else {
> !     next_child = -1;
> !     first_read = 1;
> !   }
> !   next_child++;
> !   if(next_child == child_count) {
> !     next_child = 0;
> !   }
> !
> !   for (i = next_child; i < child_count; i++) {
>       if (afrfdp->fdstate[i] && pvt->state[i])
>         break;
>     }
> +
> +   if(i == child_count) {
> +     i = 0;
> +     for (i = 0; i < child_count; i++) {
> +       if (afrfdp->fdstate[i] && pvt->state[i])
> +         break;
> +     }
> +     if(i == child_count) {
> +       next_child = 0;
> +     } else {
> +       next_child = i;
> +     }
> +   }
> +   dict_set(fd->ctx, "next_child", data_from_int32(next_child));
> +   if(first_read) {
> +     dict_set(fd->ctx, "first_read", data_from_int32(i));
> +   }
> +
>     if (i == child_count) {
>       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
>     } else {
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
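
For comparison, the two child-selection policies discussed in this thread (round-robin per read, and a fixed child per file via inode_number % child_count) can be sketched in isolation. This is only a standalone illustration, not GlusterFS code: the helper names next_up_child, pick_round_robin and pick_by_inode are hypothetical, the `up` array stands in for the fdstate/state checks in afr.c, and falling back to the next available child when the preferred one is down is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: return the first available child at or after
 * `start`, wrapping around once; return -1 if no child is available.
 * up[i] != 0 means child i can serve reads. */
static int next_up_child(const int *up, int child_count, int start)
{
    for (int n = 0; n < child_count; n++) {
        int i = (start + n) % child_count;
        if (up[i])
            return i;
    }
    return -1;
}

/* Round-robin policy (the idea behind the patch): each read goes to the
 * child after the one used by the previous read. *last holds the child
 * used by the previous read, or -1 before the first read. */
static int pick_round_robin(const int *up, int child_count, int *last)
{
    int i = next_up_child(up, child_count, (*last + 1) % child_count);
    if (i >= 0)
        *last = i;
    return i;
}

/* Per-file policy (Krishna's suggestion): the same inode always maps to
 * the same preferred child, so sequential reads of one file stay on one
 * child. The fallback when that child is down is this sketch's choice. */
static int pick_by_inode(const int *up, int child_count, uint64_t ino)
{
    int preferred = (int)(ino % (uint64_t)child_count);
    return next_up_child(up, child_count, preferred);
}
```

With four children where child 2 is down (up = {1, 1, 0, 1}) and *last starting at -1, five successive pick_round_robin calls select children 0, 1, 3, 0, 1. pick_by_inode for inode 10 prefers child 2 (10 % 4), which is down, so this sketch falls back to child 3.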