Hi,
Some parts of this topic have been discussed in the recent past [1].
The current mechanism, in which each xlator encodes its subvol in the
lower or higher bits of the d_off, has pitfalls, as discussed in those
threads and in this review [2].
Here is a solution design from one of the comments Avati posted on this
[3]:
"One example approach (not necessarily the best): Make every xlator
knows the total number of leaf xlators (protocol/clients), and also the
number of all leaf xlators from each of its subvolumes. This way, the
protocol/client xlators (alone) do the encoding, by knowing its global
brick# and total #of bricks. The cluster xlators blindly forward the
readdir_cbk without any further transformations of the d_offs, and also
route the next readdir(old_doff) request to the appropriate subvolume
based on the weighted graph (of counts of protocol/clients in the
subtrees) till it reaches the right protocol/client to resume the
enumeration."
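To make the "weighted graph" idea a little more concrete, here is a rough
C sketch of how the leaf counts and each client's global brick# could be
computed in one depth-first walk of the graph. struct node, MAX_CHILDREN
and assign_leaf_ids() are hypothetical stand-ins, not the real xlator_t
API; in practice this state would live in each xlator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an xlator in the graph. */
#define MAX_CHILDREN 8

struct node {
    const char  *name;
    struct node *children[MAX_CHILDREN];
    size_t       nchildren;   /* 0 means this is a protocol/client leaf */
    unsigned     leaf_count;  /* number of leaves in this subtree */
    unsigned     first_leaf;  /* global index of the first leaf below */
};

/* Number the leaves depth-first and record, for every node, how many
 * leaves sit below it and where its range of leaf IDs starts.
 * Returns the next unused leaf index (the total, when called on the top
 * with next == 0). */
static unsigned
assign_leaf_ids(struct node *n, unsigned next)
{
    n->first_leaf = next;
    if (n->nchildren == 0) {
        n->leaf_count = 1;    /* a leaf's global brick# is first_leaf */
        return next + 1;
    }
    n->leaf_count = 0;
    for (size_t i = 0; i < n->nchildren; i++) {
        assert(n->children[i] != NULL);
        next = assign_leaf_ids(n->children[i], next);
        n->leaf_count += n->children[i]->leaf_count;
    }
    return next;
}
```

For a dht over two replica pairs (client-0 .. client-3), calling
assign_leaf_ids(&top, 0) returns 4, gives replicate-1 a range starting at
leaf 2, and makes client-3 leaf 3.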
So the scheme currently being worked on is as follows:
- encode the d_off with the protocol/client ID, which is the client's
leaf position/number in the graph
- no further encoding in any other xlator
- on receiving further readdir requests with that d_off, consult the
graph (or the immediate children) for the ID encoded in the d_off, and
send the request down that subvol path
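For the encoding step itself, one possible layout (an assumption here,
picked because Avati's comment has the client know its global brick# and
the total # of bricks) is encoded = raw * total + leaf. The function names
below are hypothetical, and the sketch treats d_off as an unsigned 64-bit
value for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* protocol/client side: fold this client's global leaf ID into the
 * backend d_off. Returns false when raw is too large to encode, i.e. the
 * backend offset already uses too many of the 64 bits. */
static bool
client_encode_doff(uint64_t raw, unsigned leaf, unsigned total,
                   uint64_t *out)
{
    assert(total > 0 && leaf < total);
    if (raw > (UINT64_MAX - leaf) / total)
        return false;
    *out = raw * total + leaf;
    return true;
}

/* Any xlator: recover the leaf ID, knowing only the total leaf count. */
static unsigned
doff_leaf_id(uint64_t encoded, unsigned total)
{
    assert(total > 0);
    return (unsigned)(encoded % total);
}

/* protocol/client side: recover the backend d_off before winding the
 * readdir to the brick. Cluster xlators never call this. */
static uint64_t
client_decode_doff(uint64_t encoded, unsigned total)
{
    assert(total > 0);
    return encoded / total;
}
```

With 4 leaves, a raw d_off of 1000 from leaf 3 becomes 4003; any xlator
recovers leaf 3 as 4003 % 4, and only client-3 divides it back to 1000.
The false return is the lost-bits problem in miniature.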
IOW, a common routine would take the d_off and this (i.e. the current
xlator) and return the subvol that the d_off belongs to. The routine
would decode the leaf ID that the protocol/client layer encoded in the
d_off, find which subvol of this xlator has that leaf under it, and
return that subvol for further processing. (It may consult the graph, or
store the range of leaf IDs that each subvol covers w.r.t.
protocol/client, and deliver the result from that.)
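A rough sketch of that common routine, assuming each xlator stores the
leaf ID range of each of its subvols, and using raw * total + leaf purely
as a placeholder for whatever encoding protocol/client ends up using.
subvol_for_doff() and struct node are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8

/* Hypothetical stand-in for an xlator: each node knows the global leaf
 * ID range [first_leaf, first_leaf + leaf_count) of its subtree. */
struct node {
    const char  *name;
    struct node *children[MAX_CHILDREN];
    size_t       nchildren;
    unsigned     first_leaf;
    unsigned     leaf_count;
};

/* Decode the leaf ID from d_off and return the child of 'this' whose
 * leaf range contains it, or NULL if the d_off is not one of ours. */
static struct node *
subvol_for_doff(const struct node *this, uint64_t d_off,
                unsigned total_leaves)
{
    assert(total_leaves > 0);
    unsigned leaf = (unsigned)(d_off % total_leaves);

    for (size_t i = 0; i < this->nchildren; i++) {
        struct node *child = this->children[i];
        if (leaf >= child->first_leaf &&
            leaf < child->first_leaf + child->leaf_count)
            return child;
    }
    return NULL;
}
```

Given d_off 4003 with 4 leaves (leaf 3), the top dht would pick
replicate-1, which in turn picks client-3; replicate-0 returns NULL. No
xlator on the way rewrites the d_off.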
Given the current situation with ext4 and xfs, and continuing with the ID
encoding scheme, this seems to be the best way to prevent multiple subvol
encodings from stomping on each other, while also (in a sense) avoiding
further loss of bits. This scheme would also let AFR/EC load balance
readdir requests across their subvols better, rather than sending to a
static subvol for a long duration.
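On the AFR/EC point, a hypothetical sketch of what the subvol choice
could look like once the d_off identifies its leaf: a fresh readdir
(assumed here to be d_off 0, i.e. start of directory) can be rotated
across children, while a continuation must go back to the child that
produced the d_off. The round-robin policy and all names are assumptions
for illustration, not a description of AFR's current read-subvol logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8

/* Hypothetical stand-in for an AFR/EC xlator and its children. */
struct node {
    const char  *name;
    struct node *children[MAX_CHILDREN];
    size_t       nchildren;
    unsigned     first_leaf;  /* global leaf ID range of this subtree */
    unsigned     leaf_count;
    size_t       next_rr;     /* round-robin cursor for fresh readdirs */
};

static struct node *
pick_readdir_subvol(struct node *this, uint64_t d_off,
                    unsigned total_leaves)
{
    assert(this->nchildren > 0 && total_leaves > 0);

    /* New enumeration: any child can serve it, so rotate. */
    if (d_off == 0) {
        struct node *child = this->children[this->next_rr];
        this->next_rr = (this->next_rr + 1) % this->nchildren;
        return child;
    }

    /* Continuation: route by the leaf ID encoded in the d_off
     * (assuming encoded = raw * total + leaf). */
    unsigned leaf = (unsigned)(d_off % total_leaves);
    for (size_t i = 0; i < this->nchildren; i++) {
        struct node *child = this->children[i];
        if (leaf >= child->first_leaf &&
            leaf < child->first_leaf + child->leaf_count)
            return child;
    }
    return NULL;
}
```

On a replica pair, three fresh readdirs go to client-0, client-1 and
client-0, while a continuation with d_off 2001 (leaf 1 of 2) always
returns to client-1.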
Thoughts/comments?
Shyam
[1] https://www.mail-archive.com/gluster-devel@xxxxxxxxxxx/msg02834.html
[2] review.gluster.org/#/c/8201/4/xlators/cluster/afr/src/afr-dir-read.c
[3] https://www.mail-archive.com/gluster-devel@xxxxxxxxxxx/msg02847.html
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel