This could be the problem. When I do this on a 1G file, I get one file in each stripe partition, each ~1G in size. I don't get n files, where n = 1G / chunk size (!). If I did, I could see how it would work, but I don't. Are you saying I "definitely should" see files broken down into multiple sub-files, or were you assuming this is how it worked?

Gareth.

----- Original Message -----
From: "Kevan Benson" <kbenson@xxxxxxxxxxxxxxx>
To: "Gareth Bult" <gareth@xxxxxxxxxxxxx>
Cc: "gluster-devel" <gluster-devel@xxxxxxxxxx>
Sent: Thursday, December 27, 2007 8:16:53 PM (GMT) Europe/London
Subject: Re: Choice of Translator question

Gareth Bult wrote:
>> Agreed, which is why I just showed the single-file self-heal
>> method, since in your case targeted self-heal (maybe before a full
>> filesystem self-heal) might be more useful.
>
> Sorry, I was mixing moans. On the one hand there's no log, hence no
> automatic detection of out-of-date files (which means you need a
> manual scan); and secondly, doing a full self-heal on a large
> filesystem can be prohibitively expensive.
>
> I'm vaguely wondering if it would be possible to have a "log"
> translator that wrote changes to a namespace volume for quick
> recovery following a node restart (as an option, of course).

An interesting thought. Possibly something that keeps a filename and
timestamp, so other AFR members could connect and request the AFR
versions of files changed since timestamp X. Automatic self-heal is
supposed to be on the way, so I suspect they are already doing (or
planning) something like this.

>> I don't see how the AFR could even be aware the chunks belong to
>> the same file, so how it would know to replicate all the chunks of
>> a file is a bit of a mystery to me. I will admit I haven't done
>> much with the stripe translator though, so my understanding of its
>> operation may be wrong.
>
> Mmm, trouble is there's nothing definitive in the documentation
> either way.
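One possible explanation for the "one ~1G file per stripe partition" observation (a guess, not confirmed against this GlusterFS version) is that the stripe translator writes each chunk at its logical offset into a single backend file per brick, leaving holes, so every per-brick file *appears* to be the full size even though only a fraction of it is allocated. Comparing apparent size against allocated blocks would distinguish the two cases; a minimal sketch with a synthetic sparse file (the path is illustrative):

```shell
# Create a 1 GiB sparse file to mimic a hypothetical striped backend file
truncate -s 1G /tmp/chunk.0

# Apparent size (what ls -l reports): the full 1 GiB
stat -c 'apparent: %s bytes' /tmp/chunk.0

# Allocated blocks (what du reports): near zero for a sparse file
echo "allocated: $(du -k /tmp/chunk.0 | cut -f1) KiB"
```

Running `du` on the real backend files would show whether the ~1G figure is apparent size or genuinely allocated data.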
> I'm wondering whether it's a known critical omission, which is why
> it hasn't been documented (!). At the moment stripe is pretty
> useless without self-heal (i.e. AFR), and AFR is pretty useless
> without stripe for anyone with large files (which I'm guessing is
> why stripe was implemented, after all the "stripe is bad"
> documentation). If the two don't play well together and a self-heal
> on a large file means a 1TB network data transfer, that would
> strike me as a show-stopper.

I think the original docs said it was implemented because it was
easy, but there wasn't a whole lot to be gained by using it. Since
then, I've seen people post numbers that seemed to indicate it gave a
somewhat sizable boost, but the extra complexity it introduced never
made it attractive to me. The possibility that it could be used to
greatly speed up self-heal on large files seems like a very good
reason to use it, though, so hopefully we can find a way to make it
work.

>> Understood. I'll have to actually try this when I have some time,
>> instead of just doing some armchair theorizing.
>
> Sure. I think my tests were "proper", although I might try them on
> TLA just to make sure.
>
> Just thinking logically for a second: for AFR to do chunk-level
> self-heal, there must be a chunk-level signature store somewhere.
> Where would this be?

Well, to AFR each chunk should just look like another file; it
shouldn't care that it's part of a whole. I assume the stripe
translator uses another extended attribute to tell which file a
chunk is part of. Perhaps the AFR translator is stripe-aware and
that's causing the problem?

>> Was this on AFR over stripe or stripe over AFR?
>
> Logic told me it must be AFR over stripe, but I tried it both ways
> round.
Let's get rid of the over/under terminology (which I always seem to
think of in reverse from other people) and use a representation
that's more absolute:

client -> XLATOR(stripe) -> XLATOR(AFR) -> diskVol(1..N)

Throw in your network connections wherever you want, but this should
be testable on a single box with two different directories exported
as volumes. The client writes to the stripe translator, which splits
up the large file; each chunk is then sent to the AFR translator, so
each chunk is stored redundantly in every disk volume supplied.

If the AFR and stripe are reversed, AFR will have to pull all stripe
chunks to do a self-heal (unless AFR is stripe-aware), which isn't
what we are aiming for.

Is that similar to what you tested?

-- 
-Kevan Benson
-A-1 Networks
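For concreteness, the stripe-on-top arrangement above might look something like the following single-box spec file. This is a sketch only: the volume names and directories are illustrative, the syntax is the 1.3-era volume-spec format, and option spellings (particularly block-size) vary between releases. Note that stripe needs at least two subvolumes to actually stripe, so this uses two AFR pairs over four local directories:

```
# disks: four local directories standing in for disk volumes
volume disk1
  type storage/posix
  option directory /data/export1
end-volume

volume disk2
  type storage/posix
  option directory /data/export2
end-volume

volume disk3
  type storage/posix
  option directory /data/export3
end-volume

volume disk4
  type storage/posix
  option directory /data/export4
end-volume

# each chunk is replicated across a pair of disk volumes
volume afr1
  type cluster/afr
  subvolumes disk1 disk2
end-volume

volume afr2
  type cluster/afr
  subvolumes disk3 disk4
end-volume

# the client-facing volume: large files are split into chunks,
# and each chunk lands on one of the (replicated) AFR pairs
volume stripe
  type cluster/stripe
  option block-size *:1MB   # chunk size; exact option syntax varies by release
  subvolumes afr1 afr2
end-volume
```

With this layout a self-heal only ever has to copy individual chunks between the two members of one AFR pair, rather than the whole file.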