Mmm... there are a couple of real issues with self-heal at the moment that make it a minefield for the inexperienced.

Firstly, there's the mount bug. If you have two servers, two clients, and one AFR, there's a temptation to mount each client against a different server (see the spec sketch at the very bottom of this mail for the layout I mean). This initially works fine, and it even keeps working fine when one of the glusterfsd processes dies. However, when you restart the failed glusterfsd, one client will erroneously connect to it (or that is my interpretation of the net effect), regardless of the fact that self-heal has not taken place. And because that server is out of sync, doing a "head -c1" on a file you know has changed gets you nowhere. So essentially you need to remount clients against non-crashed servers before starting a crashed server, which is not nice. (This is a filed bug.)

Then we have us poor Xen users, who store 100 GB worth of Xen images on a gluster mount so that we can live-migrate Xen instances between servers, which is fantastic. However, after a server config change or a server crash, we need to copy that 100 GB between the servers. That wouldn't be so bad if we didn't have to stop and start each Xen instance just to make self-heal register its image file as changed; and while self-heal is re-copying an image, it can't be used, so you're looking at 3-4 minutes of downtime per instance.

Apart from that (!) I think gluster is a revolutionary filesystem and will go a long way, especially if the bug list shrinks ;-) Keep up the good work :)

[Incidentally, I now have 3 separate Xen/gluster server stacks, all running live-migrate - it works!]

Regards,

Gareth.

----- Original Message -----
From: "Angel" <clist@xxxxxx>
To: "An. Dinh Nhat" <andn@xxxxxxxxxxxxxxx>
Cc: gluster-devel@xxxxxxxxxx
Sent: 16 January 2008 20:36:52 (GMT) Europe/London
Subject: Re: AFR Translator have problem

I see. The glusterfs developers have this point in mind; for the 1.4 release the roadmap says:

  active self-heal - log and replay failed I/O transactions
  brick hot-add/remove/swap - live storage hardware maintenance

So until that day, we the users have to figure out how to force lazy AFRs into doing their job :-)

One positive side of this is that you can control how many resources are devoted to AFR: the more files you touch, the more replication occurs, and under high network or CPU pressure, lowering the touching speed should lower AFR's demands. Your mileage may vary. :-P
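For example, here is an (untested) sketch of what I mean, run from the client side and assuming your mount point is /mnt/glusterfs: sweep the whole tree and read one byte of every file, so that each open passes down through the AFR xlator and, if my reading of the code is right, triggers replication of that file.

  # Read the first byte of every file, one file at a time, to give
  # AFR a chance to heal each one.  Adjust the mount point to yours.
  find /mnt/glusterfs -type f -print0 | xargs -0 -n1 head -c1 > /dev/null

  # Throttled variant, for when the sweep itself causes too much
  # network/CPU pressure (GNU sleep accepts fractional seconds; the
  # loop breaks on filenames containing newlines, but for a quick
  # sweep that is usually fine):
  find /mnt/glusterfs -type f | while read -r f; do
      head -c1 "$f" > /dev/null
      sleep 0.1
  done

A plain touch per file should do the job too, at the cost of updating every mtime.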
Perhaps the GlusterFS client (or maybe the servers) should talk to a housekeeping daemon to accomplish these tasks, instead of over-engineering the code to do everything itself.

Let's wait and see what the developers have to say about this issue...

Regards
Angel

On Wednesday, 16 January 2008, An. Dinh Nhat wrote:
> Thanks for your answer.
>
> I understand that touching a file after server 3 comes up works around the AFR issue. However, suppose I have 2 servers, I edit glusterfs-client.vol to add one more server, and the mount point holds 40,000 files totalling 800 GB. How do I get AFR to replicate the files to server3 automatically?
>
> -----Original Message-----
> From: Angel [mailto:clist@xxxxxx]
> Sent: Wednesday, January 16, 2008 11:16 PM
> To: gluster-devel@xxxxxxxxxx
> Cc: An. Dinh Nhat
> Subject: Re: AFR Translator have problem
>
> I think AFR replication occurs on file access.
>
> Try touching all the files from the client and you will probably trigger replication onto server3:
>
>   client --> creates files on AFR(server1,server2)
>
>   server3 goes up; now we have AFR(server1,server2,server3)
>
>   you won't see any files on server3
>
>   now touch the files from the client; AFR will be triggered
>
>   now you will see the touched files on server3
>
> I've made a similar test in a local scenario: client --> local AFR(dir1,dir2).
>
> I copied a file test.pdf to my mount point and it got replicated into both 'remote' dirs. Next I deleted one copy from the exported 'remote' directories (dir1). After that I opened the pdf file on the mount point: it opened fine, and I could see that dir1 was storing a new copy of test.pdf again.
>
> Looking at the code, it seems to me that things mostly happen on file operations, because the xlators work by intercepting FUSE calls on their way down to the posix modules.
>
> My tests showed things occurring like this...
>
> Regards
> Angel
>
> On Wednesday, 16 January 2008, Anand Avati wrote:
> > Dinh,
> > can you post your spec files, mentioning the order of events in terms of subvolumes?
> >
> > thanks,
> > avati
> >
> > ---------- Forwarded message ----------
> > From: An. Dinh Nhat <andn@xxxxxxxxxxxxxxx>
> > Date: 16-Jan-2008 16:07
> > Subject: AFR Translator have problem
> > To: gluster-devel-owner@xxxxxxxxxx
> >
> > Hi.
> >
> > I set up 3 servers using GlusterFS <http://www.gluster.org/docs/index.php/GlusterFS>. To begin with, I started 2 servers, then from the client I mounted GlusterFS and copied 10 files onto the gluster volume. After that I started 'server 3', but I don't see any files on 'server 3', so I think the AFR translator has a problem.
> >
> > [root@client examples]# glusterfs -V
> >
> > glusterfs 1.3.7 built on Dec 18 2007
> >
> > Thanks & Best Regards,
> > Đinh Nhật An
> > System Engineer
> > System Operation - Vinagame JSC
> > Email: andn@xxxxxxxxxxxxxxx - Yahoo: atuladn
> > Vinagame JSC - 459B Nguyễn Đình Chiểu, Q3, HCMC, Vietnam
> > Office phone: 8.328.426 Ext 310

--
Clist UAH
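P.S. Since Avati asked for spec files in the thread above: I won't clutter the list with my full configs, but the two-server, one-AFR layout I described at the top is just the standard 1.3-style client-side spec, roughly like the sketch below. The hostnames and volume names are invented placeholders rather than my real config, and I've left out the usual performance translators.

  # client.vol - sketch of a client spec that AFRs two servers.
  # Both clients should point at the same pair of servers.

  volume remote1
    type protocol/client
    option transport-type tcp/client
    option remote-host server1        # placeholder hostname
    option remote-subvolume brick     # volume exported by server1
  end-volume

  volume remote2
    type protocol/client
    option transport-type tcp/client
    option remote-host server2        # placeholder hostname
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/afr
    # As I understand it, the order of the subvolumes matters, which I
    # assume is why Avati asked about it.
    subvolumes remote1 remote2
  end-volume

On paper, adding a third replica as in Dinh Nhat's case is just a matter of defining a remote3 volume the same way and appending it to the subvolumes line -- at which point the sweep Angel describes is what actually gets the existing 40,000 files onto server3.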