On 10/29/18 4:15 PM, Susant Palai wrote:
I would be interested to know if you can use leases/delegations to solve
the issue. If you can not, can leases/delegations be extended instead of
proposing a new API?
From what I understand, Block-D keeps all the files open before the
session begins (exporting files as block devices). I guess that won't
work with leases since, please correct me if I'm wrong, an open request
breaks the existing lease. Certainly, with the self-heal daemon the
lease will be released. Hence, mandatory locks fit here IMO.
Right. Leases are mostly useful when data caching is needed for the
single-writer or multiple-readers case. Here, IIUC, the issue being
solved is avoiding data corruption after failover/failback of the switch.
@Kalever, Prasanna <mailto:pkalever@xxxxxxxxxx> Please give your
feedback here.
In theory, the highly available NFS-Ganesha and Samba services should
have solved similar problems already.
From what I understand, the multipath layer does not have any control
over restarting tcmu-runner on the Gluster side (if that is how
NFS-Ganesha and Samba provide a blacklist for their clients).
targetcli performs certain tasks only on a failover switch, such as
taking the mandatory lock and opening a session, as mentioned in the
design doc. Hence, there is no control over data cached at the Gluster
client layer being replayed in the event of a disconnection.
NFS servers solve this by putting servers into grace and allowing
clients to reclaim their lost state post failover and failback.
Internally, since NFS-Ganesha stacks on top of gfapi, it would also
need reclaim-lock support in gfapi to acquire lost state from another
NFS server (but certainly not the way the current implementation is
being done [1]). I have left comments on the patch. The current approach
would make all gfapi applications vulnerable and, as Amar mentioned, it
could lead to a potential CVE.
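
As a rough illustration of the grace/reclaim idea above, here is a toy
Python model. It is only a sketch under my own assumptions: the class
GraceLockServer, its methods and the injectable clock are hypothetical
and are not NFS-Ganesha or gfapi APIs. During grace the server honours
only reclaims of state a client held before the failover; once grace
ends, reclaims are refused and normal locking resumes.

```python
import time


class GraceLockServer:
    """Toy lock server (hypothetical, not a real NFS or gfapi interface)."""

    def __init__(self, grace_seconds, previous_locks, clock=time.monotonic):
        self._clock = clock
        self._grace_ends = clock() + grace_seconds
        # path -> client id, as recorded before the failover
        self._previous = dict(previous_locks)
        self._locks = {}

    def in_grace(self):
        return self._clock() < self._grace_ends

    def lock(self, path, client, reclaim=False):
        if self.in_grace():
            # Only reclaims of previously held state are honoured in grace.
            if reclaim and self._previous.get(path) == client:
                self._locks[path] = client
                return True
            return False
        if reclaim:
            # Reclaims arriving after the grace period are refused.
            return False
        holder = self._locks.get(path)
        if holder is not None and holder != client:
            return False
        self._locks[path] = client
        return True
```

A new lock request from another client is refused while the server is
in grace, so it cannot slip in before the original holder has had the
chance to reclaim.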
To solve it, gluster-block could agree upon some common lk-owner (unique
to that initiator) and that way gfapi need not fetch it and can prevent
other non-trusted clients from acquiring that lock by force.
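
A minimal sketch of that idea in Python, assuming a simplified lock
table that identifies a holder purely by its lk-owner (the real locks
translator also takes the client connection into account, which this
toy ignores). LockTable and the owner values are made up for
illustration.

```python
class LockTable:
    """Toy lock table keyed by path; the holder is identified by lk-owner."""

    def __init__(self):
        self._owner_by_path = {}

    def acquire(self, path, lk_owner):
        current = self._owner_by_path.get(path)
        if current is not None and current != lk_owner:
            # A different lk-owner cannot take the lock by force.
            return False
        # Same lk-owner (for example the same initiator arriving through
        # another node) is treated as the existing holder.
        self._owner_by_path[path] = lk_owner
        return True


table = LockTable()
agreed_owner = b"initiator-1"  # hypothetical value agreed per initiator

took_on_node_a = table.acquire("/block/vol0", agreed_owner)
took_on_node_b = table.acquire("/block/vol0", agreed_owner)
took_untrusted = table.acquire("/block/vol0", b"someone-else")
```

Here the second acquire, made with the agreed lk-owner, succeeds without
gfapi having to fetch the owner from the previous holder, while the
request with a different lk-owner is refused.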
Coming to the other problem quoted in the design doc, replaying fops by
gfapi clientA (nodeA) after nodeB relinquishes the lock: I have a couple
of questions about who would be responsible for replaying those fops
(commented on the doc).
tcmu-runner->gfapi_xlator->...->protocol/client
Once gfapi_client/nodeA disconnects and reconnects, and tcmu-runner
replays the fop, wouldn't it need to reopen the fd, since there was a
network disconnect and the old fd had gone stale? If it reopens the fd,
it will get a new generation no./epoch time, which will allow it to
replay the old pending fops, right?
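
To make the concern concrete, here is a toy Python model. It assumes
(my reading, not confirmed by the design doc) that the brick bumps a
generation number on every open and rejects fops carrying an older
generation. ToyBrick and StaleFdError are hypothetical names.

```python
class StaleFdError(Exception):
    """Raised when a fop arrives on an fd from an older generation."""


class ToyBrick:
    def __init__(self):
        self._generation = 0
        self.log = []  # accepted writes, in arrival order

    def open(self, node):
        # Every open gets a fresh generation; older fds become stale.
        self._generation += 1
        return (node, self._generation)

    def write(self, fd, payload):
        node, generation = fd
        if generation != self._generation:
            raise StaleFdError(f"{node}: generation {generation} is stale")
        self.log.append((node, payload))


brick = ToyBrick()
fd_a = brick.open("nodeA")
brick.write(fd_a, "w1")
pending = "w2"  # cached on nodeA when the connection dropped

fd_b = brick.open("nodeB")  # failover: nodeB takes over
brick.write(fd_b, "w3")

try:
    brick.write(fd_a, pending)  # replay on the old fd
    stale_rejected = False
except StaleFdError:
    stale_rejected = True

fd_a2 = brick.open("nodeA")  # reopen after reconnect
brick.write(fd_a2, pending)  # the old write is now accepted
```

In this model the replay on the stale fd is rejected, but after the
reopen the same pending write lands after nodeB's newer write, which is
the case I am asking about.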
Thanks,
Soumya
[1] https://review.gluster.org/#/c/glusterfs/+/21457/