Re: [TDF Community] [Board Discuss] LibreOffice - peer2peer collaboration bits

Moving the follow-up here, as Thorsten suggested.

So, I'm arguing that we can side-step the need for "discovery" in the
P2P sense. Or even more fundamentally: P2P has several meanings. There
is "peer-to-peer networking", where nodes build an ad-hoc network and
therefore need such peer discovery; and there is "peer to peer" on top
of an existing network, as the opposite of "through a central
server". I believe our interest should be in the second kind, as I'll
elaborate below.

On 04/05/2024 13:05, Thorsten Behrens wrote:
How to connect two or more individuals? It requires routing

I would opt for a simple protocol which does not take on problems
more complex than it has to, at least for a preliminary
implementation. Specifically - the Internet routes for us.

Connecting people for peer2peer communication is conventionally called
discovery.

There are ample writeups on the topic; here are two pointers:

* quite an accessible summary: https://jsantell.com/p2p-peer-discovery/
* scientific survey paper: https://www.inets.rwth-aachen.de/wp-content/uploads/2022/07/service_discovery_survey.pdf


Anyway - a server would not necessarily be required. That is, P2P
connection will happen when two users want to connect; but in our
case, they have already "connected" in some other way to agree on
making the P2P connection. So I suggest, in light of my previous
point, that we assume that the two (or more) users have another,
independent means of communication over which they can send some
data for bootstrapping the LibreOffice P2P. And this could be made
easy, UI-wise, so that the user just needs to press a copy button,
and paste some string so that the other user can see it. The other
user copies the string and pastes it into an appropriate area in
their own running LO instance. Then the connection is set up.

So in a word, piggy-backing on another, existing communication
channel?

Sort of, but not exactly. Think about a (non-Internet) phone
conversation. It's a "peer-to-peer" protocol, in the sense that there
isn't some "conversation server" which is active while the two
conversants converse: You dial the other person's number and connect to
them. But this is on top of an established network, which routes things
for you, without you having to "discover" anybody or anything. Also -
the phone number is a priori magic, which in ad-hoc peer-to-peer networks
you typically don't have. You need to know the person or organization
you're calling and obtain a numeric destination handle - not through the
use of your phone for the actual phone call. Maybe you've met and
exchanged numbers, maybe you got a card or saw an ad etc.

So, the "piggy-backing" would be on the interpersonal/social communication
which made the two (or more) people want to collaboratively edit a
document in the first place, plus the regular IP protocol and its
network structure.


What kind of data? Basically, I assume that would be a tuple of (IP,
port number, public key). I will admit that this doesn't cater to
the case of two firewalled users; that's a situation I'm not
experienced enough in handling, but I do know there are [many
approaches](https://en.wikipedia.org/wiki/NAT_traversal) (Wikipedia)
to handling it. Some may require a third-party "switching server",
some may not. But such a server can probably be very minimal and
hopefully not even aware of what protocol it's being used to allow
connections for.
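
To make the copy-and-paste bootstrapping concrete, here is a minimal sketch of packing the (IP, port, public key) tuple into a single pasteable string and unpacking it on the other side. Everything named here is an assumption for illustration only: the `lo-p2p:` prefix, the JSON field names and the `Invite` type are hypothetical, not an existing LibreOffice format, and NAT traversal is not addressed.

```python
import base64
import json
from dataclasses import dataclass

PREFIX = "lo-p2p:"  # hypothetical marker so a pasted string can be recognized


@dataclass(frozen=True)
class Invite:
    host: str          # IP address (or hostname) of the inviting peer
    port: int          # port the inviting LibreOffice instance listens on
    public_key: bytes  # raw public key bytes, opaque to this sketch


def encode_invite(invite: Invite) -> str:
    """Pack the tuple into one line of URL-safe text for copy and paste."""
    payload = json.dumps(
        {
            "h": invite.host,
            "p": invite.port,
            "k": base64.b64encode(invite.public_key).decode("ascii"),
        },
        separators=(",", ":"),
    ).encode("utf-8")
    return PREFIX + base64.urlsafe_b64encode(payload).decode("ascii")


def decode_invite(text: str) -> Invite:
    """Parse a pasted string back into the tuple, rejecting obvious garbage."""
    text = text.strip()
    if not text.startswith(PREFIX):
        raise ValueError("not an invite string")
    payload = json.loads(base64.urlsafe_b64decode(text[len(PREFIX):]))
    port = int(payload["p"])
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return Invite(payload["h"], port, base64.b64decode(payload["k"]))
```

The inviting user would copy the result of `encode_invite(...)` into whatever channel they already share with the other person, and the receiving side would call `decode_invite(...)` on the pasted text before opening the connection.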

Or taking the idea one step further: re-using the other, existing
comms channel, also for all of the collaboration traffic!

Well, I'm not sure it would be beneficial to create a strong tie-in
between our comm protocol and a specific choice of how the two users
communicate otherwise. But - that is a possibility. It may make things
easier technically I suppose - especially when it comes to more-than-two
collaborators.


Heiko wrote:
What could be achievable on TDF infrastructure?

Given what I've said above - let's try to make this completely
independent of TDF infrastructure. Either with no switching-server
at all or with something minimal that hopefully might not even need
TDF continuously maintaining a server. Note that maintenance by us
also has privacy implications, much more so than third-party-less
P2P.

Yup. At any rate, requiring any kind of centralized server
infrastructure has inevitable scalability challenges. It would still
be useful if TDF could help with bootstrapping whatever server
infrastructure will be needed, though.

Agreed; let's just think of such a bootstrapping as something to avoid
if possible, even at the cost of some tradeoffs. This relates
specifically to my two points of motivation for this feature (which I'll
quote for those who follow this list but not the forum): This

"1.    Undermines the paradigm Microsoft, Google and others are pushing,
of your work as a user going through them, visible and data-mine-able to
them, requiring connectivity to their servers…
 2.    Offers potential for more easily switching between a simple
on-your-computer document and a collaboratively-edited document: Instead
of uploading, inviting, controlling access, etc - it would be just
clicking a “connect and edit with” button, entering an address, and when
that session is over - not needing to do anything but click a button or
close LibreOffice, and no one is the wiser."


Heiko wrote:
Isn’t it better to share UNO commands and parameters?

Mmm... maybe... but - what about showing the other party's cursor
and mouse movements? You can't do that with UNO commands.

Starting off from Collabora Online - which is a production-ready
implementation of LibreOffice collaboration, that uses both low-level
key & mouse events, as well as UNO commands - I guess the answer is
'both'? ;)

But is that part of the code of COOL under an appropriate license?


Heiko wrote:
How do we solve the situation when one participant enters text and
another deletes the same paragraph?

It doesn't have to be a great solution, as long as it is
consistent, i.e. if users know that two people on a laggy connection
editing the same sentence is likely to get them making changes in
wrong positions etc., they will naturally limit the extent to which
they do this - like we know from Etherpad. Consistency of behavior
and "principle of least astonishment" would be more important than
perfect coordination/synchronization of inputs.

With a dedicated server, you don't even have that problem. All input
will get serialized through this instance, so there's a strict
temporal ordering for all edits. Whatever packet reaches the server
first, will 'win' in an edit war. A fully distributed solution (which
is way harder to implement!) has no such strict global ordering per
se, but there are algorithms such as CRDTs[2], which guarantee eventual
consistency in all peers. But you're right, the Etherpad experience
shows that under bad network connectivity, user experience will start
to suffer. For example, all CRDTs I've looked at would always have a
'delete' operation win over other edits, on the same span of text.
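
As a toy illustration of the "delete wins" behavior and of eventual consistency, here is a minimal state-based sketch. It is not how any particular CRDT library or Collabora Online works; it assumes, purely for illustration, that the document is a map of paragraph IDs to text, that each edit carries a unique `(counter, peer)` stamp, and that deletions are kept as tombstones. The `DocReplica` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocReplica:
    # para_id -> (stamp, text); the edit with the highest stamp wins
    paragraphs: dict = field(default_factory=dict)
    # IDs of deleted paragraphs; a tombstone is never removed
    tombstones: set = field(default_factory=set)

    def edit(self, para_id, stamp, text):
        """Apply an edit; stamp is a unique (counter, peer) tuple."""
        current = self.paragraphs.get(para_id)
        if current is None or stamp > current[0]:
            self.paragraphs[para_id] = (stamp, text)

    def delete(self, para_id):
        self.tombstones.add(para_id)

    def merge(self, other):
        """Fold another replica's state into this one (order-independent)."""
        for para_id, (stamp, text) in other.paragraphs.items():
            self.edit(para_id, stamp, text)
        self.tombstones |= other.tombstones

    def visible(self):
        """Paragraphs a user would see: everything without a tombstone."""
        return {
            para_id: text
            for para_id, (_, text) in self.paragraphs.items()
            if para_id not in self.tombstones
        }


# One peer extends a paragraph while the other deletes it.
a, b = DocReplica(), DocReplica()
a.edit("p1", (1, "a"), "Hello")
a.edit("p2", (1, "a"), "Bye")
b.merge(a)                             # both peers start from the same state
a.edit("p1", (2, "a"), "Hello world")  # peer a types into p1 ...
b.delete("p1")                         # ... while peer b deletes p1
a.merge(b)
b.merge(a)
print(a.visible() == b.visible())      # both replicas agree; p1 is gone
```

Because merging only takes the highest stamp per paragraph and the union of tombstones, the two replicas end up identical regardless of the order in which they exchange state, and the concurrent edit to the deleted paragraph is silently dropped.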

And I would say this is good enough. If our protocol was for an online
competitive game, then minimizing this would be a major priority. For
us, it's barely a minor priority, as the practice of editing literally
the same stretch of text at the exact same time is obviously something
to be avoided. Either one person edits and others look and comment, or
multiple people edit different places in the document.

Heiko wrote:
Encryption and data integrity are key.

Perhaps TLS if it's a TCP-based protocol, and DTLS if it's UDP?
Using the exchanged public keys I mentioned before?

Evidently. Or further re-use of existing p2p/chat solutions.
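
One way the exchanged public keys could be put to use, sketched under assumptions: after the (D)TLS handshake, each side compares a hash of the key the peer actually presented against the fingerprint it received out of band. How the presented key bytes are obtained from the TLS layer is left out, and the function names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """SHA-256 fingerprint of the raw key bytes, as a hex string."""
    return hashlib.sha256(public_key).hexdigest()


def peer_key_matches(presented_key: bytes, expected_fingerprint: str) -> bool:
    """True only if the presented key hashes to the expected fingerprint.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(fingerprint(presented_key), expected_fingerprint)
```

If the check fails, the session would be dropped before any document data is sent.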

Of course, integrity is somewhat relative here - if you invite the
world to co-edit your document. ;)

Well, we could have two modes:

1. Whoever has the key/access-tuple can join the editing process.
2. Whoever has the key/access-tuple can ask to join the editing process,
and one (or some, or all) of the participants need to approve.

That would be a bit like in Zoom, where some sessions have a "waiting
room" and some are joined automatically using the invite session ID.
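
The two modes could be captured by a small admission check along these lines. This is only a sketch: `JoinMode`, `may_join` and the `required_approvals` parameter (covering "one, or some, or all" of the participants) are hypothetical names, not part of any existing design.

```python
from enum import Enum


class JoinMode(Enum):
    OPEN = "open"          # mode 1: the key/access-tuple alone is enough
    APPROVAL = "approval"  # mode 2: existing participants must approve


def may_join(mode, has_valid_invite, approvals=0, required_approvals=1):
    """Decide whether a newcomer is admitted to the editing session."""
    if not has_valid_invite:
        return False
    if mode is JoinMode.OPEN:
        return True
    # "Waiting room": admit once enough participants have approved.
    return approvals >= required_approvals
```

Setting `required_approvals` to the current number of participants would give the "all must approve" variant.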

Eyal



