Re: sub-tree synchronization/watching: persistent search questions

Rich Megginson <rmeggins@xxxxxxxxxx> · Fri, 07 Jun 2013 10:12:48 -0600

On 06/07/2013 09:57 AM, Petr Spacek wrote:
On 7.6.2013 16:51, Rich Megginson wrote:
On 06/07/2013 08:44 AM, Petr Spacek wrote:
On 7.6.2013 16:11, Rich Megginson wrote:
On 06/07/2013 05:42 AM, Petr Spacek wrote:
I would like to get opinions from the 389 gurus on the following problem.

I have an application (a DNS server) that needs to read the content of one whole sub-tree (cn=dns, dc=test) and keep it synchronized.

The workflow is:
1) Application (DNS server) starts
2) Application reads all existing data from the sub-tree
3) Application does /something/ with the existing data and starts replying to application clients
4) Sub-tree has to be kept in sync with the LDAP server, i.e. updates from the LDAP server should be incrementally applied to the 'state' inside the application

The problem with persistent search is that it doesn't offer any reliable 'signal' that step (2) has ended. The search just runs forever, and I can't find any signal that all existing entries have already been read and that from now on the application will get only Entry Change Notifications.

Basically, I'm looking for something like LDAP syncRepl in refreshAndPersist mode with no cookie (RFC 4533 section 1.3.2 and section 3.4).
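For contrast, here is a rough sketch of what a refreshAndPersist consumer gets from RFC 4533: each entry carries a Sync State control, and the end of the refresh stage is announced by a Sync Info message with refreshDone set to TRUE. The `SyncEntry` and `SyncInfo` classes are hypothetical, simplified stand-ins for already-decoded messages (no real LDAP library is involved), and entries are keyed by DN instead of entryUUID to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Union

# Sync State control values, as defined in RFC 4533 (syncStateValue.state).
PRESENT, ADD, MODIFY, DELETE = 0, 1, 2, 3


@dataclass
class SyncEntry:
    """Hypothetical stand-in for an entry carrying a Sync State control."""
    dn: str
    state: int


@dataclass
class SyncInfo:
    """Hypothetical stand-in for a Sync Info message (refreshDelete/refreshPresent)."""
    refresh_done: bool


def consume(messages: Iterable[Union[SyncEntry, SyncInfo]],
            on_refresh_done: Callable[[Dict[str, int]], None]) -> Dict[str, int]:
    """Apply a refreshAndPersist stream and report when the initial load ends."""
    state: Dict[str, int] = {}
    refreshed = False
    for msg in messages:
        if isinstance(msg, SyncInfo):
            # The explicit "all existing entries were sent" signal that a
            # plain persistent search does not provide.
            if msg.refresh_done and not refreshed:
                refreshed = True
                on_refresh_done(dict(state))
        elif msg.state == DELETE:
            state.pop(msg.dn, None)
        else:
            state[msg.dn] = msg.state
    return state
```

With such a signal, step (3) of the workflow would simply run inside the `on_refresh_done` callback.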

I know that the Entry Change Notification from persistent search contains a bit field which denotes whether the entry was added/modified/deleted/nothing (i.e. not modified, just read). Unfortunately, this bit field can't be used for *reliable* detection that all existing entries have been read.
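To make the 'bit field' concrete: in the persistent search Internet-Draft (draft-ietf-ldapext-psearch-03), the request's changeTypes is a bitwise OR of add (1), delete (2), modify (4) and modDN (8), and the Entry Change Notification control on a returned entry carries one of those values. As I read the draft, entries returned as existing content come without that control at all. The plain-Python sketch below classifies results that way; note that nothing in it can tell when the last existing entry has arrived, which is exactly the gap described above.

```python
from typing import Optional

# changeType values from draft-ietf-ldapext-psearch-03; the PersistentSearch
# request control takes a bitwise OR of them.
CHANGE_ADD, CHANGE_DELETE, CHANGE_MODIFY, CHANGE_MODDN = 1, 2, 4, 8
_NAMES = {CHANGE_ADD: "add", CHANGE_DELETE: "delete",
          CHANGE_MODIFY: "modify", CHANGE_MODDN: "modDN"}


def change_types_mask(*types: int) -> int:
    """Build the changeTypes bit field for the PersistentSearch request."""
    mask = 0
    for t in types:
        mask |= t
    return mask


def classify(ecn_change_type: Optional[int]) -> str:
    """Label one result from the changeType in its Entry Change Notification
    control, or None if the entry arrived without that control."""
    if ecn_change_type is None:
        # Existing entry: we learn *what* it is, never whether it is the last one.
        return "existing"
    return _NAMES[ecn_change_type]
```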

Could this 'hack' work reliably?
1) Start a persistent search (in a separate application thread), but suspend result processing.
2) In another application thread, do a normal sub-tree search on the same sub-tree. The normal search is started *after* the persistent search.
3) Process all results from the normal search first
4) Do /something application specific/
5) Start processing updates from the persistent search
In my application I can cope with duplicates, i.e. when the 'normal' search returns entry cn=xyz and the persistent search returns the same entry cn=xyz again.
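One simple way to make such duplicates harmless, sketched under the assumption that every result carries the full entry: key the local copy by DN and let adds and modifications replace the stored entry, so receiving cn=xyz twice just rewrites the same data. `SubtreeState` is a hypothetical helper, and lower-casing the DN is only a stand-in for proper DN normalization.

```python
from typing import Dict, List

Attrs = Dict[str, List[str]]


class SubtreeState:
    """In-memory copy of the sub-tree where applying an entry twice is harmless."""

    def __init__(self) -> None:
        self.entries: Dict[str, Attrs] = {}

    @staticmethod
    def _key(dn: str) -> str:
        # Simplification: real DN normalization is more involved than lower().
        return dn.lower()

    def upsert(self, dn: str, attrs: Attrs) -> None:
        # Adds and modifications both replace the stored copy, so a
        # duplicate from the other search just rewrites the same data.
        self.entries[self._key(dn)] = attrs

    def delete(self, dn: str) -> None:
        # Deleting an entry we never saw (or already removed) is a no-op.
        self.entries.pop(self._key(dn), None)
```

The no-op delete matters too: an entry deleted after the persistent search started but before the normal search read the sub-tree is reported only by the persistent search.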

Could you use entryUSN? For example, keep searching until the entryUSN in the entry is the same as the global entryUSN, then fall back to persistent search?

Could you elaborate on it a bit more, please? I'm not sure I understood.
What exactly does 'global' entryUSN mean?
Do you mean the 'lastUSN' value on a particular server?
Yes.
Can it work on a server where modifications are scarce? (Note that I do a sub-tree search on a subset of the whole database.)
Not sure what you mean. What difference does it make if modifications are scarce? By modifications do you mean adds/mods/modrdn/delete, that is, any update?

I need to operate on one sub-tree in the database, not the whole database. I think that for this reason I can't depend on the sub-tree search ever encountering entryUSN == lastUSN.

This will never happen if 'my' sub-tree was not the last part of the database to be modified, right? (That is why I spoke about 'scarce' updates, and yes, update = any modification in the given sub-tree.)

Did I misunderstand something?

No, I see what you mean.
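The problem can be illustrated with a toy model: a dictionary from DN to entryUSN stands in for the database, and lastUSN is approximated as the highest entryUSN anywhere (a simplification: on a real server, deletions also advance the counter). `read_db` is a hypothetical callable. As long as some entry outside cn=dns,dc=test was modified last, the "keep searching until max(entryUSN) == lastUSN" loop never terminates on its own.

```python
from typing import Callable, Dict, Optional

SUBTREE = "cn=dns,dc=test"


def subtree_max_usn(db: Dict[str, int], base: str) -> int:
    """Highest entryUSN among entries at or below `base` (-1 if none)."""
    usns = [usn for dn, usn in db.items() if dn == base or dn.endswith("," + base)]
    return max(usns, default=-1)


def wait_until_caught_up(read_db: Callable[[], Dict[str, int]],
                         max_attempts: int) -> Optional[int]:
    """Repeat the sub-tree search until its max entryUSN equals lastUSN.

    Returns the attempt number that succeeded, or None if it never did.
    """
    for attempt in range(1, max_attempts + 1):
        db = read_db()
        last_usn = max(db.values())  # stand-in for the server-wide lastUSN
        if subtree_max_usn(db, SUBTREE) == last_usn:
            return attempt
    return None
```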

I considered a normal search followed by a persistent search with an entryUSN filter, but IMHO it will not work with entry deletion.

For example:
1) Start a normal search and request the entryUSN attribute (among others)
2) Process all results from the search and compute max(entryUSN)
3) Start a persistent search with the filter (entryUSN > computedMaxValue)
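Steps (2) and (3) might look like this, with entries represented as dictionaries from attribute name to a list of values (str here for simplicity; a real client library may return bytes). One detail worth knowing: RFC 4515 search filters have `>=` and `<=` but no strict `>`, so "entryUSN > N" has to be written as `(entryUSN>=N+1)`. This assumes the server supports ordering matches on entryUSN.

```python
from typing import Dict, Iterable, List, Optional

Entry = Dict[str, List[str]]  # attribute name -> values


def max_entry_usn(entries: Iterable[Entry]) -> Optional[int]:
    """Step 2: highest entryUSN seen in the normal search (None if no entries)."""
    return max((int(e["entryUSN"][0]) for e in entries), default=None)


def newer_than_filter(computed_max: Optional[int]) -> str:
    """Step 3: filter matching entries changed after the normal search.

    LDAP filters have no strict '>', so 'entryUSN > N' becomes 'entryUSN >= N+1'.
    """
    if computed_max is None:
        return "(entryUSN=*)"  # nothing seen yet: no lower bound
    return f"(entryUSN>={computed_max + 1})"
```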

I can see a race condition if an entry is deleted between steps (2) and (3).

That is exactly what I tried to solve with the 'parallel' searches, i.e. effectively avoiding any time gap between steps (2) and (3).

I'm not sure what difference it makes whether the update is a deletion or not, but yes, there is a race condition.
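A toy timeline model of that race, with abstract timestamps that a real server does not expose: the normal search reflects every update up to `snapshot_time`, and the persistent search reports every update from `psearch_start` onwards, so anything strictly in between is lost, whatever its type. One plausible reason deletions are singled out: a modified entry gets a new entryUSN and could still be found by a later catch-up search, whereas a deleted entry is no longer returned by a normal search at all.

```python
from typing import List, Tuple

# (time, dn, kind): simplified stand-in for one update on the server.
Update = Tuple[float, str, str]


def lost_updates(updates: List[Update], snapshot_time: float,
                 psearch_start: float) -> List[Update]:
    """Updates that neither the normal search nor the persistent search sees."""
    return [u for u in updates if snapshot_time < u[0] < psearch_start]
```

When the persistent search is already running before the normal search reads the data (`psearch_start < snapshot_time`), the window is empty, which is the point of the 'parallel' approach.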

Of course, I could read entryUSN during both the normal and the persistent search and then skip all results from the persistent search with entryUSN < computedMaxValue. Is that what you meant?

Yes.
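The skipping step could be as small as the generator below. It assumes every persistent search result includes entryUSN (requested in the attribute list), and I have not checked which entryUSN a delete notification carries. Skipping `<=` rather than `<` also drops the entry whose entryUSN equals the maximum, which the normal search already returned; with `<`, that entry would merely arrive once more as a harmless duplicate.

```python
from typing import Dict, Iterable, Iterator, List

Entry = Dict[str, List[str]]  # attribute name -> values


def fresh_updates(psearch_results: Iterable[Entry], computed_max: int) -> Iterator[Entry]:
    """Drop persistent search results the normal search already covered."""
    for entry in psearch_results:
        # Skip entryUSN <= computed_max: those changes are already in the snapshot.
        if int(entry["entryUSN"][0]) > computed_max:
            yield entry
```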

Anyway, do you think that the approach with 'normal & persistent searches in parallel' is enough to avoid the race condition? I.e. does it prevent me from missing any update? (Let's suppose that duplicate detection is solved :-))

I think so, or at least, I don't see any other way to do this, short of the full syncrepl.

I can see another option: implement a 389 plugin which will provide (very partial) support for RFC 4533. The idea is to implement only the stateless pieces (no cookies) and return an error when a client attempts to use a cookie.
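To show what "return an error when a client attempts to use a cookie" means on the wire: the client sends the Sync Request Control (OID 1.3.6.1.4.1.4203.1.9.1.1) whose value is the BER-encoded `syncRequestValue ::= SEQUENCE { mode ENUMERATED, cookie OCTET STRING OPTIONAL, reloadHint BOOLEAN DEFAULT FALSE }` from RFC 4533. A real 389 plugin would be written in C against the SLAPI interface; this Python sketch only decodes the value and decides on a result code. Answering a cookie with e-syncRefreshRequired (4096), which RFC 4533 defines to ask the client for a full refresh, is my assumption about a sensible choice, not what any existing plugin does. Only short-form BER lengths are handled.

```python
from typing import Optional, Tuple

SYNC_REQUEST_OID = "1.3.6.1.4.1.4203.1.9.1.1"  # RFC 4533 Sync Request Control
REFRESH_ONLY, REFRESH_AND_PERSIST = 1, 3
SUCCESS, PROTOCOL_ERROR = 0, 2
E_SYNC_REFRESH_REQUIRED = 4096  # RFC 4533 result code


def _tlv(tag: int, value: bytes) -> bytes:
    # Short-form BER lengths only (< 128 bytes), enough for this sketch.
    if len(value) >= 128:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(value)]) + value


def encode_sync_request(mode: int, cookie: Optional[bytes] = None,
                        reload_hint: bool = False) -> bytes:
    body = _tlv(0x0A, bytes([mode]))      # mode ENUMERATED
    if cookie is not None:
        body += _tlv(0x04, cookie)        # cookie OCTET STRING OPTIONAL
    if reload_hint:
        body += _tlv(0x01, b"\xff")       # reloadHint BOOLEAN DEFAULT FALSE
    return _tlv(0x30, body)               # SEQUENCE


def decode_sync_request(data: bytes) -> Tuple[int, Optional[bytes], bool]:
    if len(data) < 5 or data[0] != 0x30 or data[1] != len(data) - 2:
        raise ValueError("not a short-form syncRequestValue SEQUENCE")
    if data[2] != 0x0A or data[3] != 1:
        raise ValueError("missing mode")
    mode, pos = data[4], 5
    cookie: Optional[bytes] = None
    reload_hint = False
    if pos < len(data) and data[pos] == 0x04:
        n = data[pos + 1]
        cookie, pos = data[pos + 2:pos + 2 + n], pos + 2 + n
    if pos < len(data) and data[pos] == 0x01:
        reload_hint, pos = data[pos + 2] != 0, pos + 3
    if pos != len(data):
        raise ValueError("unexpected trailing data")
    return mode, cookie, reload_hint


def stateless_check(control_value: bytes) -> int:
    """Result code a cookie-less implementation might send back."""
    mode, cookie, _ = decode_sync_request(control_value)
    if mode not in (REFRESH_ONLY, REFRESH_AND_PERSIST):
        return PROTOCOL_ERROR
    if cookie is not None:
        # No server-side state to resume from: ask for a full refresh.
        return E_SYNC_REFRESH_REQUIRED
    return SUCCESS
```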

This would also likely use entryUSN for the cookie, internally.
Yes, that was also my idea, but I don't want to implement the 'stateful' part of the RFC in all its complexity. For now I'm interested only in detecting that all existing entries were read :-)

Sure, but it would be nice to implement the whole syncrepl protocol if you're going to have to implement it partially anyway.
I definitely agree, but unfortunately, I'm tasked with something different, and this syncRepl episode is only a small piece of the whole story :-)

Sure, but this might be enough motivation for the core 389 team to pick up and finish syncrepl based on what you started.

Could somebody judge how difficult it could be? From my (naive) point of view, the stateless parts of RFC 4533 are only 'persistent search encapsulated in other LDAP controls'.
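Very roughly, and leaving out entryUUIDs, cookies, and the scope changes a rename can cause, that 'encapsulation' could be pictured as the generator below: existing entries go out with Sync State add, then a Sync Info message with refreshDone, then persistent search notifications relabelled as Sync State values. The message tuples are hypothetical, and mapping modDN to modify is my own simplification. A real implementation would still have to register its change listener before running the initial search, i.e. the same ordering problem discussed above, just moved inside the server.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Sync State values (RFC 4533).
SYNC_ADD, SYNC_MODIFY, SYNC_DELETE = 1, 2, 3
# Entry Change Notification changeType values (persistent search draft).
PS_ADD, PS_DELETE, PS_MODIFY, PS_MODDN = 1, 2, 4, 8

# Simplification: a rename is reported as a modify of the renamed entry.
PS_TO_SYNC = {PS_ADD: SYNC_ADD, PS_DELETE: SYNC_DELETE,
              PS_MODIFY: SYNC_MODIFY, PS_MODDN: SYNC_MODIFY}

Message = Tuple[str, str, Optional[int]]


def refresh_and_persist(existing_dns: Iterable[str],
                        changes: Iterable[Tuple[str, int]]) -> Iterator[Message]:
    # Refresh stage: every existing entry, tagged as 'add'.
    for dn in existing_dns:
        yield ("entry", dn, SYNC_ADD)
    # The explicit end-of-refresh signal that persistent search lacks.
    yield ("syncInfo", "refreshDone", None)
    # Persist stage: psearch notifications relabelled as Sync State values.
    for dn, change_type in changes:
        yield ("entry", dn, PS_TO_SYNC[change_type])
```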

--
389 users mailing list
389-users@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-users