On 1/13/21 1:44 AM, William Brown wrote:
Hey there,
https://github.com/389ds/389-ds-base/pull/4525/files
I had a look and I can see a few possible contributing factors, but without a core and the exact state I can't be sure if this is correct. It's all just hypothetical from reading the code.
The crash is in deref_do_deref_attr() which is called as part of deref_pre_entry(). This is the SLAPI_PLUGIN_PRE_ENTRY_FN which is called by "./ldap/servers/slapd/result.c:1488: rc = plugin_call_plugins(pb, SLAPI_PLUGIN_PRE_ENTRY_FN);"
I think what's important here is that the search conducted in ./ldap/servers/slapd/opshared.c:818, "rc = (*be->be_search)(pb);", is *not* in a transaction. That means that while the single search in be_search() is consistent due to an implied transaction, the subsequent search in deref_pre_entry() is likely conducted in a separate transaction. This allows other operations to interleave and make changes: a modrdn or delete would certainly be candidates for removing a DN between these two points. As a race condition, it would of course be extremely hard to reproduce.
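To make the suspected window concrete, here is a minimal sketch of two lookups with a delete interleaved between them. It is a toy in-memory model, not 389-ds-base code: toy_dir, toy_lookup, toy_delete and toy_two_step are hypothetical names, and the interleave is simulated by a flag rather than a real second thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in 389-ds-base. */
#define TOY_MAX 4

struct toy_dir {
    const char *dn[TOY_MAX];    /* a NULL slot means "no entry" */
};

/* Each lookup sees the directory as it is at call time; no snapshot
 * is shared between two separate lookups. */
static const char *toy_lookup(const struct toy_dir *d, const char *dn)
{
    for (size_t i = 0; i < TOY_MAX; i++) {
        if (d->dn[i] != NULL && strcmp(d->dn[i], dn) == 0) {
            return d->dn[i];
        }
    }
    return NULL;
}

/* Stand-in for a concurrent delete or modrdn that removes the DN. */
static void toy_delete(struct toy_dir *d, const char *dn)
{
    for (size_t i = 0; i < TOY_MAX; i++) {
        if (d->dn[i] != NULL && strcmp(d->dn[i], dn) == 0) {
            d->dn[i] = NULL;
        }
    }
}

/* Returns 1 if the second (nested, deref-style) lookup still finds the
 * DN, 0 if it vanished or was never there. interleave_delete simulates
 * another operation running between the two lookups. */
static int toy_two_step(struct toy_dir *d, const char *dn, int interleave_delete)
{
    assert(d != NULL && dn != NULL);
    if (toy_lookup(d, dn) == NULL) {
        return 0;               /* the outer search found nothing */
    }
    if (interleave_delete) {
        toy_delete(d, dn);      /* the race window */
    }
    return toy_lookup(d, dn) != NULL;
}
```

Calling toy_two_step() with interleave_delete set to 0 finds the DN both times; with it set to 1 the second lookup comes back empty even though the first one succeeded, which is the situation the nested search would have to tolerate.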
Hi William, Pierre,
Thanks for your feedback. I realize how complex it is to think of a
possible explanation, and I really appreciate it.
I am still missing some parts to understand how it happened.
In the current crash there was no transaction at all "protecting" the
initial search or the nested searches. So yes, we can imagine the entry
got deleted between the base lookup and the candidate list build, but
that is not related to a txn.
Note that the logs do not contain a direct delete of the entry.
Also, during a base search, the base entry is looked up. That lookup was
successful, else it would have returned a search failure. In such a case
the candidate list is not empty: it contains the base search entry ID
(e->ep_id).
Finally, the candidates are evaluated against the filter
(objectclass=*). That phase could be the one that fails, if the entry
was cleared from the entry cache and the ep_id lookup failed.
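
As a rough illustration of that last phase, the sketch below walks a candidate ID list and counts the entries that can actually be fetched. It is a toy model under stated assumptions: toy_entry, toy_fetch and toy_eval_candidates are hypothetical names, and silently skipping an ID whose fetch fails is an assumption about the behaviour, not something confirmed from the backend code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical. */
typedef unsigned int toy_id;

struct toy_entry {
    toy_id id;
    int    present;     /* 0 simulates a failed fetch by ID */
};

/* Fetch an entry by ID; returns NULL when it cannot be loaded. */
static const struct toy_entry *toy_fetch(const struct toy_entry *store,
                                         size_t n, toy_id id)
{
    for (size_t i = 0; i < n; i++) {
        if (store[i].id == id && store[i].present) {
            return &store[i];
        }
    }
    return NULL;
}

/* Evaluate candidates against a presence filter such as (objectclass=*):
 * every entry that loads matches, so only fetch failures reduce the
 * count. Returns the number of matches and stores the number of skipped
 * IDs in *skipped. */
static size_t toy_eval_candidates(const struct toy_entry *store, size_t n,
                                  const toy_id *cands, size_t ncands,
                                  size_t *skipped)
{
    size_t matched = 0;

    assert(skipped != NULL);
    *skipped = 0;
    for (size_t i = 0; i < ncands; i++) {
        if (toy_fetch(store, n, cands[i]) != NULL) {
            matched++;
        } else {
            (*skipped)++;   /* assumed: skipped without raising an error */
        }
    }
    return matched;
}
```

With a one-element candidate list holding only the base entry ID, a failed fetch gives zero matches and no error code, even though the candidate list itself was not empty.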
regards
thierry
A question you asked is why we don't get a "no such entry" error or similar. I think this is because build_candidate_list in ldbm_search.c doesn't actually raise an error if the base_candidates list is empty: an IDL is allocated with a value of 0 (no matching entries). This allows the search to proceed without errors, and the result set is set to NULL with size 0. I can't see where LDAP_NO_SUCH_OBJECT is set in this process, but without looking further into it, my suspicion is that a result set of size 0 won't return an error condition to internal_search_pb, so it's valid for this to be empty.
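If that suspicion holds, a caller of the nested search cannot rely on the return code alone. The sketch below shows the kind of guard this implies. It is hypothetical: toy_result and toy_first_entry are illustrative names, not the real Slapi types or the actual code in the deref plugin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of an internal search; not the real Slapi types. */
struct toy_result {
    int          rc;        /* 0 means success */
    size_t       nentries;  /* may be 0 even when rc == 0 */
    const char **entries;   /* NULL when nentries == 0 */
};

/* Returns the first entry's DN, or NULL if there is nothing safe to
 * dereference. Success with an empty result set is treated as a valid
 * outcome, not as a bug in the search. */
static const char *toy_first_entry(const struct toy_result *r)
{
    assert(r != NULL);
    if (r->rc != 0) {
        return NULL;        /* the search itself failed */
    }
    if (r->entries == NULL || r->nentries == 0) {
        return NULL;        /* success, but no entries came back */
    }
    return r->entries[0];
}
```

The point of the second check is that rc == 0 and entries == NULL can occur together, so indexing entries[0] after testing only rc would dereference a NULL pointer.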
Anyway, again, this is just from reading the code for 20 minutes, not a complete in-depth investigation, but maybe it gives some ideas about what happened?
Hope it helps :)
--
Sincerely,
William Brown
Senior Software Engineer, 389 Directory Server
SUSE Labs, Australia
_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/389-devel@xxxxxxxxxxxxxxxxxxxxxxx