Possible SASL bug seen with long-ish ldapmodify updates

Dameon Wagner <dameon.wagner@xxxxxxxxxxx> · Tue, 6 Jan 2015 23:15:22 +0000

Dear Cyrus-SASL folk,

I've been trying to solve an issue we're seeing with large batched
ldapmodify updates, and with the help of the kind people over at
openldap-technical it seems that it may be more of a SASL issue than
an openldap issue.  Given that, I thought I'd ask here too, in case it
rings bells with anyone on the list.

I'll paste (almost verbatim) the useful bits from my original post [0],
so please accept my apology if you subscribe to both lists and are
reading my question for a second time.

We use OpenLDAP for a few directories, one of which is in the process
of being migrated to newer hardware, with OS upgrade thrown in, and
I've noticed an issue with ldapmodify that I thought was worth
reporting.  The directory in question has some scripted tooling around
it to manage updates from a number of sources, which are staged in a
PostgreSQL database before having some LDIF generated to update the
directory itself.

During the course of my testing (we've not seen this in production,
thankfully) I've noticed that, with reasonably lengthy* updates for an
entry, ldapmodify dies with an error like the following:

#---8<--- Command Output ----------------------------------------------
modifying entry "<DN-FOR-FAILED-ENTRY>"
ldap_result: Can't contact LDAP server (-1)
#---8<-----------------------------------------------------------------

(*lengthy, for the example here, meaning 9799 lines of LDIF input for
a single updated entry, totalling 658K of changes)

There are matching log-entries in the system's syslog (timestamp,
hostname and PID trimmed off to save some linewrapping) and slapd logs
(we run slapd under daemontools supervision, and capture its
stdout/stderr):

#---8<---- SysLog Output ----------------------------------------------
local4.debug slapd: conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>"
local4.debug slapd: conn=1002 op=3158 MOD attr=member
local4.debug slapd: conn=1002 op=3158 RESULT tag=103 err=0 text=
local4.debug slapd: conn=1002 fd=13 closed (connection lost)
#---8<-----------------------------------------------------------------

#---8<---- Slapd Output -----------------------------------------------
54a2e47f conn=1002 op=3158 MOD dn="<DN-OF-LAST-GOOD-ENTRY>"
54a2e47f conn=1002 op=3158 MOD attr=member
54a2e47f conn=1002 op=3158 RESULT tag=103 err=0 text=
sb_sasl_cyrus_decode: failed to decode packet: generic failure
sb_sasl_generic_read: failed to decode packet
54a2e47f conn=1002 fd=13 closed (connection lost)
#---8<-----------------------------------------------------------------

The LDIF for the failed entry consists of:

#---8<-----------------------------------------------------------------
dn: <DN-FOR-FAILED-ENTRY>
changetype: modify
replace: member
member: <DN-FOR-MEMBER>
...
member: <DN-FOR-ANOTHER-MEMBER>
#---8<-----------------------------------------------------------------

where the list of members was, in this case, 9799 long.  The LDIF
itself is 30097 lines long, and was happy for the first ~15000 lines.

If I prune out the modifications for the troublesome DN, the remainder
of the file also goes through happily.

As a work-around I can manually split up the list into several blocks
(tested with roughly 1000 member updates per block) with "replace:
member" for the first, to match the current behaviour, and "add:
member" for the rest. In this format, ldapmodify is happy to process
the LDIF (all in one connection, but as discrete operations, each
totalling about 1000 lines of updates or roughly 67K of changes).
(Note: the authenticated user has "unlimited" limits in the config.)
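
For illustration, here is a minimal Python sketch of the splitting
described above: the first block uses "replace: member" and every
later block uses "add: member".  The function name, the default block
size of 1000 and the example DNs are all hypothetical, not taken from
our real tooling, and the sketch assumes every DN is plain ASCII that
needs no base64 encoding or line folding.

```python
def chunked_modify_ldif(dn, members, attr="member", chunk_size=1000):
    """Build LDIF that replaces `attr` on `dn` in several smaller operations.

    The first record uses "replace:" so the existing values are dropped,
    and each following record uses "add:" to append the next block.
    Assumption: all values are safe to write as plain "attr: value" lines.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    # Split the member list into blocks; an empty list still yields one
    # (empty) replace record, which clears the attribute.
    chunks = [members[i:i + chunk_size]
              for i in range(0, len(members), chunk_size)] or [[]]

    records = []
    for index, chunk in enumerate(chunks):
        operation = "replace" if index == 0 else "add"
        lines = [f"dn: {dn}", "changetype: modify", f"{operation}: {attr}"]
        lines.extend(f"{attr}: {value}" for value in chunk)
        records.append("\n".join(lines))

    # LDIF records are separated by a blank line.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    # Hypothetical DNs, purely for demonstration.
    example_members = [f"uid=user{i},ou=people,dc=example,dc=org"
                       for i in range(2500)]
    print(chunked_modify_ldif("cn=big-group,ou=groups,dc=example,dc=org",
                              example_members), end="")
```

The output could then be fed to ldapmodify in place of the single
large record.  Note that, unlike one big "replace", the change is no
longer a single atomic operation on the entry.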

We're using Debian wheezy, authenticating with Kerberos using the
libsasl2-modules-gssapi-heimdal Debian package, version
2.1.25.dfsg1-6+deb7u1.  On the older system we're upgrading away
from we're using the lenny version of the package
(2.1.22.dfsg1-23+lenny1 -- hence planning the upgrade).

Has anyone on the list seen anything similar to the
"sb_sasl_cyrus_decode" and "sb_sasl_generic_read" failures in
a situation like this?

Thanks for your time and any hints, tips or pointers.

Cheers.

Dameon.

[0] http://www.openldap.org/lists/openldap-technical/201501/msg00006.html

-- 
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
Dameon Wagner, Systems Development and Support Team
IT Services, University of Oxford
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><