Re: bug in 389-ds-base-1.2.11.15-91.el6_9.x86_64 with large paged searches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 10/26/2017 12:16 PM, Jeffrey Eaton wrote:
I recently updated to 389-ds-base-1.2.11.15-91.el6_9.x86_64 (from 389-ds-base-1.2.11.15-50.el6_6.x86_64) and have uncovered a bug with large paged searches blocking all other searches on the server.  I tried to access https://pagure.io/389-ds-base/new_issue to open a bug there, but its giving a 500 error.
Its up and running now, so if you don't mind please file a ticket.

But... 1.2.11 is no longer under development - meaning no new releases are being done. 

So we will see if we can reproduce this on 1.3.6, and fix it there if it's still an issue.

Tickets of interest between 1.2.11.15-91  <-->   1.2.11.15-50:

1.2.11.15-75 --- Paged results search returns the blank list of entries (DS 48808)
1.2.11.15-71 --- SimplePagedResults -- in the search error case, simple paged results slot was not released (DS 48375)
1.2.11.15-69 --- SimplePagedResults -- abandon could happen between the abandon check and sending results -- Fixing a regression introduced in 1.2.11.15-68 (DS 48338)
1.2.11.15-68 --- SimplePagedResults -- abandon could happen between the abandon check and sending results (DS 48338)
1.2.11.15-64 --- pagedresults - when timed out, search results could have been already freed. (DS 48299)


Regards,
Mark

The case I've discovered is as follows:

I have a search which returns about 28,000 entries.  The system which generates this search does a paged search with 1000 results per page.  I can replicate the behavior with a command line ldap search like the following

# ldapsearch -H ldaps://servername:636 -D 'uid=binddn,ou=specials,dc=cmu,dc=edu' -x -W -E pr=1000/noprompt "(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))",

As soon as the first 1000 entries are returned, and the search requests the second page, all other searches being done to the same server block.

Specifically, I run a loop doing a trivial search:

# while true; do time ldapsearch -h servername -x uid=jeaton; sleep 1 ; done

This search will return in a fraction of a second:

real 0m0.031s
user 0m0.009s
sys 0m0.006s

until the server starts working on the second (and any subsequent pages).  When that happens, the trivial search blocks for the entire time that it takes to generate the 1000 entries for the page:

real 0m45.907s
user 0m0.009s
sys 0m0.008s

everything looks good in the access log at first:

[26/Oct/2017:10:10:02 -0400] conn=202655 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:02 -0400] conn=202655 op=0 BIND dn="" method=128 version=3 
[26/Oct/2017:10:10:02 -0400] conn=202655 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:02 -0400] conn=202655 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:02 -0400] conn=202655 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:02 -0400] conn=202655 op=2 UNBIND
[26/Oct/2017:10:10:02 -0400] conn=202655 op=2 fd=85 closed - U1
[26/Oct/2017:10:10:03 -0400] conn=202656 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:04 -0400] conn=202656 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:04 -0400] conn=202656 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:04 -0400] conn=202656 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:04 -0400] conn=202656 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:04 -0400] conn=202656 op=2 UNBIND
[26/Oct/2017:10:10:04 -0400] conn=202656 op=2 fd=85 closed - U1
[26/Oct/2017:10:10:04 -0400] conn=202657 fd=85 slot=85 SSL connection from 10.0.0.2 to 10.1.1.1
[26/Oct/2017:10:10:04 -0400] conn=202657 TLS1.0 256-bit AES
[26/Oct/2017:10:10:04 -0400] conn=202657 op=0 BIND dn="uid=binddn,ou=specials,dc=cmu,dc=edu" method=128 version=3
[26/Oct/2017:10:10:04 -0400] conn=202657 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn="uid=binddn,ou=specials,dc=cmu,dc=edu"
[26/Oct/2017:10:10:04 -0400] conn=202657 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:05 -0400] conn=202658 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:05 -0400] conn=202658 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:05 -0400] conn=202658 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:05 -0400] conn=202658 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:05 -0400] conn=202658 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:05 -0400] conn=202658 op=2 UNBIND
[26/Oct/2017:10:10:05 -0400] conn=202658 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:06 -0400] conn=202659 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:06 -0400] conn=202659 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:06 -0400] conn=202659 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:06 -0400] conn=202659 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:06 -0400] conn=202659 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:06 -0400] conn=202659 op=2 UNBIND
[26/Oct/2017:10:10:06 -0400] conn=202659 op=2 fd=86 closed - U1


until we get to the point where the first page is returned and the client requests the second page of results:


[26/Oct/2017:10:10:46 -0400] conn=202697 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:46 -0400] conn=202697 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:46 -0400] conn=202697 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:46 -0400] conn=202697 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:46 -0400] conn=202697 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:46 -0400] conn=202697 op=2 UNBIND
[26/Oct/2017:10:10:46 -0400] conn=202697 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:47 -0400] conn=202698 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:10:47 -0400] conn=202698 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:10:47 -0400] conn=202698 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 UNBIND
[26/Oct/2017:10:10:47 -0400] conn=202698 op=2 fd=86 closed - U1
[26/Oct/2017:10:10:48 -0400] conn=202657 op=1 RESULT err=0 tag=101 nentries=1000 etime=44 notes=P pr_idx=0
[26/Oct/2017:10:10:48 -0400] conn=202657 op=2 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:10:48 -0400] conn=202699 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:11:33 -0400] conn=202657 op=2 RESULT err=0 tag=101 nentries=1000 etime=45 notes=P pr_idx=0
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:11:33 -0400] conn=202699 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:11:33 -0400] conn=202657 op=3 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:11:33 -0400] conn=202699 op=2 UNBIND
[26/Oct/2017:10:11:33 -0400] conn=202699 op=2 fd=86 closed - U1
[26/Oct/2017:10:12:14 -0400] conn=202657 op=3 RESULT err=0 tag=101 nentries=1000 etime=41 notes=P pr_idx=0
[26/Oct/2017:10:12:14 -0400] conn=202700 fd=86 slot=86 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:12:14 -0400] conn=202700 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:12:14 -0400] conn=202657 op=4 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:12:14 -0400] conn=202700 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:12:14 -0400] conn=202700 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:12:14 -0400] conn=202700 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:12:14 -0400] conn=202700 op=2 UNBIND
[26/Oct/2017:10:12:14 -0400] conn=202700 op=2 fd=86 closed - U1
[26/Oct/2017:10:12:58 -0400] conn=202657 op=4 RESULT err=0 tag=101 nentries=1000 etime=44 notes=P pr_idx=0


Note the large gap in time between 10:10:48 and 10:11:33.  You can see my client doing the trivial search connect at 10:10:48 with conn=202699.  At that point the client just hangs, waiting on the server's response.  Only after the server responds with the second page of results at 10:11:33, is my client able to send the bind request and search.   This same pattern happens with every further page of searches, eg, 10:11:33 - 10:12:14.

Then when we get to the final page, as soon as that completes, the trivial searches once again complete at one per second:

[26/Oct/2017:10:25:30 -0400] conn=202657 op=28 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:25:30 -0400] conn=202737 op=0 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:25:30 -0400] conn=202738 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:25:30 -0400] conn=202738 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:25:30 -0400] conn=202738 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:25:58 -0400] conn=202738 op=2 UNBIND
[26/Oct/2017:10:25:58 -0400] conn=202740 fd=89 slot=89 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:25:58 -0400] conn=202657 op=28 RESULT err=0 tag=101 nentries=1000 etime=28 notes=P pr_idx=0
[26/Oct/2017:10:25:58 -0400] conn=202737 op=1 UNBIND
[26/Oct/2017:10:25:58 -0400] conn=202737 op=1 fd=87 closed - U1
[26/Oct/2017:10:25:58 -0400] conn=202738 op=2 fd=88 closed - U1
[26/Oct/2017:10:25:58 -0400] conn=202740 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:25:58 -0400] conn=202657 op=29 SRCH base="dc=cmu,dc=edu" scope=2 filter="(&(isMemberOf=cn=GroupName,ou=groups,dc=cmu,dc=edu)(guid=*)(objectClass=inetorgperson))" attrs=ALL
[26/Oct/2017:10:25:58 -0400] conn=202740 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:25:58 -0400] conn=202740 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:25:58 -0400] conn=202740 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:08 -0400] conn=202741 fd=87 slot=87 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:08 -0400] conn=202740 op=2 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202741 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:08 -0400] conn=202741 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:08 -0400] conn=202740 op=2 fd=89 closed - U1
[26/Oct/2017:10:26:08 -0400] conn=202741 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:08 -0400] conn=202657 op=29 RESULT err=0 tag=101 nentries=455 etime=10 notes=P pr_idx=0
[26/Oct/2017:10:26:08 -0400] conn=202741 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:08 -0400] conn=202657 op=30 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202741 op=2 UNBIND
[26/Oct/2017:10:26:08 -0400] conn=202657 op=30 fd=85 closed - U1
[26/Oct/2017:10:26:08 -0400] conn=202741 op=2 fd=87 closed - U1
[26/Oct/2017:10:26:09 -0400] conn=202742 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:09 -0400] conn=202742 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:09 -0400] conn=202742 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:09 -0400] conn=202742 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:09 -0400] conn=202742 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:09 -0400] conn=202742 op=2 UNBIND
[26/Oct/2017:10:26:09 -0400] conn=202742 op=2 fd=85 closed - U1
[26/Oct/2017:10:26:10 -0400] conn=202743 fd=85 slot=85 connection from 10.0.0.1 to 10.1.1.1
[26/Oct/2017:10:26:10 -0400] conn=202743 op=0 BIND dn="" method=128 version=3
[26/Oct/2017:10:26:10 -0400] conn=202743 op=0 RESULT err=0 tag=97 nentries=0 etime=0 dn=""
[26/Oct/2017:10:26:10 -0400] conn=202743 op=1 SRCH base="dc=cmu,dc=edu" scope=2 filter="(uid=jeaton)" attrs=ALL
[26/Oct/2017:10:26:10 -0400] conn=202743 op=1 RESULT err=0 tag=101 nentries=1 etime=0
[26/Oct/2017:10:26:10 -0400] conn=202743 op=2 UNBIND


I did some testing and think that the bug was introduced around 389-ds-base-1.2.11.15-72.el6_7.  I can probably upgrade one by one and find it specifically if desired.




_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx

_______________________________________________
389-devel mailing list -- 389-devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to 389-devel-leave@xxxxxxxxxxxxxxxxxxxxxxx

[Index of Archives]     [Fedora Directory Announce]     [Fedora Users]     [Older Fedora Users Mail]     [Fedora Advisory Board]     [Fedora Security]     [Fedora Devel Java]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Mentors]     [Fedora Package Review]     [Fedora Art]     [Fedora Music]     [Fedora Packaging]     [CentOS]     [Fedora SELinux]     [Big List of Linux Books]     [KDE Users]     [Fedora Art]     [Fedora Docs]

  Powered by Linux