On Sat, Aug 13, 2016 at 09:04:32AM +0000, Eric Wong wrote:

> > Is there an easy way to get _just_ the list of message-ids you are
> > storing (I know I can download the whole archive, but it's big)?
>
> XHDR (or HDR) over NNTP should do it (that's how I checked
> against gmane):
> --------8<-----
> use Net::NNTP;
> my $nntp = Net::NNTP->new($ENV{NNTPSERVER} || 'news.public-inbox.org');
> my ($num, $first, $last) = $nntp->group('inbox.comp.version-control.git');
> my $batch = 10000;
> my $i;
> for ($i = $first; $i < $last; $i += $batch) {
> 	my $j = $i + $batch - 1;
> 	$j = $last if $j > $last;
> 	my $num2mid = $nntp->xhdr('Message-ID', "$i-$j");
> 	for my $n ($i..$j) {
> 		defined(my $mid = $num2mid->{$n}) or next;
> 		print "$mid\n";
> 	}
> }

Thanks, that's perfect. I collected the message-ids from my archive.

Interestingly, I had a dozen or so that did not have message-ids at
all. I think most of them are from patches that put the "From " line in
the body, like this one:

  http://public-inbox.org/git/20070311033833.GB10781@xxxxxxxxxxx/

and then they got corrupted on a round-trip through one of the bad mbox
formats (probably downloading from gmane, I'd guess; the export there
uses mbox, and I use maildir myself, so it probably got split badly
years ago). Anyway, public-inbox seems to get this case right, which is
good.

I had several hundred message-ids that you didn't. About half of them
were spam or other junk. I weeded them out manually (mostly by picking
through the subjects, so possibly there's some error).

The end result is 279 messages that I think are legitimate that you
don't have. I'll send them to you off-list, as the mbox is about 300K,
which the list will reject.

-Peff
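
For anyone who wants to repeat the comparison, here is a rough sketch of
the set difference described above. It is not the procedure actually
used for this message, just one way to do it. It assumes each side has
been dumped as one message-id per line (for instance, the output of the
XHDR loop quoted above). The helper names and the angle-bracket
normalization are illustrative assumptions.

```python
def read_mids(lines):
    """Collect one-per-line message-ids into a set.

    Assumption: ids may or may not be wrapped in <...>, so every id is
    normalized to the bracketed form before comparing.
    """
    mids = set()
    for line in lines:
        mid = line.strip()
        if not mid:
            continue  # skip blank lines
        mids.add("<" + mid.strip("<>") + ">")
    return mids


def missing_from(theirs, mine):
    """Return the ids present in `mine` but absent from `theirs`, sorted."""
    return sorted(mine - theirs)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two id lists.
    theirs = read_mids(["<a@example.com>\n", "<b@example.com>\n"])
    mine = read_mids(["<a@example.com>\n", "c@example.com\n", "\n"])
    for mid in missing_from(theirs, mine):
        print(mid)
```

The output is only a list of candidates. Spam and other junk still have
to be weeded out by hand, as described above.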