Re: A note from the maintainer

On Sat, Aug 13, 2016 at 09:04:32AM +0000, Eric Wong wrote:

> > Is there an easy way to get _just_ the list of message-ids you are
> > storing (I know I can download the whole archive, but it's big)?
> 
> XHDR (or HDR) over NNTP should do it (that's how I checked
> against gmane):
> --------8<-----
> use Net::NNTP;
> my $nntp = Net::NNTP->new($ENV{NNTPSERVER} || 'news.public-inbox.org');
> my ($num, $first, $last) = $nntp->group('inbox.comp.version-control.git');
> my $batch = 10000;
> my $i;
> for ($i = $first; $i <= $last; $i += $batch) {
> 	my $j = $i + $batch - 1;
> 	$j = $last if $j > $last;
> 	my $num2mid = $nntp->xhdr('Message-ID', "$i-$j");
> 	for my $n ($i..$j) {
> 		defined(my $mid = $num2mid->{$n}) or next;
> 		print "$mid\n";
> 	}
> }

Thanks, that's perfect.

I collected the message-ids from my archive. Interestingly, I had a
dozen or so that did not have message-ids at all. I think most of them
are from patches that put the "From " line in the body, like this one:

  http://public-inbox.org/git/20070311033833.GB10781@xxxxxxxxxxx/

and then they got corrupted on a round-trip through one of the bad mbox
formats (probably downloading from gmane, I'd guess; the export there
uses mbox, and I use maildir myself, so it probably got split badly
years ago). Anyway, public-inbox seems to get this case right, which is
good.
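
To make the failure mode concrete, here is a minimal sketch in Python
of how a careless mbox reader can cut such a message in two, and how
mboxrd-style ">From " quoting avoids it. This is only an illustration
under assumptions: it is not how gmane or public-inbox is implemented,
and the addresses, message-id and function names are hypothetical.

```python
import re


def naive_split(mbox_text):
    """Split on every line starting with 'From ', as a careless reader might."""
    messages, current = [], []
    for line in mbox_text.splitlines(keepends=True):
        # Any "From " line after the first is treated as a new message.
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def mboxrd_escape(body):
    """mboxrd-style quoting: add '>' to body lines matching ^>*From ."""
    return re.sub(r"^(>*From )", r">\1", body, flags=re.MULTILINE)


# Hypothetical message whose body carries a format-patch "From " line.
headers = (
    "From sender@example.com Sat Mar 10 00:00:00 2007\n"
    "Message-ID: <hypothetical@example.com>\n"
    "Subject: [PATCH] demo\n"
    "\n"
)
body = (
    "Here is the patch:\n"
    "\n"
    "From 0123abc Mon Sep 17 00:00:00 2001\n"
    "From: A U Thor <author@example.com>\n"
)

broken = naive_split(headers + body)
intact = naive_split(headers + mboxrd_escape(body))
print(len(broken), len(intact))  # 2 1
```

The second fragment in "broken" starts at the in-body "From " line and
has no Message-ID header of its own, which matches the kind of
id-less message described above.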

I had several hundred message-ids that you didn't. About half of them
were spam or other junk. I weeded them out manually (mostly by picking
through the subjects, so possibly there's some error). The end result is
279 messages that I think are legitimate that you don't have.
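
The comparison itself is just a set difference over the two lists of
message-ids. A rough sketch in Python, assuming one id per line and
that ids may or may not be wrapped in angle brackets (the function
names and sample ids are hypothetical, and this is not necessarily
how the lists were actually compared):

```python
def read_mids(lines):
    """Normalize message-ids: strip whitespace and surrounding <>."""
    mids = set()
    for line in lines:
        mid = line.strip()
        if mid.startswith("<") and mid.endswith(">"):
            mid = mid[1:-1]
        if mid:
            mids.add(mid)
    return mids


def missing_from(theirs, mine):
    """Return the ids present in `mine` but absent from `theirs`, sorted."""
    return sorted(read_mids(mine) - read_mids(theirs))


# Hypothetical sample data; in practice these would be lines read from
# two files, e.g. the script's output and a local archive listing.
mine = ["<a@example.com>\n", "<b@example.com>\n", "c@example.com\n"]
theirs = ["<a@example.com>\n"]

print(missing_from(theirs, mine))  # ['b@example.com', 'c@example.com']
```

Normalizing the brackets first matters because the two sources may
not format ids the same way; without it, identical messages would
show up as differences.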

I'll send them to you off-list, as the mbox is about 300K, which the
list will reject.

-Peff