Re: [PATCH 2/2] repair: fix some valgrind reported errors on i686

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 7 Oct 2011 09:15:42 +1100

On Thu, Oct 06, 2011 at 07:17:52AM -0500, Alex Elder wrote:
> On Thu, 2011-10-06 at 12:01 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > Fix a potential prefetch read problem due to the first loop
> > execution of pf_batch_read potentially not initialising the fsbno
> > varaible:
> > 
> > ==10177== Thread 6:
> > ==10177== Conditional jump or move depends on uninitialised value(s)
> > ==10177==    at 0x8079CAB: pf_batch_read (prefetch.c:408)
> > ==10177==    by 0x6A2996D: clone (clone.S:130)
> > ==10177==
> > 
> > Fix a bunch of invalid read/write errors due to excessive blkmap
> > allocations when inode forks are corrupted. These show up some time
> > after making a blkmap allocation for 536870913 extents on i686,
> > which is followed some time later by a crash caused bymemory
> > corruption.
> > 
> > This blkmap allocation size overflows 32 bits in such a
> > way that it results in a 32 byte allocation and so access to the
> > second extent results in access beyond the allocated memory and
> > corrupts random memory.
> > 
> > ==5419== Invalid write of size 4
> > ==5419==    at 0x80507DA: blkmap_set_ext (bmap.c:260)
> > ==5419==    by 0x8055CF4: process_bmbt_reclist_int (dinode.c:712)
> > ==5419==    by 0x8056206: process_bmbt_reclist (dinode.c:813)
> > ==5419==    by 0x80579DA: process_exinode (dinode.c:1324)
> > ==5419==    by 0x8059B77: process_dinode_int (dinode.c:2036)
> > ==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
> > ==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> > ==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
> > ==5419==    by 0xFFF: ???
> > ==5419==  Address 0x944cfb8 is 0 bytes after a block of size 32 alloc'd
> > ==5419==    at 0x48E1102: realloc (in
> > /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
> > ==5419==    by 0x80501F3: blkmap_alloc (bmap.c:56)
> > ==5419==    by 0x80599F5: process_dinode_int (dinode.c:2027)
> > ==5419==    by 0x805ABE6: process_dinode (dinode.c:2823)
> > ==5419==    by 0x8052493: process_inode_chunk.isra.4 (dino_chunks.c:777)
> > ==5419==    by 0x8054012: process_aginodes (dino_chunks.c:1024)
> > ==5419==    by 0xFFF: ???
> > 
> > Add overflow detection code into the blkmap allocation code to avoid
> > this problem, and also free large allocations once they are finished
> > with to avoid pinning large amounts of memory due to the occasional
> > large extent list in a filesystem.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> This is good but I have a few comments below, a couple of
> which really indicate you need to update this.
> 
> 					-Alex
>  
> > ---
> >  repair/bmap.c     |   37 ++++++++++++++++++++++++++++++++++++-
> >  repair/prefetch.c |    2 +-
> >  2 files changed, 37 insertions(+), 2 deletions(-)
> > 
> > diff --git a/repair/bmap.c b/repair/bmap.c
> > index 79b9f79..1127a87 100644
> > --- a/repair/bmap.c
> > +++ b/repair/bmap.c
> > @@ -47,6 +47,17 @@ blkmap_alloc(
> >  	if (nex < 1)
> >  		nex = 1;
> >  
> > +#if (BITS_PER_LONG != 64)
> 
> This should be == 32, not != 64.

OK.

>  (And if it
> were possible, sizeof (int) == 32.)

The BITS_PER_LONG are derived from the output of the configure
process, and that's what the other code uses as well. So I'm just
being consistent ;).

> > +	if (nex > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
> 
> See the comment below about this calculation.
> 
> > +		do_warn(
> > +	_("Number of extents requested in blkmap_alloc (%u) overflows 32 bits.\n"
> > +	  "If this is not a corruption, then will need a 64 bit system\n"
> 		...then you will need...
> 
> > +	  "to repair this filesystem.\n"),
> > +			nex);
> > +		return NULL;
> > +	}
> > +#endif
> > +
> >  	key = whichfork ? ablkmap_key : dblkmap_key;
> >  	blkmap = pthread_getspecific(key);
> >  	if (!blkmap || blkmap->naexts < nex) {
> 
> . . .
> 
> > @@ -218,6 +244,15 @@ blkmap_grow(
> >  	}
> >  
> >  	blkmap->naexts += 4;
> 
> The check needs to go *before* you update naexts.

It's in the right place - adding 4 to the unchecked extent count can
push it over the limit, so we have to check it after adding 4 to it.

> > +#if (BITS_PER_LONG != 64)
> > +	if (blkmap->naexts > (INT_MAX / sizeof(bmap_ext_t) - 1)) {
> 
> I don't really follow this calculation. I would expect
> it to be based more closely on how BLKMAP_SIZE() is
> defined.

> 
> If you move it before the increment I think it would
> be better to use:
> 	if (BLKMAP_SIZE(nex) >= INT_MAX - sizeof (blkent_t *))

We can't use BLKMAP_SIZE(), because it will overflow 32 bits.
Indeed, use of BLKMAP_SIZE() on unchecked extent counts is
*precisely* the cause of the memory corruption this fix addresses.
In more detail:

typedef struct blkmap {
        int             naexts;
        int             nexts;
        bmap_ext_t      exts[1];
} blkmap_t;

#define BLKMAP_SIZE(n)  \
        (offsetof(blkmap_t, exts) + (sizeof(bmap_ext_t) * (n)))

sizeof()/offsetof() on 32 bit platforms return a 32 bit number,
naexts is a 32 bit number, so BLKMAP_SIZE() will overflow if
blkmap->naexts >= INT_MAX / sizeof(bmap_ext_t).

IOWs, we can't use BLKMAP_SIZE to detect an overflow, because it
is the code that overflows...

> And since this would expose the internals of what
> BLKMAP_SIZE() does, it might be nicer if some sort of
> BLKMAP_NENTS_MAX constant were defined next to the
> definition of BLKMAP_SIZE().

I can add a BLKMAP_NENTS_MAX constant to repair/bmap.h.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs