Re: [bug report] smb: client: compress: LZ77 code improvements cleanup

Enzo Matsumiya <ematsumiya@xxxxxxx> · Fri, 13 Sep 2024 13:44:20 -0300

Hi Dan,

On 09/13, Dan Carpenter wrote:
Hello Enzo Matsumiya,

Commit 13b68d44990d ("smb: client: compress: LZ77 code improvements
cleanup") from Sep 6, 2024 (linux-next), leads to the following
Smatch static checker warning:

	fs/smb/client/compress.c:311 should_compress()
	warn: signedness bug returning '(-12)'

fs/smb/client/compress.c
   292 bool should_compress(const struct cifs_tcon *tcon, const struct smb_rqst *rq)
   293 {
   294         const struct smb2_hdr *shdr = rq->rq_iov->iov_base;
   295
   296         if (unlikely(!tcon || !tcon->ses || !tcon->ses->server))
   297                 return false;
   298
   299         if (!tcon->ses->server->compression.enabled)
   300                 return false;
   301
   302         if (!(tcon->share_flags & SMB2_SHAREFLAG_COMPRESS_DATA))
   303                 return false;
   304
   305         if (shdr->Command == SMB2_WRITE) {
   306                 const struct smb2_write_req *wreq = rq->rq_iov->iov_base;
   307
   308                 if (wreq->Length < SMB_COMPRESS_MIN_LEN)
   309                         return false;
   310
--> 311                 return is_compressible(&rq->rq_iter);

Was this patch ever sent to a public mailing list?  I don't see it on lore.

No, I sent them directly to Steve.
I'll make sure to send the next ones to the list.

The is_compressible() function has some weird stuff going on.  It's an is_
function so it should return bool instead of negative error codes.  Here the
negative error codes are treated as "let's compress this".

check_repeated_data().  It's better to name boolean functions something like
is_repeated_data() so people will know it's boolean.  This function takes the
sample data and compares if the first half is a duplicate of the second half.
The only way I can imagine this happening in the real world is if the file is
all zeroes.

check_ascii_bytes().  Rename to is_mostly_ascii().  Get rid of the bucket struct
and instead just it make an array of unsigned int.  Then the checks are
simpler:

-		if (bkt[i].count > 0)
+		if (bkt[i])

The check_ascii_bytes() has some micro-optimizations going on.  I don't think
they're necessary.  I can't imagine how this could show up on a benchmark.  Just
do one loop instead of two.  On my computer I'm able to do that function 12
million times per second, but I micro optimized the function even more and now
I can go through it 20 million times per second using a single loop.

static bool check_ascii_bytes2(const unsigned int *p)
{
       int count = 0;
       int i;

       for (i = 0; i < 256; i++) {
               if (p[i]) {
                       if (++count > 64)
                               return false;
               }
       }

       return true;
}

The trick was I got rid of the threshold const used a literal instead.

The check at the end of is_compressible() should not treat negatives as success:

-	return !!ret;
+	if (ret == 1)
+		return true;
+	return false;

   312         }
   313
   314         return (shdr->Command == SMB2_READ);
   315 }

I'll address those issues.
Thanks for the suggestions/corrections!

Cheers,

Enzo