Re: [bug report] smb: client: compress: LZ77 code improvements cleanup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dan,

I just sent a patch fixing some of the issues you mentioned, but I'm
elaborating on some of them here.

On 09/13, Dan Carpenter wrote:
Hello Enzo Matsumiya,

Commit 13b68d44990d ("smb: client: compress: LZ77 code improvements
cleanup") from Sep 6, 2024 (linux-next), leads to the following
Smatch static checker warning:

	fs/smb/client/compress.c:311 should_compress()
	warn: signedness bug returning '(-12)'

fs/smb/client/compress.c
   292 bool should_compress(const struct cifs_tcon *tcon, const struct smb_rqst *rq)
   293 {
   294         const struct smb2_hdr *shdr = rq->rq_iov->iov_base;
   295
   296         if (unlikely(!tcon || !tcon->ses || !tcon->ses->server))
   297                 return false;
   298
   299         if (!tcon->ses->server->compression.enabled)
   300                 return false;
   301
   302         if (!(tcon->share_flags & SMB2_SHAREFLAG_COMPRESS_DATA))
   303                 return false;
   304
   305         if (shdr->Command == SMB2_WRITE) {
   306                 const struct smb2_write_req *wreq = rq->rq_iov->iov_base;
   307
   308                 if (wreq->Length < SMB_COMPRESS_MIN_LEN)

(wrapped "wreq->Length" in le32_to_cpu() -- caught by sparse)

   309                         return false;
   310
--> 311                 return is_compressible(&rq->rq_iter);

Was this patch ever sent to a public mailing list?  I don't see it on lore.

The is_compressible() function has some weird stuff going on.  It's an is_
function so it should return bool instead of negative error codes.  Here the
negative error codes are treated as "let's compress this".

Fixed.  Changed return type to bool and now WARN_ON_ONCE(1) on possible
internal errors.

check_repeated_data().  It's better to name boolean functions something like
is_repeated_data() so people will know it's boolean.

Fixed -- used has_repeated_data() though, see below.

This function takes the
sample data and compares if the first half is a duplicate of the second half.
The only way I can imagine this happening in the real world is if the file is
all zeroes.

You're correct.  But here we're dealing with chunks of files, and,
depending on the file type and data alignment, there could be cases
where this check is useful.

Though, statistically, according to my tests, this function never
returns true.  I left it there because I'm still diversifying my tests
payloads and it doesn't hurt performance that much, but it might get
dropped later.

check_ascii_bytes().  Rename to is_mostly_ascii().

Fixed.

 Get rid of the bucket struct
and instead just it make an array of unsigned int.  Then the checks are
simpler:

-		if (bkt[i].count > 0)
+		if (bkt[i])

Not sure that's possible -- each bucket represents a byte in the sample
and it makes calc_byte_distribution() fail faster on an uniform
distribution (because bkt is sorted in descending order).

If an unsigned int array is used, even though it works for
is_mostly_ascii(), the "which byte" information is lost after sorting,
which impacts results.

But please let me know if you did/observe something else.

The check_ascii_bytes() has some micro-optimizations going on.  I don't think
they're necessary.  I can't imagine how this could show up on a benchmark.  Just
do one loop instead of two.  On my computer I'm able to do that function 12
million times per second, but I micro optimized the function even more and now
I can go through it 20 million times per second using a single loop.

static bool check_ascii_bytes2(const unsigned int *p)
{
       int count = 0;
       int i;

       for (i = 0; i < 256; i++) {
               if (p[i]) {
                       if (++count > 64)
                               return false;
               }
       }

       return true;
}

The trick was I got rid of the threshold const used a literal instead.

Fixed.

The check at the end of is_compressible() should not treat negatives as success:

-	return !!ret;
+	if (ret == 1)
+		return true;
+	return false;

   312         }
   313
   314         return (shdr->Command == SMB2_READ);
   315 }

Fixed.

Also renamed calc_shannon_entropy -> has_low_entropy to fit these
changes.


Cheers,

Enzo




[Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux