Re: [PATCH v7 01/20] trace-cmd library: Add support for compression algorithms

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 24, 2022 at 10:03 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Mon, 24 Jan 2022 06:54:48 +0200
> Tzvetomir Stoyanov <tz.stoyanov@xxxxxxxxx> wrote:
>
> > > Can this handle sections that are more than 4G in size?
> > >
> > > That is definitely something we need to be able to handle. Or if a
> > > trace-cmd data file has a section greater than 4G, it is broken up into
> > > smaller chunks for compression?
> >
> > Yes, large sections are compressed and uncompressed in chunks. That
> > limits the size of one chunk, not the section size. In the proposed
> > implementation, one chunk of compressed trace data is 10 trace pages.
>
> OK.
>
>
> > > > +
> > > > +/**
> > > > + * tracecmd_uncompress_block - uncompress a memory block
> > > > + * @handle: compression handle
> > > > + *
> > > > + * Read compressed memory block from the file and uncompress it into internal buffer.
> > > > + * The tracecmd_compress_read() can be used to read the uncompressed data from the buffer
> > >
> > > BTW, even though we allow 100 character width, it's best to keep
> > > comments under 80, when possible.
> > >
> > > Also, I found the naming of tracecmd_compress_read/write() confusing as
> > > it reads and writes to the buffer and not the file. It may not have
> > > been so bad if we didn't have these functions where we have
> > > compress_block and uncompress_block that reads and writes to the file.
> > >
> > > Prehaps s/tracecmd_compress_read/tracecmd_compress_retrieve/
> > >         s/tracecmd_compress_write/tracecmd_compress_insert/
> > >
> > > ?
> > >
> >
> > I'm OK to rename them, though the current names sound logical to me.
> > In the trace-cmd code, I just replaced read() with
> > tracecmd_compress_read(), that's why decided to choose that name.
>
> The issue I had was that tracecmd_read reads the file, whereas to me
> "tracecmd_compress_read()" sounds like it would read the file that was
> compressed, and not an intermediate buffer.
>
> My issue is that the name does not reflect that it is reading from this
> intermediate.
>
> Another name could be:
>
>  tracecmd_compress_buffer_read()
>  tracecmd_compress_buffer_write()
>

I like the names with "_buffer_read", will rename them in the next version.

> Perhaps that would be better. That lets you know that its reading the
> compressed buffer, as the "buffer" in the name suggests.
>
>
> > >
> > > > +     free(bytes);
> > > > +     handle->pointer = 0;
> > > > +     handle->capacity = s_uncompressed;
> > >
> > > If size is the greater of the two (compressed could be larger) and
> > > capacity is the allocated buffer, shouldn't we make it equal to size
> > > and not uncompressed?
> >
> > The name "capacity" in reading logic is a bit confusing, as it
> > actually is used to track the size of uncompressed data. The buffer
> > may be larger, but if the uncompressed is smaller - the rest is
> > garbage and must not be read.
>
> Then the extend function is broken:

For this particular use case, the logic is not correct - the realloc()
could be redundant, if the real buffer size is bigger than the
capacity. But as usually a compression handle is used for reading or
for writing only, this is unlikely to happen. At least I cannot
imagine a use case where the same compression handle will be used for
reading a compress block and for writing compressed data. But even if
we hit that case - there is no data corruption or memory leak, just a
redundant realloc(). To solve that, reading and writing capacities
could be introduced ?
    struct tracecmd_compression {
       ...
       unsigned int            capacity_read;
       unsigned int            capacity_write;
       ...
   }

But that change will be for a very rare case - when the compressed
data size is bigger than the uncompressed (usually happens only with a
very small data) + the same compression handle is used for compression
and decompression (not used in that way in trace-cmd).

>
> static inline int buffer_extend(struct tracecmd_compression *handle, unsigned int size)
> {
>         int extend;
>         char *buf;
>
>         if (size <= handle->capacity)
>                 return 0;
>
>         extend = ((size - handle->capacity) / BUFSIZ + 1) * BUFSIZ;
>         buf = realloc(handle->buffer, handle->capacity + extend);
>         if (!buf)
>                 return -1;
>         handle->buffer = buf;
>         handle->capacity += extend;
>
>         return 0;
> }
>
> As it sets the capacity to the allocated size, not what is stored.
>
> Perhaps you meant:
>
>         handle->capacity += size;
>
> ?
>
> >
> > >
> > >
> > > > +     return 0;
> > > > +error:
> > > > +     tracecmd_compress_reset(handle);
> > > > +     free(bytes);
> > > > +     return -1;
> > > > +}
> > > > +
> > > > +/**
> > > > + * tracecmd_compress_block - compress a memory block
> > > > + * @handle: compression handle
> > > > + *
> > > > + * Compress the content of the internal memory buffer and write the compressed data in the file
> > > > + * The tracecmd_compress_write() can be used to write data into the internal memory buffer, before
> > > > + * calling this API.
> > > > + *
> > > > + * Returns 0 on success, or -1 in case of an error.
> > > > + */
> > > > +int tracecmd_compress_block(struct tracecmd_compression *handle)
> > > > +{
> > > > +     unsigned int size;
> > > > +     char *buf;
> > > > +     int endian4;
> > > > +     int ret;
> > > > +
> > > > +     if (!handle || !handle->proto ||
> > > > +         !handle->proto->compress_size || !handle->proto->compress_block)
> > > > +             return -1;
> > > > +
> > > > +     size = handle->proto->compress_size(handle->pointer);
> > >
> > > space
> > >
> > > > +     buf = malloc(size);
> > > > +     if (!buf)
> > > > +             return -1;
> > >
> > > space
> > >
> > > > +     ret = handle->proto->compress_block(handle->buffer, handle->pointer, buf, &size);
> > > > +     if (ret < 0)
> > > > +             goto out;
> > >
> > > Do we want to free the compress handle on error?
> >
> > I think returning an error is enough, the caller should decide if to
> > free the compress handle in that case. The handle is not allocated
> > here.
> >
>
> Heh, I actually meant: Do we want to reset the handle here?
>
> That is, I'm not asking to do more, but if we should do less.
>

I see what you mean, it makes sense to keep the data in case of
compression failure. Let the caller decide if to reset the handle.


> > >
> > > space
> > >
> > > > +     /* Write compressed data size */
> > > > +     endian4 = tep_read_number(handle->tep, &size, 4);
> > > > +     ret = do_write(handle, &endian4, 4);
> > > > +     if (ret != 4)
> > > > +             goto out;
> > >
> > > space
> > >
> > > > +     /* Write uncompressed data size */
> > > > +     endian4 = tep_read_number(handle->tep, &handle->pointer, 4);
> > > > +     ret = do_write(handle, &endian4, 4);
> > > > +     if (ret != 4)
> > > > +             goto out;
> > >
> > > space
> > >
> > > > +     /* Write compressed data */
> > > > +     ret = do_write(handle, buf, size);
> > > > +     ret = ((ret == size) ? 0 : -1);
> > > > +out:
> > > > +     tracecmd_compress_reset(handle);
>
> I meant, do we want to do the above for all error cases?
>
> > > > +     free(buf);
> > > > +     return ret;
> > > > +}
>
> ...
>
> > > > +/**
> > > > + * tracecmd_compress_copy_from - Copy and compress data from a file
> > > > + * @handle: compression handle
> > > > + * @fd: file descriptor to uncompressed data to copy from
> > > > + * @chunk_size: size of one compression chunk
> > > > + * @read_size: in - max bytes to read from @fd, 0 to read till the EOF
> > > > + *          out - size of the uncompressed data read from @fd
> > >
> > > Is this an array of two?
> >
> > No, it is a pointer to single long long, used as input and output parameter.
>
> I'm confused by the comment then. What is does the above "in" and "out"
> mean?
>
> >
> > >
> > > > + * @write_size: return, size of the compressed data written into @handle
> > > > + *
> > > > + * This function reads uncompressed data from given @fd, compresses the data using the @handle
> > > > + * compression context and writes the compressed data into the fd associated with the @handle.
> > > > + * The data is compressed on chunks with given @chunk_size size.
> > > > + * The compressed data is written in the format:
> > > > + *  - 4 bytes, chunks count
> > > > + *  - for each chunk:
> > > > + *    - 4 bytes, size of compressed data in this chunk
> > > > + *    - 4 bytes, uncompressed size of the data in this chunk
> > > > + *    - data, bytes of <size of compressed data in this chunk>
> > >
> > > I guess this answers the question from the beginning about the use of
> > > int in the structure.
> > >
> > > > + *
> > > > + * On success 0 is returned, @read_size and @write_size are updated with the size of
> > > > + * read and written data.
> > > > + */
> > > > +int tracecmd_compress_copy_from(struct tracecmd_compression *handle, int fd, int chunk_size,
> > > > +                             unsigned long long *read_size, unsigned long long *write_size)
> > > > +{
> > > > +     unsigned int rchunk = 0;
> > > > +     unsigned int chunks = 0;
> > > > +     unsigned int wsize = 0;
> > > > +     unsigned int rsize = 0;
> > > > +     unsigned int rmax = 0;
> > > > +     unsigned int csize;
> > > > +     unsigned int size;
> > > > +     unsigned int all;
> > > > +     unsigned int r;
> > > > +     off64_t offset;
> > > > +     char *buf_from;
> > > > +     char *buf_to;
> > > > +     int endian4;
> > > > +     int ret;
> > > > +
> > > > +     if (!handle || !handle->proto ||
> > > > +         !handle->proto->compress_block || !handle->proto->compress_size)
> > > > +             return 0;
> > >
> > > space
> > >
> > > > +     if (read_size)
> > > > +             rmax = *read_size;
> > >
> > > space
> > >
> > > > +     csize = handle->proto->compress_size(chunk_size);
> > >
> > > space
> > >
> > > > +     buf_from = malloc(chunk_size);
> > > > +     if (!buf_from)
> > > > +             return -1;
> > >
> > > space
> > >
> > > > +     buf_to = malloc(csize);
> > > > +     if (!buf_to)
> > > > +             return -1;
> > >
> > > space
> > >
> > > > +     /* save the initial offset and write 0 chunks */
> > >
> > > Why zero? The above comment is more like " /* set x to zero */ x = 0; "
> >
> > Maybe "write 0 as initial chunk count" is a better comment. The error
> > cases below rely on that "0". Count of 0 should be written in the
> > file, in case of some error, or in case there is no data to compress.
>
> Yeah, please add more to that comment.
>
> Thanks,
>
> -- Steve
>
> >
> > >
> > > > +     offset = lseek64(handle->fd, 0, SEEK_CUR);
> > > > +     write_fd(handle->fd, &chunks, 4);
>


--
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center



[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux