Re: [Patch 1/4] Support for checking and reading block grade information in kernel

Sayan Ghosh <sgdgp.2014@xxxxxxxxx> · Sun, 29 Apr 2018 15:22:34 +0530

Hello Jan,

Thank you for looking into our patchset and providing feedback.
We are currently modifying these patches for the latest version of kernel.

The overall objective is as follows:
The goal of our project is broadly to support data gradation within a
single file. If the contents of the file are graded in terms of their
importance, then an application might need to view/analyse
only the important portions. It also helps if the important portions
can be accessed quickly without having to go through the entire file.
For example, consider a learning video with
indexing/annotations, in which the annotations mark the important
parts of the video. A learner may be interested only in those parts,
and it helps if they can be given a reduced view containing
just the parts they are interested in. An example of such videos is the ACM
Webinar videos, where a user can navigate using a table of contents or
a phrase cloud.

The link below is one such video:
https://videoken.com/video-detail?videoID=IpGxLWOIZy4&videoDuration=1853&videoName=A%20Friendly%20Introduction%20to%20Machine%20Learning&keyword=A%20Friendly%20Introduction%20to%20Machine%20Learning

This kind of video file can serve as input to our system, where we
know which parts of the file have been marked. Our goal then is to
place the important blocks appropriately and provide a reduced view
of just the important parts of the file. Placing the important blocks
in a faster tier (SSD, PM, etc.) greatly enhances the performance of
reading and writing the file.
To achieve this we have a data structure for the grades,
somewhat like an extent structure. It describes the segments of the
file that are high grade. Each entry contains the
starting block number and the length of the segment.
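
To make the grade structure concrete, here is a minimal userspace sketch
(not code from the patches). It assumes the same two fields as
struct grade_struct in patch 1/4, with uint32_t standing in for
ext4_lblk_t; the names grade_seg, block_is_high_grade() and
high_grade_blocks() are only illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct grade_struct (assumption:
 * same two fields; uint32_t mirrors the 32-bit ext4_lblk_t). */
struct grade_seg {
    uint32_t block_num;      /* first high-grade logical block */
    unsigned long long len;  /* number of contiguous high-grade blocks */
};

/* Return true if logical block blk lies inside any high-grade segment.
 * Plain linear scan; no ordering of the segments is assumed. */
static bool block_is_high_grade(const struct grade_seg *segs, size_t n,
                                uint32_t blk)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long long start = segs[i].block_num;

        if (blk >= start && blk - start < segs[i].len)
            return true;
    }
    return false;
}

/* Total number of high-grade blocks, i.e. the size of the reduced
 * view measured in blocks. */
static unsigned long long high_grade_blocks(const struct grade_seg *segs,
                                            size_t n)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < n; i++)
        total += segs[i].len;
    return total;
}
```

For instance, with the two segments {2, 3} and {10, 2} (blocks 2,3,4 and
10,11), block 4 is reported as high grade, block 5 is not, and the
reduced view is 5 blocks long.
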
So the patches basically focus on providing functions to set and get the
grade information in the extended attributes, and on allocating the
blocks using this grade information (by modifying the fallocate calls
in the kernel). The reduced view of the file is
being handled by modifying the code for the DAX calls in the kernel.
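
As an illustration of the "get" side, the sketch below (again userspace
C, not the patch code) assumes the "grade_array" attribute value is a
packed array of grade entries, which is what read_count_xattr() in
patch 1/4 relies on when it divides the value size by
sizeof(struct grade_struct). The helper names are hypothetical. It
treats a negative size (an error return from the xattr lookup) and a
size that is not a whole number of entries as invalid.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct grade_struct (assumed layout). */
struct grade_seg {
    uint32_t block_num;      /* first high-grade logical block */
    unsigned long long len;  /* number of contiguous high-grade blocks */
};

/* Number of entries in an xattr value of `size` bytes.
 * Returns -1 if size is negative (lookup failed) or is not a whole
 * multiple of the entry size. */
static long grade_count_from_size(long size)
{
    if (size < 0 || (size_t)size % sizeof(struct grade_seg) != 0)
        return -1;
    return (long)((size_t)size / sizeof(struct grade_seg));
}

/* Copy the entries from a raw xattr value into out[], which can hold
 * `cap` entries. Returns the number of entries copied, or -1 if the
 * value is invalid or does not fit. */
static long grade_decode(const void *value, long size,
                         struct grade_seg *out, size_t cap)
{
    long n = grade_count_from_size(size);

    if (n < 0 || (size_t)n > cap)
        return -1;
    memcpy(out, value, (size_t)n * sizeof(*out));
    return n;
}
```

Checking the size before using it as an entry count keeps an error code
from being interpreted as a very large array length.
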
Also, taking a cue from Andreas' feedback, we are looking into the
streamID interface to see if we can use it for our work.
We are also checking whether there are any other built-in methods which can
help in having the grade structure without introducing new data
structures. We would be grateful if you could also suggest
other ways of implementing grades.

Regards,
Sayan

On Thu, Apr 19, 2018 at 9:10 PM, Jan Kara <jack@xxxxxxx> wrote:
> On Fri 06-04-18 17:11:40, Sayan Ghosh wrote:
>> This introduces the different functions in order to get the grades as
>> the extended attributes while pre-allocating a new file. The grades
>> are stored as extended attributes while the file gets created. The
>> grades can be used by different user space applications as necessary.
>> The functions introduced are read_grade_xattr(), is_file_graded(),
>> read_count_xattr() which aim to read the extended attribute for grade
>> array and also to know whether the file is graded. The detailed
>> descriptions of the functions are provided as comments in the patch.
>> The patch is on top of Linux Kernel 4.7.2.
>>
>> Signed-off-by: Sayan Ghosh <sgdgp.2014@xxxxxxxxx>
>
> Thanks for the patch! The fact that this is based on rather old kernel has
> been already mentioned - you really need to base on much newer kernel to
> get this merged. Another problem I see is that there's no description of
> the design of this feature. I.e., What this feature is good for? And how is
> it supposed to work? Probably before investing too much time into rebasing
> you can start with sending the high level design of the feature for
> discussion. From quickly glancing through the patches I gather it is some
> kind of HSM but I'm not completely sure...
>
>                                                                 Honza
>
>> ---
>>  fs/ext4/ext4.h    | 15 +++++++++++++++
>>  fs/ext4/extents.c | 35 +++++++++++++++++++++++++++++++++++
>>  2 files changed, 50 insertions(+)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index b84aa1c..b9ec0ca 100755
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -136,6 +136,18 @@ enum SHIFT_DIRECTION {
>>  /* Use blocks from reserved pool */
>>  #define EXT4_MB_USE_RESERVED        0x2000
>>
>> +/* Structure of a grade - starting block number
>> + * and length of contiguous blocks with same higher
>> + * grade (inclusive of starting block)
>> + * example : if blocks 2,3,4 are higher graded,
>> + * then block_num = 2 and len = 3
>> + * Only high grade information is stored by this struct.
>> + */
>> +struct grade_struct {
>> +    ext4_lblk_t block_num;
>> +    unsigned long long len;
>> +};
>> +
>>  struct ext4_allocation_request {
>>      /* target inode for block we're allocating */
>>      struct inode *inode;
>> @@ -3186,6 +3198,9 @@ extern int ext4_check_blockref(const char *, unsigned int,
>>  /* extents.c */
>>  struct ext4_ext_path;
>>  struct ext4_extent;
>> +extern unsigned long long read_count_xattr(struct inode *inode);
>> +extern void read_grade_xattr(struct inode *inode,struct grade_struct
>> *grade_array);
>> +extern int is_file_graded(struct inode *inode);
>>
>>  /*
>>   * Maximum number of logical blocks in a file; ext4_extent's ee_block is
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index d7ccb7f..de9194f 100755
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -57,6 +57,41 @@
>>  #define EXT4_EXT_DATA_VALID1    0x8  /* first half contains valid data */
>>  #define EXT4_EXT_DATA_VALID2    0x10 /* second half contains valid data */
>>
>> +/*
>> + * read_grade_xattr() is used to read the grade array from the
>> extended attribute.
>> + */
>> +void read_grade_xattr(struct inode *inode,struct grade_struct *grade_array)
>> +{
>> +    const char *xattr_name = "grade_array";
>> +    int xattr_size = ext4_xattr_get(inode,
>> EXT4_XATTR_INDEX_USER,xattr_name, NULL,0);
>> +    xattr_size = ext4_xattr_get(inode,
>> EXT4_XATTR_INDEX_USER,xattr_name, (void *)grade_array,xattr_size);
>> +    return;
>> +}
>> +
>> +/*
>> + * is_file_graded() returns whether the file has a grade information or not.
>> + * It takes the inode number as a parameter.
>> + */
>> +int is_file_graded(struct inode *inode)
>> +{
>> +    const char *xattr_name = "is_graded";
>> +    int is_graded = 0;
>> +    int xattr_size = sizeof(int);
>> +    xattr_size = ext4_xattr_get(inode,
>> EXT4_XATTR_INDEX_USER,xattr_name, (void *)&is_graded,xattr_size);
>> +    return is_graded;
>> +}
>> +
>> +/*
>> + * read_count_xattr() used to get the number of the elements in the
>> grade array.
>> + */
>> +unsigned long long read_count_xattr(struct inode *inode)
>> +{
>> +    const char *xattr_name = "grade_array";
>> +    unsigned long long xattr_size = ext4_xattr_get(inode,
>> EXT4_XATTR_INDEX_USER,xattr_name, NULL,0);
>> +    unsigned long long total = xattr_size/sizeof(struct grade_struct);
>> +    return total;
>> +}
>> +
>>  static __le32 ext4_extent_block_csum(struct inode *inode,
>>                       struct ext4_extent_header *eh)
>>  {
>> ‌
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR
