> On Jul 15, 2024, at 4:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > On Tue, Jul 09, 2024 at 12:10:21PM -0700, Wengang Wang wrote: >> segments are the smallest unit to defragment. >> >> A segment >> 1. Can't exceed size limit >> 2. contains some extents >> 3. the contained extents can't be "unwritten" >> 4. the contained extents must be contigous in file blocks >> >> Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx> >> --- >> spaceman/defrag.c | 204 ++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 204 insertions(+) >> >> diff --git a/spaceman/defrag.c b/spaceman/defrag.c >> index c9732984..175cf461 100644 >> --- a/spaceman/defrag.c >> +++ b/spaceman/defrag.c >> @@ -14,6 +14,32 @@ >> #include "space.h" >> #include "input.h" >> >> +#define MAPSIZE 512 >> +/* used to fetch bmap */ >> +struct getbmapx g_mapx[MAPSIZE]; >> +/* current offset of the file in units of 512 bytes, used to fetch bmap */ >> +static long long g_offset = 0; >> +/* index to indentify next extent, used to get next extent */ >> +static int g_ext_next_idx = -1; > > Please do not prefix global variables with "g_". This is not > useful, and simply makes the code hard to read. > > That said, it is much better to pass these as function parameters so > they are specific to the mapping context and so are inherently > thread safe. > OK, will try to move them to function parameters, But I do see global variables used in xfsprog. >> +/* >> + * segment, the smallest unit to defrag >> + * it includes some contiguous extents. >> + * no holes included, >> + * no unwritten extents included >> + * the size is limited by g_segment_size_lmt >> + */ > > I have no idea what this comment is trying to tell me. OK. > >> +struct defrag_segment { >> + /* segment offset in units of 512 bytes */ >> + long long ds_offset; >> + /* length of segment in units of 512 bytes */ >> + long long ds_length; >> + /* number of extents in this segment */ >> + int ds_nr; >> + /* flag indicating if segment contains shared blocks */ >> + bool ds_shared; >> +}; >> + >> /* defrag segment size limit in units of 512 bytes */ >> #define MIN_SEGMENT_SIZE_LIMIT 8192 /* 4MiB */ >> #define DEFAULT_SEGMENT_SIZE_LIMIT 32768 /* 16MiB */ >> @@ -78,6 +104,165 @@ defrag_check_file(char *path) >> return true; >> } >> >> +/* >> + * get next extent in the file. >> + * Note: next call will get the same extent unless move_next_extent() is called. >> + * returns: >> + * -1: error happened. >> + * 0: extent returned >> + * 1: no more extent left >> + */ >> +static int >> +defrag_get_next_extent(int fd, struct getbmapx *map_out) >> +{ >> + int err = 0, i; >> + >> + /* when no extents are cached in g_mapx, fetch from kernel */ >> + if (g_ext_next_idx == -1) { >> + g_mapx[0].bmv_offset = g_offset; >> + g_mapx[0].bmv_length = -1LL; >> + g_mapx[0].bmv_count = MAPSIZE; >> + g_mapx[0].bmv_iflags = BMV_IF_NO_HOLES | BMV_IF_PREALLOC; >> + err = ioctl(fd, XFS_IOC_GETBMAPX, g_mapx); >> + if (err == -1) { >> + perror("XFS_IOC_GETBMAPX failed"); >> + goto out; >> + } >> + /* for stats */ >> + g_ext_stats.nr_ext_total += g_mapx[0].bmv_entries; >> + >> + /* no more extents */ >> + if (g_mapx[0].bmv_entries == 0) { >> + err = 1; >> + goto out; >> + } >> + >> + /* for stats */ >> + for (i = 1; i <= g_mapx[0].bmv_entries; i++) { >> + if (g_mapx[i].bmv_oflags & BMV_OF_PREALLOC) >> + g_ext_stats.nr_ext_unwritten++; >> + if (g_mapx[i].bmv_oflags & BMV_OF_SHARED) >> + g_ext_stats.nr_ext_shared++; >> + } >> + >> + g_ext_next_idx = 1; >> + g_offset = g_mapx[g_mapx[0].bmv_entries].bmv_offset + >> + g_mapx[g_mapx[0].bmv_entries].bmv_length; >> + } >> + >> + map_out->bmv_offset = g_mapx[g_ext_next_idx].bmv_offset; >> + map_out->bmv_length = g_mapx[g_ext_next_idx].bmv_length; >> + map_out->bmv_oflags = g_mapx[g_ext_next_idx].bmv_oflags; >> +out: >> + return err; >> +} > > Ok, so the global variables are just a bmap cache. That's a problem, > because this cache is stale the moment XFS_IOC_GETBMAPX returns to > userspace. Iterating it to decide exactly waht to do next will > race with ongoing file modifications and so it's not going to be > accurate.... Yes, there is racing. And even we do defrag basing on stale extents, there is harm to the file under defrag though it might not bring good defrag result. > >> + >> +/* >> + * move to next extent >> + */ >> +static void >> +defrag_move_next_extent() >> +{ >> + if (g_ext_next_idx == g_mapx[0].bmv_entries) >> + g_ext_next_idx = -1; >> + else >> + g_ext_next_idx += 1; >> +} >> + >> +/* >> + * check if the given extent is a defrag target. >> + * no need to check for holes as we are using BMV_IF_NO_HOLES >> + */ >> +static bool >> +defrag_is_target(struct getbmapx *mapx) >> +{ >> + /* unwritten */ >> + if (mapx->bmv_oflags & BMV_OF_PREALLOC) >> + return false; >> + return mapx->bmv_length < g_segment_size_lmt; >> +} >> + >> +static bool >> +defrag_is_extent_shared(struct getbmapx *mapx) >> +{ >> + return !!(mapx->bmv_oflags & BMV_OF_SHARED); >> +} >> + >> +/* >> + * get next segment to defragment. >> + * returns: >> + * -1 error happened. >> + * 0 segment returned. >> + * 1 no more segments to return >> + */ >> +static int >> +defrag_get_next_segment(int fd, struct defrag_segment *out) >> +{ >> + struct getbmapx mapx; >> + int ret; >> + >> + out->ds_offset = 0; >> + out->ds_length = 0; >> + out->ds_nr = 0; >> + out->ds_shared = false; > > out->ds_nr is never set to anything but zero in this patch. > It’s set at line 211 in the raw patch. 206 +add_ext: 207 + if (defrag_is_extent_shared(&mapx)) 208 + out->ds_shared = true; 209 + 210 + out->ds_length += mapx.bmv_length; 211 + out->ds_nr += 1; 212 + defrag_move_next_extent(); >> + >> + do { >> + ret = defrag_get_next_extent(fd, &mapx); >> + if (ret != 0) { >> + /* >> + * no more extetns, return current segment if its not > > Typos everywhere. OK. > >> + * empty >> + */ >> + if (ret == 1 && out->ds_nr > 0) >> + ret = 0; >> + /* otherwise, error heppened, stop */ >> + break; >> + } > >> + >> + /* >> + * If the extent is not a defrag target, skip it. >> + * go to next extent if the segment is empty; >> + * otherwise return the segment. >> + */ >> + if (!defrag_is_target(&mapx)) { >> + defrag_move_next_extent(); >> + if (out->ds_nr == 0) >> + continue; >> + else >> + break; >> + } >> + >> + /* check for segment size limitation */ >> + if (out->ds_length + mapx.bmv_length > g_segment_size_lmt) >> + break; >> + >> + /* the segment is empty now, add this extent to it for sure */ >> + if (out->ds_nr == 0) { >> + out->ds_offset = mapx.bmv_offset; >> + goto add_ext; >> + } > > So this is essentially a filter for the getbmapx output that strips > away unwritten extents and anything outside/larger than the target > range. yes. > >> + >> + /* >> + * the segment is not empty, check for hole since the last exent >> + * if a hole exist before this extent, this extent can't be >> + * added to the segment. return the segment >> + */ >> + if (out->ds_offset + out->ds_length != mapx.bmv_offset) >> + break; >> + >> +add_ext: > > Why do you need a goto for this logic? > > /* > * the segment is not empty, check for hole since the last exent > * if a hole exist before this extent, this extent can't be > * added to the segment. return the segment > */ > if (out->ds_nr) { > if (out->ds_offset + out->ds_length != mapx.bmv_offset) > break; > } else { > out->ds_offset = mapx.bmv_offset; > } > Above code also work. The using of "goto add_ext” saved a “if" inside a “if” making the code clearer to me. But I can change it as you expected. >> + if (defrag_is_extent_shared(&mapx)) >> + out->ds_shared = true; >> + >> + out->ds_length += mapx.bmv_length; >> + out->ds_nr += 1; >> + defrag_move_next_extent(); >> + >> + } while (true); > >> + >> + return ret; >> +} >> + >> /* >> * defragment a file >> * return 0 if successfully done, 1 otherwise >> @@ -92,6 +277,9 @@ defrag_xfs_defrag(char *file_path) { >> struct fsxattr fsx; >> int ret = 0; >> >> + g_offset = 0; >> + g_ext_next_idx = -1; >> + >> fsx.fsx_nextents = 0; >> memset(&g_ext_stats, 0, sizeof(g_ext_stats)); >> >> @@ -119,6 +307,22 @@ defrag_xfs_defrag(char *file_path) { >> ret = 1; >> goto out; >> } >> + >> + do { >> + struct defrag_segment segment; >> + >> + ret = defrag_get_next_segment(defrag_fd, &segment); >> + /* no more segments, we are done */ >> + if (ret == 1) { >> + ret = 0; >> + break; >> + } >> + /* error happened when reading bmap, stop here */ >> + if (ret == -1) { >> + ret = 1; >> + break; >> + } > > ternary return values are nasty. Return a negative errno when a > an error occurs, and -ENODATA when there are no more segments. > Then you have > > if (ret < 0) { > if (ret == -ENODATA) > exit_value = 0; > else > exit_value = 1; > break; > } Agreed. I have modified defrag_get_next_segment() and defrag_get_next_extent() for V2, Making them return -1 for error and 0 for success. And the "no more extents/segements” is checked by looking at their length. Will send them out. > >> + } while (true); > > Not a fan of do {} while(true) loops. > > WIth the above error handling changes, this becomes: > > do { > struct defrag_segment segment; > > ret = defrag_get_next_segment(defrag_fd, &segment); > } while (ret == 0); > > if (ret == 0 || ret == -ENODATA) > exit_value = 0; > else > exit_value = 1; > Yes, I making “big” changes to defrag_get_next_segment()/defrag_get_next_extent() Please review them when I sent them out. > > Ok, so this is a linear iteration of all extents in the file that > filters extents for the specific "segment" that is going to be > processed. I still have no idea why fixed length segments are > important, but "linear extent scan for filtering" seems somewhat > expensive. Hm… fixed length segments — actually not fixed length segments, but segment size can’t exceed the limitation. So segment.ds_length <= LIMIT. Larger segment take longer time (with filed locked) to defrag. The segment size limit is a way to balance the defrag and the parallel IO latency. > > Indeed, if you used FIEMAP, you can pass a minimum > segment length to filter out all the small extents. Iterating that > extent list means all the ranges you need to defrag are in the holes > of the returned mapping information. This would be much faster > than an entire linear mapping to find all the regions with small > extents that need defrag. The second step could then be doing a > fine grained mapping of each region that we now know either contains > fragmented data or holes.... Hm… just a question here: As your way, say you set the filter length to 2048, all extents with 2048 or less blocks are to defragmented. What if the extent layout is like this: 1. 1 2. 2049 3. 2 4. 2050 5. 1 6. 2051 In above case, do you do defrag or not? As I understand the situation, performance of defrag it’s self is not a critical concern here. Thanks, Wengang