On Thu, May 09, 2024 at 04:16:34PM +0100, Luis Henriques wrote:
>
> It's looks like it's easy to trigger an infinite loop here using fstest
> generic/039.  If I understand it correctly (which doesn't happen as often
> as I'd like), this is due to an integer overflow in the 'if' condition,
> and should be fixed with the patch below.

Thanks for the report.  However, I can't reproduce the failure, and
looking at generic/039, I don't see how it could be relevant to the code
path in question.  Generic/039 creates a test symlink with two hard links
in the same directory, syncs the file system, and then removes one of the
hard links, and then drops access to the block device using dmflakey.
So I don't see how the extent code would be involved at all.  Are you
sure that you have the correct test listed?

Looking at the code in question in fs/ext4/extents.c:

 again:
        ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
                                  hole_start + len - 1, &es);
        if (!es.es_len)
                goto insert_hole;

        /*
         * There's a delalloc extent in the hole, handle it if the delalloc
         * extent is in front of, behind and straddle the queried range.
         */
-       if (lblk >= es.es_lblk + es.es_len) {
+       if (lblk >= ((__u64) es.es_lblk) + es.es_len) {
                /*
                 * The delalloc extent is in front of the queried range,
                 * find again from the queried start block.
                 */
                len -= lblk - hole_start;
                hole_start = lblk;
                goto again;

lblk and es.es_lblk are both __u32.  So the infinite loop is presumably
because es.es_lblk + es.es_len has overflowed.

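To make the suspected wraparound concrete, here is a minimal userspace
sketch of the two comparisons.  It is not kernel code: struct es_sketch and
the two helper names are hypothetical stand-ins, uint32_t/uint64_t stand in
for __u32/__u64, and the values mentioned afterwards are made up for
illustration rather than taken from the report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two extent_status fields used in the test. */
struct es_sketch {
	uint32_t es_lblk;	/* first logical block of the extent */
	uint32_t es_len;	/* length of the extent in blocks */
};

/* Original comparison: the sum is computed in 32 bits and can wrap. */
static bool in_front_u32(uint32_t lblk, const struct es_sketch *es)
{
	return lblk >= es->es_lblk + es->es_len;
}

/* Patched comparison: widening one operand makes the whole sum 64-bit. */
static bool in_front_u64(uint32_t lblk, const struct es_sketch *es)
{
	return lblk >= (uint64_t)es->es_lblk + es->es_len;
}
```

With es_lblk = 0xFFFFFFF0 and es_len = 0x20, the 32-bit sum wraps to 0x10,
so for lblk = 0xFFFFFFF8 in_front_u32() returns true even though lblk lies
inside the extent, while in_front_u64() returns false.  If the wrapped test
keeps succeeding, the "goto again" path would presumably find the same
extent on every pass.  Whether an extent with such values can actually exist
in the tree is a separate question.
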
This should never happen(tm), and in fact we have a test for this case
which *should* have gotten tripped when ext4_es_find_extent_range()
calls __es_tree_search() in fs/ext4/extents_status.c:

static inline ext4_lblk_t ext4_es_end(struct extent_status *es)
{
        BUG_ON(es->es_lblk + es->es_len < es->es_lblk);
        return es->es_lblk + es->es_len - 1;
}

So the patch is harmless, and I can see how it might fix what you were
seeing, but I'm a bit nervous that I can't reproduce it when the commit
description claims that it reproduces easily.  We should never have
allowed such an entry to be introduced into the extents status tree in
the first place, and if it had been introduced, it should have been
caught before it was returned by ext4_es_find_extent_range().

Can you give more details about the reproducer?  Can you double check the
test id, and tell me how easily you can trigger the failure and what
hardware you used to run the test?

Many thanks,

                                        - Ted