Derrick Stolee <stolee@xxxxxxxxx> writes:

> You are correct that if the path-walk API emitted multiple batches
> with the same path name, then we would not detect that via the current
> testing strategy.
>
> The main reason to use the sort is to avoid adding a restriction on
> the order in which objects appear within the batch.
>
> Your recommendation to group a batch into a single line does not
> strike me as a suitable approach, because long lines become hard to
> read and difficult to parse diffs. (Also, the order within the batch
> becomes baked in as a requirement.)

The hashes in a line can be abbreviated if line length is a concern.
Also, note that I am suggesting sorting the OIDs within a line (that is,
a batch), and also sorting the lines (batches) as a whole.

> The biggest question I'd like to ask is this: do you see a risk of
> a path being repeated? There are cases where it will happen, such as
> indexed objects that are not reachable anywhere else.

I was thinking that the whole point of this feature is that we group
objects by path, so it seems desirable to test that paths are not
repeated. (Or repeated as little as possible, if it is not possible to
avoid repetition e.g. in the case you describe.)

> The way I would consider modifying these tests to reflect the batching
> would be to associate each batch with a number, causing the order of
> the paths to become hard-coded in the test. Something like
>
>     0:COMMIT::$(git rev-parse ...)
>     0:COMMIT::$(git rev-parse ...)
>     1:TREE::$(git rev-parse ...)
>     1:TREE::$(git rev-parse ...)
>     2:TREE:right/:$(git rev-parse ...)
>     3:BLOB:right/a:$(...)
>     4:TREE:left/:$(git rev-parse ...)
>     5:BLOB:left/b:$(...)
>
> This would imply some amount of order that maybe should become a
> requirement of the API.
>
> Thanks,
> -Stolee

If we're willing to declare an order in which we will return paths to
the user, that would work too. (I'm not sure that we need to declare an
order, though.)
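
To make the "one line per batch, sorted twice" idea concrete, here is a
rough sketch in Python. It is only an illustration, not a proposed patch:
the function names, the (type, path, oids) tuple layout, the output format
and the 7-character abbreviation are all assumptions of mine, and the real
tests would presumably do the equivalent in shell.

```python
from collections import Counter


def normalize_batches(batches, abbrev=7):
    """Render each batch as one line, independent of emission order.

    `batches` is a hypothetical list of (obj_type, path, oids) tuples,
    one tuple per batch emitted by the walk.
    """
    lines = []
    for obj_type, path, oids in batches:
        # Sort the OIDs within the batch so that the order inside a
        # batch is not baked in, then abbreviate to keep lines short.
        short = [oid[:abbrev] for oid in sorted(oids)]
        lines.append(f"{obj_type}:{path}:{' '.join(short)}")
    # Sort the lines as a whole so that the order of batches is not
    # baked in either.
    return sorted(lines)


def repeated_paths(batches):
    """Return the (obj_type, path) pairs that appear in more than one batch."""
    counts = Counter((obj_type, path) for obj_type, path, _ in batches)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Made-up OIDs, for illustration only.
    example = [
        ("TREE", "left/", ["b" * 40, "a" * 40]),
        ("BLOB", "left/b", ["c" * 40]),
        ("TREE", "left/", ["d" * 40]),  # same path emitted a second time
    ]
    for line in normalize_batches(example):
        print(line)
    print("repeated:", repeated_paths(example))
```

Because each batch keeps its own line, a path that is emitted twice shows
up as two lines with the same prefix, which a plain sort of individual
objects would hide. Whether a repeat should fail the test, or only be
counted, depends on the cases where repetition cannot be avoided.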