On 4/1/2021 11:40 AM, Jeff Hostetler via GitGitGadget wrote: > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > Teach fsmonitor--daemon to create token-ids and define the > overall token naming scheme. ... > +/* > + * Requests to and from a FSMonitor Protocol V2 provider use an opaque > + * "token" as a virtual timestamp. Clients can request a summary of all > + * created/deleted/modified files relative to a token. In the response, > + * clients receive a new token for the next (relative) request. > + * > + * > + * Token Format > + * ============ > + * > + * The contents of the token are private and provider-specific. > + * > + * For the built-in fsmonitor--daemon, we define a token as follows: > + * > + * "builtin" ":" <token_id> ":" <sequence_nr> > + * > + * The <token_id> is an arbitrary OPAQUE string, such as a GUID, > + * UUID, or {timestamp,pid}. It is used to group all filesystem > + * events that happened while the daemon was monitoring (and in-sync > + * with the filesystem). > + * > + * Unlike FSMonitor Protocol V1, it is not defined as a timestamp > + * and does not define less-than/greater-than relationships. > + * (There are too many race conditions to rely on file system > + * event timestamps.) > + * > + * The <sequence_nr> is a simple integer incremented for each event > + * received. When a new <token_id> is created, the <sequence_nr> is > + * reset to zero. > + * > + * > + * About Token Ids > + * =============== > + * > + * A new token_id is created: > + * > + * [1] each time the daemon is started. > + * > + * [2] any time that the daemon must re-sync with the filesystem > + * (such as when the kernel drops or we miss events on a very > + * active volume). > + * > + * [3] in response to a client "flush" command (for dropped event > + * testing). > + * > + * [4] MAYBE We might want to change the token_id after very complex > + * filesystem operations are performed, such as a directory move > + * sequence that affects many files within. It might be simpler > + * to just give up and fake a re-sync (and let the client do a > + * full scan) than try to enumerate the effects of such a change. > + * > + * When a new token_id is created, the daemon is free to discard all > + * cached filesystem events associated with any previous token_ids. > + * Events associated with a non-current token_id will never be sent > + * to a client. A token_id change implicitly means that the daemon > + * has gap in its event history. > + * > + * Therefore, clients that present a token with a stale (non-current) > + * token_id will always be given a trivial response. >From this comment, it seems to be the case that concurrent Git commands will race to advance the FS Monitor token and one of them will lose, causing a full working directory scan. There is no list of "recent" tokens. I could see this changing in the future, but for now it is a reasonable simplification. > + */ > +struct fsmonitor_token_data { > + struct strbuf token_id; > + struct fsmonitor_batch *batch_head; > + struct fsmonitor_batch *batch_tail; > + uint64_t client_ref_count; > +}; > + > +static struct fsmonitor_token_data *fsmonitor_new_token_data(void) > +{ > + static int test_env_value = -1; > + static uint64_t flush_count = 0; > + struct fsmonitor_token_data *token; > + > + token = (struct fsmonitor_token_data *)xcalloc(1, sizeof(*token)); I think the best practice here is "CALLOC_ARRAY(token, 1);" > + > + strbuf_init(&token->token_id, 0); This is likely overkill since you used calloc() above. > + token->batch_head = NULL; > + token->batch_tail = NULL; > + token->client_ref_count = 0; > + > + if (test_env_value < 0) > + test_env_value = git_env_bool("GIT_TEST_FSMONITOR_TOKEN", 0); > + > + if (!test_env_value) { > + struct timeval tv; > + struct tm tm; > + time_t secs; > + > + gettimeofday(&tv, NULL); > + secs = tv.tv_sec; > + gmtime_r(&secs, &tm); > + > + strbuf_addf(&token->token_id, > + "%"PRIu64".%d.%4d%02d%02dT%02d%02d%02d.%06ldZ", > + flush_count++, > + getpid(), > + tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, > + tm.tm_hour, tm.tm_min, tm.tm_sec, > + (long)tv.tv_usec); Between the PID, the flush count, and how deep you go in the timestamp, this seems to be specific enough. > + } else { > + strbuf_addf(&token->token_id, "test_%08x", test_env_value++); And this will be nice for testing. > + } > + > + return token; > +} > + > static ipc_server_application_cb handle_client; > > static int handle_client(void *data, const char *command, > @@ -330,7 +436,7 @@ static int fsmonitor_run_daemon(void) > > pthread_mutex_init(&state.main_lock, NULL); > state.error_code = 0; > - state.current_token_data = NULL; > + state.current_token_data = fsmonitor_new_token_data(); > state.test_client_delay_ms = 0; > > /* Prepare to (recursively) watch the <worktree-root> directory. */ > Thanks, -Stolee