Re: [PATCH 1/3] libnsm.a: sm-notify sometimes ignores monitored hosts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Dec 13, 2010, at 11:54 AM, Steve Dickson wrote:

> Hey Chuck,
> 
> On 12/06/2010 11:09 AM, Chuck Lever wrote:
>> Monitored host information is stored in files under /var/lib/nfs.
>> When visiting entries in the monitored hosts directory, libnsm.a
>> examines the value of dirent.d_type to determine if an entry is a
>> regular file.
>> 
>> According to readdir(3), the d_type field is not supported by all
>> file system types.  My root file system happens to be one where d_type
>> isn't supported.  Typical installations that use an ext-derived root
>> file system are not exposed to this issue, but those who use xfs, for
>> instance, are.
>> 
>> On such file systems, not only are remote peers not notified of
>> reboots, but the NSM state number is never incremented.  A statd warm
>> restart would not re-monitor any hosts that were monitored before
>> the restart.
>> 
>> When writing support/nsm/file.c, I copied the use of d_type from the
>> original statd code, so this has likely been an issue for some time.
>> 
>> Replace the use of d_type in support/nsm/file.c with a call to
>> lstat(2).  It's extra code, but is guaranteed to work on all file
>> system types.
>> 
>> Note there is a usage of d_type in gssd.  I'll let gssd and rpcpipefs
>> experts decide whether that's worth changing.
>> 
>> Fix for:
>> 
>>  https://bugzilla.linux-nfs.org/show_bug.cgi?id=193
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> ---
>> 
>> support/nsm/file.c |   38 ++++++++++++++++++++++++++++++--------
>> 1 files changed, 30 insertions(+), 8 deletions(-)
>> 
>> diff --git a/support/nsm/file.c b/support/nsm/file.c
>> index f4baeb9..ba2a9ce 100644
>> --- a/support/nsm/file.c
>> +++ b/support/nsm/file.c
>> @@ -568,11 +568,13 @@ nsm_retire_monitored_hosts(void)
>> 
>> 	while ((de = readdir(dir)) != NULL) {
>> 		char *src, *dst;
>> +		struct stat stb;
>> 
>> -		if (de->d_type != (unsigned char)DT_REG)
>> -			continue;
>> -		if (de->d_name[0] == '.')
>> +		if (de->d_name[0] == '.') {
>> +			xlog(D_GENERAL, "Skipping dot file %s",
>> +					de->d_name);
>> 			continue;
>> +		}
>> 
>> 		src = nsm_make_record_pathname(NSM_MONITOR_DIR, de->d_name);
>> 		if (src == NULL) {
>> @@ -580,6 +582,20 @@ nsm_retire_monitored_hosts(void)
>> 			continue;
>> 		}
>> 
>> +		/* NB: not all file systems fill in d_type correctly */
>> +		if (lstat(src, &stb) == -1) {
>> +			xlog_warn("Bad monitor file %s, skipping: %m",
>> +					de->d_name);
>> +			free(src);
>> +			continue;
>> +		}
>> +		if (!S_ISREG(stb.st_mode)) {
>> +			xlog(D_GENERAL, "Skipping non-regular file %s",
>> +					de->d_name);
>> +			free(src);
>> +			continue;
>> +		}
>> +
>> 		dst = nsm_make_record_pathname(NSM_NOTIFY_DIR, de->d_name);
>> 		if (dst == NULL) {
>> 			free(src);
>> @@ -846,7 +862,7 @@ nsm_read_line(const char *hostname, const time_t timestamp, char *line,
>> }
>> 
>> /*
>> - * Given a filename, reads data from a file under NSM_MONITOR_DIR
>> + * Given a filename, reads data from a file under "directory"
>>  * and invokes @func so caller can populate their in-core
>>  * database with this data.
>>  */
>> @@ -863,10 +879,15 @@ nsm_load_host(const char *directory, const char *filename, nsm_populate_t func)
>> 	if (path == NULL)
>> 		goto out_err;
>> 
>> -	if (stat(path, &stb) == -1) {
>> +	if (lstat(path, &stb) == -1) {
>> 		xlog(L_ERROR, "Failed to stat %s: %m", path);
>> 		goto out_freepath;
>> 	}
>> +	if (!S_ISREG(stb.st_mode)) {
>> +		xlog(D_GENERAL, "Skipping non-regular file %s",
>> +				path);
> Question, why do we care non-regular files are being ignored?

We probably want to report anything unexpected in the /var/lib/nfs/statd/sm{,.bak} directories.

> I understand logging the lstat() error but logging statements like
> "ignoring this" or "not doing that" just make the debug output a 
> bit too noisy IMHO... 

My expectation is that under normal circumstances this message would never fire.  statd shouldn't put anything in that directory that isn't a regular file.  If there's something else in there, we should report it.  Also, if statd (or something else) is broken and the code thinks the object isn't a regular file, then this message would point to what is wrong.

> I'm thinking it would be better to log the files that 
> are actually being used verses files that are being ignored.

I'd have to check, but I think the hostnames that are "used" are actually reported by the caller.  If callers don't report the hostnames, then we can add something here easily enough.

>> +		goto out_freepath;
>> +	}
>> 
>> 	f = fopen(path, "r");
>> 	if (f == NULL) {
>> @@ -913,10 +934,11 @@ nsm_load_dir(const char *directory, nsm_populate_t func)
>> 	}
>> 
>> 	while ((de = readdir(dir)) != NULL) {
>> -		if (de->d_type != (unsigned char)DT_REG)
>> -			continue;
>> -		if (de->d_name[0] == '.')
>> +		if (de->d_name[0] == '.') {
>> +			xlog(D_GENERAL, "Skipping dot file %s",
>> +					de->d_name);
> The same goes with this... Do we really care "." and ".." are
> being ignored? Isn't that expected? ;-)

It's kind of a heartbeat message, if you will.  But this one is optional and can be removed.

Once we agree on what messages should appear here, I'll redrive the patch.

> 
> steved.
> 
>> 			continue;
>> +		}
>> 
>> 		count += nsm_load_host(directory, de->d_name, func);
>> 	}
>> 

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux