On 11/19/2013 03:08 PM, Anuradha Talur wrote:
Hi all,

This mail is regarding quota logging for the following conditions:

1. Soft-limit alerts - "Usage crossed soft-limit" and "usage above soft-limit"
2. Quota exceeded warning

Issue with the soft-limit alerts logging --

Currently, the soft-limit alerts are logged in the brick logs when a write is made and soft-limit is exceeded. In a scenario where there are a large number of bricks, it is difficult for the system admin to poll all the bricks for a soft-limit alert.
If polling is the mechanism for a soft-limit alert, why not have it as part of a command? The quota list CLI or an alternate new command can provide the list of directories on which soft-limit has been breached. This CLI can be polled to determine which of the directories exceeds soft-limit.
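To make the polling idea concrete, here is a minimal sketch of the check such a CLI (or a script wrapped around it) would perform. It assumes per-directory usage has already been obtained somehow; the `QuotaEntry` record and its field names are hypothetical and do not reflect the actual output of the quota list command.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QuotaEntry:
    # Hypothetical record: these field names are assumptions for illustration,
    # not the real columns printed by the quota list CLI.
    path: str
    soft_limit_bytes: int
    hard_limit_bytes: int
    used_bytes: int


def breached_soft_limit(entries: Iterable[QuotaEntry]) -> List[str]:
    """Return the directories whose usage is above the soft limit."""
    return [e.path for e in entries if e.used_bytes > e.soft_limit_bytes]


def breached_hard_limit(entries: Iterable[QuotaEntry]) -> List[str]:
    """Return the directories whose usage has reached the hard limit."""
    return [e.path for e in entries if e.used_bytes >= e.hard_limit_bytes]


if __name__ == "__main__":
    sample = [
        QuotaEntry("/projects", 80, 100, 50),
        QuotaEntry("/scratch", 80, 100, 90),
        QuotaEntry("/home", 80, 100, 100),
    ]
    print("above soft limit:", breached_soft_limit(sample))
    print("at hard limit:", breached_hard_limit(sample))
```

A monitoring job could run a check like this at an interval against a single node, instead of scanning the logs of every brick.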
Issue with the quota exceeded logging --

When the quota is exceeded, op_errno is set to EDQUOT and "Quota exceeded" messages are currently logged in the client logs (nfs.log, fuse mount log, ...), similar to how other op_errno values are logged. An issue has been raised stating that the quota hard-limit and soft-limit messages should be logged in the same file.
Why is logging being preferred over CLI? If logging has to be the preferred mechanism for alerting, can we not use LOG_ALERT across all log files?
Logs in nfs.log and the fuse mount log are useful in determining what errors are being returned to applications. In addition, the fuse logs and the brick logs can potentially be used by different consumers: the fuse log files would be useful for an administrator who manages compute, and the brick log files for an administrator who manages storage.
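If alerts are logged at a distinct level in every log file, each consumer can filter its own logs for that level. The following is only a sketch of such a filter: it assumes each log line has the shape `[<timestamp>] <LEVEL> <rest>` with a single-letter level and `A` standing for alert. That line shape, the sample messages, and the function names are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List

# Assumed line shape: "[<timestamp>] <LEVEL> <rest>", where LEVEL is one letter.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+(?P<rest>.*)$")


def alert_lines(lines: Iterable[str], level: str = "A") -> List[str]:
    """Return the message part of every line logged at the given level."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") == level:
            hits.append(match.group("rest"))
    return hits


def alerts_by_log(logs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each log name to its alert-level messages, skipping logs with none."""
    result = {}
    for name, lines in logs.items():
        hits = alert_lines(lines)
        if hits:
            result[name] = hits
    return result


if __name__ == "__main__":
    # Made-up sample lines, not real log output.
    sample = {
        "brick1.log": [
            "[2013-11-19 15:08:00] A [quota] vol: usage crossed soft-limit on /dir1",
            "[2013-11-19 15:08:01] I [quota] vol: lookup on /dir2",
        ],
        "brick2.log": ["[2013-11-19 15:09:00] W [quota] vol: unrelated warning"],
    }
    for name, messages in alerts_by_log(sample).items():
        print(name, messages)
```

The same filter would apply unchanged to a fuse mount log or a brick log, so the compute and storage administrators would not need a separate unified file.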
So a need for unified logging has been proposed. We came up with the solution that all the soft-limit alert logs and the quota exceeded logs should be logged in the quotad.log file.
Why cannot these logs be in the brick log files? Since both quota and marker xlators are part of the brick stack, logging in brick log files seems to be a better choice.
This results in the limit-exceeded logs being logged on all the nodes in the cluster. The sys admin can poll any one node to get the status of soft-limit and hard-limit.
Wouldn't we be flooding all nodes with similar information? With a centralized logging infrastructure, we might end up with a lot of repeated/redundant logs.
Approach proposed to achieve this solution --

When a soft-limit or hard-limit is exceeded, glusterd is informed. The glusterd of the current node should then notify the glusterd of the other nodes in the cluster about the issue. Each glusterd asks its respective quotad to log the limit-exceeded event in its log.

The following bugs are related to the points made here --
https://bugzilla.redhat.com/show_bug.cgi?id=1020816
https://bugzilla.redhat.com/show_bug.cgi?id=1019302

Is this approach recommended?
I don't think this is a very clean approach. Having a CLI and/or providing the right log messages in brick log files at level LOG_ALERT seems to be good enough for alerting.
-Vijay