Hi!
Probably I sent my patch proposals to the wrong list, so I'll resent them (see attachments). I also added corresponding new patches (0002-0003) for resetting the ringid file. I think the program should not mess with umask, specifically not inside an "if" in a subroutine. I failed to get the sense of it, so I added a comment saying so.
Regards,
Ulrich
--- Begin Message ---
- Subject: Antw: [ClusterLabs] RRP works "in totally different way then most of people expects"
- From: "Ulrich Windl" <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 08 Jun 2015 12:57:21 +0200
Answering my own questions:
Those ringid files are created by corosync_ring_id_store() from main.c, and the contents is the ring sequence number (binary). The routine is called (indrirectly) from memb_state_commit_enter() (totemsrp.c). That routine is called by memb_join_process() and message_handler_memb_commit_token().
An obvious question arises:
As the sequence number is only stored for the active ring, what happens if rings are switched (rrp_mode == passive)? Likewise it seems corosync is using the IP of the first available ring as representative on startup, it seems. Shouldn't the representative (and identifier) be stable (as per slide 4 of [AMMA00] Amir, Y.; Moser, L. E.; Melliar-Smith, P. M.; Agarwal, D. A.; Ciarfella, P.: The Totem Single-Ring Ordering and Membership Protocol)? I doubt that representative and ringid are persistent just for each ring individually...
The "rxw"-Permissions are the natural consequence of this code (corosync_ring_id_store()):
fd = open (filename, O_WRONLY, 0777);
if (fd == -1) {
fd = open (filename, O_CREAT|O_RDWR, 0777);
}
So here's my first patch proposal also.
Regards,
Ulrich
>>> "Ulrich Windl" <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx> schrieb am 02.06.2015 um
16:07 in Nachricht <556DD4B1020000A10001AA01@xxxxxxxxxxxxxxxxxxxxxxxxx>:
>>>> Jan Friesse <jfriesse@xxxxxxxxxx> schrieb am 02.06.2015 um 12:57 in Nachricht
> <556D8C08.5060001@xxxxxxxxxx>:
> [...]
>> Last but not least is RRP. RRP itself works very well sadly it works in
>> totally different way then most of people expects.
> [...]
>
> Probably because the documentation leaves to many questions unanswered...
> You can't tell what's wrong as long as you don't know how it should work.
>
> Example: What's the purpose of /var/lib/corosync/ringid_* files, and why do
> some hosts have /var/lib/corosync/ringid_127.0.0.1? The file content seems to
> be some binary number, but the file is created with rwx permissions...
>
> On systems with two rings (not on 127.*) I see all combinations from one to
> three ringid files.
>
> Regards,
> Ulrich
>
>
>
> _______________________________________________
> Users mailing list: Users@xxxxxxxxxxxxxxx
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>From 70fba1e1a547d94e0480aa24b92810a410e36a08 Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Jun 2015 12:52:04 +0200
Subject: [PATCH] exec/main.c: Improve corosync_ring_id_store()
exec/main.c: Change file creation mode from 0777 to S_IRUSR|S_IWUSR
improving error messages in corosync_ring_id_store().
---
exec/main.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/exec/main.c b/exec/main.c
index 2e4fd02..6c9e1b6 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -1043,15 +1043,14 @@ static void corosync_ring_id_store (
snprintf (filename, sizeof(filename), "%s/ringid_%s",
get_run_dir(), totemip_print (addr));
- fd = open (filename, O_WRONLY, 0777);
+ fd = open (filename, O_WRONLY, 0);
if (fd == -1) {
- fd = open (filename, O_CREAT|O_RDWR, 0777);
+ fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
}
if (fd == -1) {
LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR,
- "Couldn't store new ring id %llx to stable storage",
- memb_ring_id->seq);
-
+ "Couldn't open %s: %s",
+ filename, strerror(errno));
corosync_exit_error (AIS_DONE_STORE_RINGID);
}
log_printf (LOGSYS_LEVEL_DEBUG,
@@ -1060,8 +1059,8 @@ static void corosync_ring_id_store (
close (fd);
if (res != sizeof(memb_ring_id->seq)) {
LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR,
- "Couldn't store new ring id %llx to stable storage",
- memb_ring_id->seq);
+ "Couldn't store new ring id %llx in %s: %s",
+ memb_ring_id->seq, filename, strerror(errno));
corosync_exit_error (AIS_DONE_STORE_RINGID);
}
--
1.7.12.4
--- End Message ---
--- Begin Message ---
- Subject: Antw: [ClusterLabs] RRP works "in totally different way then most of people expects"
- From: "Ulrich Windl" <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 08 Jun 2015 13:04:48 +0200
Hi!
Maybe we want "O_DSYNC" also:
--
diff --git a/exec/main.c b/exec/main.c
index 6c9e1b6..9be33e4 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -1043,9 +1043,9 @@ static void corosync_ring_id_store (
snprintf (filename, sizeof(filename), "%s/ringid_%s",
get_run_dir(), totemip_print (addr));
- fd = open (filename, O_WRONLY, 0);
+ fd = open (filename, O_WRONLY|O_DSYNC, 0);
if (fd == -1) {
- fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
+ fd = open (filename, O_CREAT|O_RDWR|O_DSYNC, S_IRUSR|S_IWUSR);
}
if (fd == -1) {
LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR,
--
Ulrich
--- End Message ---
>From 0fcf0cfacf5a60f3d84d2407fa917bd72b3b24ca Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Jun 2015 08:39:02 +0200
Subject: [PATCH 2/3] Fix open flags when creating ringid file
exec/main.c (corosync_ring_id_create_or_load): Use zero flags when
opening file O_RDONLY. Use symbolic permissions when creating ring id
file, dropping useless execute bit. Questin change of umask when creating
file.
---
exec/main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/exec/main.c b/exec/main.c
index 6c9e1b6..9cc0351 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -996,7 +996,7 @@ static void corosync_ring_id_create_or_load (
snprintf (filename, sizeof(filename), "%s/ringid_%s",
get_run_dir(), totemip_print (addr));
- fd = open (filename, O_RDONLY, 0700);
+ fd = open (filename, O_RDONLY, 0);
/*
* If file can be opened and read, read the ring id
*/
@@ -1009,8 +1009,8 @@ static void corosync_ring_id_create_or_load (
*/
if ((fd == -1) || (res != sizeof (uint64_t))) {
memb_ring_id->seq = 0;
- umask(0);
- fd = open (filename, O_CREAT|O_RDWR, 0700);
+ umask(0); /* why that? */
+ fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
if (fd != -1) {
res = write (fd, &memb_ring_id->seq, sizeof (uint64_t));
close (fd);
--
1.7.12.4
>From 5a29ecee27d7fa71d7871c3c8d907dad61572fca Mon Sep 17 00:00:00 2001
From: Ulrich Windl <Ulrich.Windl@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Jun 2015 08:44:57 +0200
Subject: [PATCH 3/3] exec/main.c: Improve comments and log messages
exec/main.c (corosync_ring_id_create_or_load): Improve comments. Add log
message when ring id is reset to zero. Change wording of error messages
when new ringid file could not be created.
---
exec/main.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/exec/main.c b/exec/main.c
index 9cc0351..8734507 100644
--- a/exec/main.c
+++ b/exec/main.c
@@ -1005,9 +1005,11 @@ static void corosync_ring_id_create_or_load (
close (fd);
}
/*
- * If file could not be opened or read, create a new ring id
+ * If file could not be opened or read, reset ring id
*/
if ((fd == -1) || (res != sizeof (uint64_t))) {
+ log_printf (LOGSYS_LEVEL_NOTICE, "Resetting ring id in %s\n",
+ filename);
memb_ring_id->seq = 0;
umask(0); /* why that? */
fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR);
@@ -1016,13 +1018,15 @@ static void corosync_ring_id_create_or_load (
close (fd);
if (res == -1) {
LOGSYS_PERROR (errno, LOGSYS_LEVEL_ERROR,
- "Couldn't write ringid file '%s'", filename);
+ "Couldn't write initial ringid file '%s'",
+ filename);
corosync_exit_error (AIS_DONE_STORE_RINGID);
}
} else {
LOGSYS_PERROR (errno, LOGSYS_LEVEL_ERROR,
- "Couldn't create ringid file '%s'", filename);
+ "Couldn't create ringid file '%s': %s",
+ filename, strerror(errno));
corosync_exit_error (AIS_DONE_STORE_RINGID);
}
--
1.7.12.4
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss