Re: Data loss after backfill

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, 

Let me share current idea about source code modification.
Regarding readdir(), errno is set to indicate the error, so we can check readdir() error in Ceph codes.

[readdir() specification]
------------------------------------------------------------
https://man7.org/linux/man-pages/man3/readdir.3.html

DESCRIPTION
... It returns NULL on reaching the end of the directory stream or if an error occurred.

RETURN VALUE
... If the end of the directory stream is reached, NULL is returned
and errno is not changed. If an error occurs, NULL is returned
and errno is set to indicate the error. To distinguish end of
stream from an error, set errno to zero before calling readdir()
and then check the value of errno if NULL is returned.
------------------------------------------------------------


How about modify the codes where readdir () is being executed in filestore as follows:

------------------------------------------------------------
diff --git a/src/os/filestore/BtrfsFileStoreBackend.cc b/src/os/filestore/BtrfsFileStoreBackend.cc
index df1d2452a1f..ad0c4d1bb1e 100644
--- a/src/os/filestore/BtrfsFileStoreBackend.cc
+++ b/src/os/filestore/BtrfsFileStoreBackend.cc
@@ -326,7 +326,13 @@ int BtrfsFileStoreBackend::list_checkpoints(list<string>& ls)
   list<string> snaps;
   char path[PATH_MAX];
   struct dirent *de;
-  while ((de = ::readdir(dir))) {
+  while (true) {
+    errno = 0;
+    de = ::readdir(dir);
+    if (de == nullptr) {
+      ceph_assert(errno == 0);
+      break;
+    }
     snprintf(path, sizeof(path), "%s/%s", get_basedir_path().c_str(), de->d_name);

     struct stat st;
diff --git a/src/os/filestore/FileStore.cc b/src/os/filestore/FileStore.cc
index f059331b090..1aeec855e9a 100644
--- a/src/os/filestore/FileStore.cc
+++ b/src/os/filestore/FileStore.cc
@@ -4987,7 +4987,13 @@ int FileStore::list_collections(vector<coll_t>& ls, bool include_temp)
   }

   struct dirent *de = nullptr;
-  while ((de = ::readdir(dir))) {
+  while (true) {
+    errno = 0;
+    de = ::readdir(dir);
+    if (de == nullptr) {
+      ceph_assert(errno == 0);
+      break;
+    }
     if (de->d_type == DT_UNKNOWN) {
       // d_type not supported (non-ext[234], btrfs), must stat
       struct stat sb;
diff --git a/src/os/filestore/LFNIndex.cc b/src/os/filestore/LFNIndex.cc
index bbf65bdc66a..530d7501ddc 100644
--- a/src/os/filestore/LFNIndex.cc
+++ b/src/os/filestore/LFNIndex.cc
@@ -429,7 +429,13 @@ int LFNIndex::list_objects(const vector<string> &to_list, int max_objs,
   int r = 0;
   int listed = 0;
   bool end = true;
-  while ((de = ::readdir(dir))) {
+  while (true) {
+    errno = 0;
+    de = ::readdir(dir);
+    if (de == nullptr) {
+      ceph_assert(errno == 0);
+      break;
+    }
     end = false;
     if (max_objs > 0 && listed >= max_objs) {
       break;
@@ -477,7 +483,13 @@ int LFNIndex::list_subdirs(const vector<string> &to_list,
     return -errno;

   struct dirent *de = nullptr;
-  while ((de = ::readdir(dir))) {
+  while (true) {
+    errno = 0;
+    de = ::readdir(dir);
+    if (de == nullptr) {
+      ceph_assert(errno == 0);
+      break;
+    }
     string short_name(de->d_name);
     string demangled_name;
     if (lfn_is_subdir(short_name, &demangled_name)) {
------------------------------------------------------------

I haven't verified this modification yet, but is this idea (check readdir() errors) acceptable?
If Ceph can't detect the error, it's hard to deal with. 
But regarding readdir(), it can detect the error, so I would like to fix it.

Jin
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



[Index of Archives]     [CEPH Users]     [Ceph Devel]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux