On Tue, 2010-03-02 at 19:30 +0200, Michael Goldish wrote: > This patch: > > - Makes kvm_config use less memory during parsing, by storing config data > compactly in arrays during parsing, and generating the final dicts only when > requested. > On my machine this results in 5-10 times less memory being used (depending on > the size of the final generated list). > This allows the test configuration to keep expanding without having the > parser run out of memory. > > - Adds config.fork_and_parse(), a function that parses a config file/string in > a forked process and then terminates the process. This works around Python's > policy of keeping allocated memory to itself even after the objects occupying > the memory have been destroyed. If the process that does the parsing is the > same one that runs the tests, less memory will be available to the VMs during > testing. > > - Makes parsing 4-5 times faster as a result of the new internal representation. > > Overall, kvm_config's memory usage should now be negligible in most cases. > > Changes from v3: > - Use the homemade 'configreader' class instead of regular files in parse() > and parse_variants() (readline() and/or seek() are very slow). > - Use a regex cache dict (regex_cache). > - Use a string cache dict in addition to the list (object_cache_indices). > - Some changes to fork_and_parse() (disable buffering). > > Changes from v2: > - Merged _get_next_line() and _get_next_line_indent(). > - Made _array_get_name() faster. > > Changes from v1: > - Added config.get_generator() which is similar to get_list() but returns a > dict generator instead of a list. This should save some more memory and will > make tests start sooner. > - Use get_generator() in control. > - Call waitpid() at the end of fork_and_parse(). 
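The fork-and-parse workaround described in the quoted changelog boils down to a small pattern: do the memory-hungry parsing in a child process, ship the compact result back through a pipe, and let the child exit so its heap is returned to the OS. The sketch below is illustrative only, with `parse_config` as a stand-in for the real kvm_config parser:

```python
import os
import pickle


def parse_config(text):
    # Stand-in parser: one dict per "key = value" line.
    result = []
    for line in text.splitlines():
        if "=" in line:
            key, value = [part.strip() for part in line.split("=", 1)]
            result.append({key: value})
    return result


def fork_and_parse(text):
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: parse, pickle the result into the pipe, then _exit()
        # so all memory allocated during parsing is freed.
        os.close(r)
        with os.fdopen(w, "wb") as wf:
            pickle.dump(parse_config(text), wf)
        os._exit(0)
    # Parent: read the pickled result back and reap the child.
    os.close(w)
    with os.fdopen(r, "rb") as rf:
        result = pickle.load(rf)
    os.waitpid(pid, 0)
    return result
```

The actual patch pickles the internal arrays plus the object cache rather than finished dicts, but the process layout (pipe, fork, dump in the child, load and waitpid in the parent) is the same.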
As the generated patch is kinda fragmented for posting comments inline, I am going to throw just a block of minor comments after I have reviewed the code:

Observations:

* When a file is missing, it's more appropriate to raise an IOError than a generic Exception, so we must change that. Also, it's important to follow the coding standards for raising exceptions.

* I was wondering whether making fork_and_parse a public interface for the config object was the right decision; maybe all calls to parse_file should be done in a fork_and_parse fashion? I guess I got your point in making it a public interface and separating it from parse_file, but isn't that kinda confusing for the users (I mean, people writing control files for kvm autotest)?

* About buffering on fork_and_parse: the performance penalty of disabling buffering varies. With caches dropped it was something like 3-5%, and after 'warming up' it was something like 8-11%, so it's small stuff. But we can favour speed in this case, so the final version won't disable buffering.

Compliments:

* The configreader class was a very interesting move: simple, clean and fast. Congrats!

* The output of the config system is good for debugging purposes, so we'll stick with it.

* Thank you very much for your work. Now we have faster parsing that consumes a lot less memory, so smaller boxes will benefit a *lot* from that.

What I am going to do:

* I will re-send the version with the tiny changes I made so it gets recorded on patchwork, and soon after I'll apply it upstream. I think from this point on we might have only minor tweaks to make.
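For the IOError point above, the missing-file check would look something like this (a sketch only; `parse_file_check` is a placeholder name, not the module's actual method):

```python
import os


def parse_file_check(filename):
    # Raise IOError (not a bare Exception) for a missing file, using the
    # "raise SomeError(msg)" call style rather than "raise SomeError, msg".
    if not os.path.exists(filename):
        raise IOError("File %s not found" % filename)
    return filename
```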
> > Signed-off-by: Michael Goldish <mgoldish@xxxxxxxxxx> > --- > client/tests/kvm/control | 30 +- > client/tests/kvm/control.parallel | 21 +- > client/tests/kvm/kvm_config.py | 832 ++++++++++++++++++++++--------------- > 3 files changed, 535 insertions(+), 348 deletions(-) > > diff --git a/client/tests/kvm/control b/client/tests/kvm/control > index 163286e..15c4539 100644 > --- a/client/tests/kvm/control > +++ b/client/tests/kvm/control > @@ -30,34 +30,38 @@ import kvm_utils, kvm_config > # set English environment (command output might be localized, need to be safe) > os.environ['LANG'] = 'en_US.UTF-8' > > -build_cfg_path = os.path.join(kvm_test_dir, "build.cfg") > -build_cfg = kvm_config.config(build_cfg_path) > -# Make any desired changes to the build configuration here. For example: > -#build_cfg.parse_string(""" > +str = """ > +# This string will be parsed after build.cfg. Make any desired changes to the > +# build configuration here. For example: > #release_tag = 84 > -#""") > -if not kvm_utils.run_tests(build_cfg.get_list(), job): > +""" > +build_cfg = kvm_config.config() > +build_cfg_path = os.path.join(kvm_test_dir, "build.cfg") > +build_cfg.fork_and_parse(build_cfg_path, str) > +if not kvm_utils.run_tests(build_cfg.get_generator(), job): > logging.error("KVM build step failed, exiting.") > sys.exit(1) > > -tests_cfg_path = os.path.join(kvm_test_dir, "tests.cfg") > -tests_cfg = kvm_config.config(tests_cfg_path) > -# Make any desired changes to the test configuration here. For example: > -#tests_cfg.parse_string(""" > +str = """ > +# This string will be parsed after tests.cfg. Make any desired changes to the > +# test configuration here. 
For example: > #display = sdl > #install|setup: timeout_multiplier = 3 > -#""") > +""" > +tests_cfg = kvm_config.config() > +tests_cfg_path = os.path.join(kvm_test_dir, "tests.cfg") > +tests_cfg.fork_and_parse(tests_cfg_path, str) > > pools_cfg_path = os.path.join(kvm_test_dir, "address_pools.cfg") > tests_cfg.parse_file(pools_cfg_path) > hostname = os.uname()[1].split(".")[0] > -if tests_cfg.filter("^" + hostname): > +if tests_cfg.count("^" + hostname): > tests_cfg.parse_string("only ^%s" % hostname) > else: > tests_cfg.parse_string("only ^default_host") > > # Run the tests > -kvm_utils.run_tests(tests_cfg.get_list(), job) > +kvm_utils.run_tests(tests_cfg.get_generator(), job) > > # Generate a nice HTML report inside the job's results dir > kvm_utils.create_report(kvm_test_dir, job.resultdir) > diff --git a/client/tests/kvm/control.parallel b/client/tests/kvm/control.parallel > index 343f694..07bc6e5 100644 > --- a/client/tests/kvm/control.parallel > +++ b/client/tests/kvm/control.parallel > @@ -160,19 +160,22 @@ if not params.get("mode") == "noinstall": > # ---------------------------------------------------------- > import kvm_config > > -filename = os.path.join(pwd, "kvm_tests.cfg") > -cfg = kvm_config.config(filename) > - > -# If desirable, make changes to the test configuration here. For example: > -# cfg.parse_string("install|setup: timeout_multiplier = 2") > -# cfg.parse_string("only fc8_quick") > -# cfg.parse_string("display = sdl") > +str = """ > +# This string will be parsed after tests.cfg. Make any desired changes to the > +# test configuration here. 
For example: > +#install|setup: timeout_multiplier = 3 > +#only fc8_quick > +#display = sdl > +""" > +cfg = kvm_config.config() > +filename = os.path.join(pwd, "tests.cfg") > +cfg.fork_and_parse(filename, str) > > -filename = os.path.join(pwd, "kvm_address_pools.cfg") > +filename = os.path.join(pwd, "address_pools.cfg") > if os.path.exists(filename): > cfg.parse_file(filename) > hostname = os.uname()[1].split(".")[0] > - if cfg.filter("^" + hostname): > + if cfg.count("^" + hostname): > cfg.parse_string("only ^%s" % hostname) > else: > cfg.parse_string("only ^default_host") > diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py > index 798ef56..7ff28e4 100755 > --- a/client/tests/kvm/kvm_config.py > +++ b/client/tests/kvm/kvm_config.py > @@ -2,10 +2,10 @@ > """ > KVM configuration file utility functions. > > -@copyright: Red Hat 2008-2009 > +@copyright: Red Hat 2008-2010 > """ > > -import logging, re, os, sys, StringIO, optparse > +import logging, re, os, sys, optparse, array, traceback, cPickle > import common > from autotest_lib.client.common_lib import error > from autotest_lib.client.common_lib import logging_config, logging_manager > @@ -21,490 +21,670 @@ class config: > """ > Parse an input file or string that follows the KVM Test Config File format > and generate a list of dicts that will be later used as configuration > - parameters by the the KVM tests. > + parameters by the KVM tests. > > @see: http://www.linux-kvm.org/page/KVM-Autotest/Test_Config_File > """ > > - def __init__(self, filename=None, debug=False): > + def __init__(self, filename=None, debug=True): > """ > - Initialize the list and optionally parse filename. > + Initialize the list and optionally parse a file. > > @param filename: Path of the file that will be taken. > - @param debug: Whether to turn debugging output. > + @param debug: Whether to turn on debugging output. 
> """ > - self.list = [{"name": "", "shortname": "", "depend": []}] > - self.debug = debug > + self.list = [array.array("H", [4, 4, 4, 4])] > + self.object_cache = [] > + self.object_cache_indices = {} > + self.regex_cache = {} > self.filename = filename > + self.debug = debug > if filename: > self.parse_file(filename) > > > def parse_file(self, filename): > """ > - Parse filename, return the resulting list and store it in .list. If > - filename does not exist, raise an exception. > + Parse file. If it doesn't exist, raise an exception. > > @param filename: Path of the configuration file. > """ > if not os.path.exists(filename): > raise Exception, "File %s not found" % filename > self.filename = filename > - file = open(filename, "r") > - self.list = self.parse(file, self.list) > - file.close() > - return self.list > + str = open(filename).read() > + self.list = self.parse(configreader(str), self.list) > > > def parse_string(self, str): > """ > - Parse a string, return the resulting list and store it in .list. > + Parse a string. > > - @param str: String that will be parsed. > + @param str: String to parse. > """ > - file = StringIO.StringIO(str) > - self.list = self.parse(file, self.list) > - file.close() > - return self.list > + self.list = self.parse(configreader(str), self.list) > > > - def get_list(self): > - """ > - Return the list of dictionaries. This should probably be called after > - parsing something. > + def fork_and_parse(self, filename=None, str=None): > """ > - return self.list > + Parse a file and/or a string in a separate process to save memory. > > + Python likes to keep memory to itself even after the objects occupying > + it have been destroyed. If during a call to parse_file() or > + parse_string() a lot of memory is used, it can only be freed by > + terminating the process. This function works around the problem by > + doing the parsing in a forked process and then terminating it, freeing > + any unneeded memory. 
> > - def match(self, filter, dict): > - """ > - Return True if dict matches filter. > + Note: if an exception is raised during parsing, its information will be > + printed, and the resulting list will be empty. The exception will not > + be raised in the process calling this function. > > - @param filter: A regular expression that defines the filter. > - @param dict: Dictionary that will be inspected. > + @param filename: Path of file to parse (optional). > + @param str: String to parse (optional). > """ > - filter = re.compile(r"(\.|^)(%s)(\.|$)" % filter) > - return bool(filter.search(dict["name"])) > - > - > - def filter(self, filter, list=None): > + r, w = os.pipe() > + r, w = os.fdopen(r, "r", 0), os.fdopen(w, "w", 0) > + pid = os.fork() > + if not pid: > + # Child process > + r.close() > + try: > + if filename: > + self.parse_file(filename) > + if str: > + self.parse_string(str) > + except: > + traceback.print_exc() > + self.list = [] > + # Convert the arrays to strings before pickling because at least > + # some Python versions can't pickle/unpickle arrays > + l = [a.tostring() for a in self.list] > + cPickle.dump((l, self.object_cache), w, -1) > + w.close() > + os._exit(0) > + else: > + # Parent process > + w.close() > + (l, self.object_cache) = cPickle.load(r) > + r.close() > + os.waitpid(pid, 0) > + self.list = [] > + for s in l: > + a = array.array("H") > + a.fromstring(s) > + self.list.append(a) > + > + > + def get_generator(self): > """ > - Filter a list of dicts. > + Generate dictionaries from the code parsed so far. This should > + probably be called after parsing something. > > - @param filter: A regular expression that will be used as a filter. > - @param list: A list of dictionaries that will be filtered. > + @return: A dict generator. 
> """ > - if list is None: > - list = self.list > - return [dict for dict in list if self.match(filter, dict)] > + for a in self.list: > + name, shortname, depend, content = _array_get_all(a, self.object_cache) > + dict = {"name": name, "shortname": shortname, "depend": depend} > + self._apply_content_to_dict(dict, content) > + yield dict > > > - def split_and_strip(self, str, sep="="): > + def get_list(self): > """ > - Split str and strip quotes from the resulting parts. > + Generate a list of dictionaries from the code parsed so far. > + This should probably be called after parsing something. > > - @param str: String that will be processed > - @param sep: Separator that will be used to split the string > + @return: A list of dicts. > """ > - temp = str.split(sep, 1) > - for i in range(len(temp)): > - temp[i] = temp[i].strip() > - if re.findall("^\".*\"$", temp[i]): > - temp[i] = temp[i].strip("\"") > - elif re.findall("^\'.*\'$", temp[i]): > - temp[i] = temp[i].strip("\'") > - return temp > - > + return list(self.get_generator()) > > - def get_next_line(self, file): > - """ > - Get the next non-empty, non-comment line in a file like object. > > - @param file: File like object > - @return: If no line is available, return None. > + def count(self, filter=".*"): > """ > - while True: > - line = file.readline() > - if line == "": return None > - stripped_line = line.strip() > - if len(stripped_line) > 0 \ > - and not stripped_line.startswith('#') \ > - and not stripped_line.startswith('//'): > - return line > - > + Return the number of dictionaries whose names match filter. > > - def get_next_line_indent(self, file): > + @param filter: A regular expression string. > """ > - Return the indent level of the next non-empty, non-comment line in file. > - > - @param file: File like object. > - @return: If no line is available, return -1. 
> - """ > - pos = file.tell() > - line = self.get_next_line(file) > - if not line: > - file.seek(pos) > - return -1 > - line = line.expandtabs() > - indent = 0 > - while line[indent] == ' ': > - indent += 1 > - file.seek(pos) > - return indent > - > - > - def add_name(self, str, name, append=False): > - """ > - Add name to str with a separator dot and return the result. > - > - @param str: String that will be processed > - @param name: name that will be appended to the string. > - @return: If append is True, append name to str. > - Otherwise, pre-pend name to str. > - """ > - if str == "": > - return name > - # Append? > - elif append: > - return str + "." + name > - # Prepend? > - else: > - return name + "." + str > + exp = self._get_filter_regex(filter) > + count = 0 > + for a in self.list: > + name = _array_get_name(a, self.object_cache) > + if exp.search(name): > + count += 1 > + return count > > > - def parse_variants(self, file, list, subvariants=False, prev_indent=-1): > + def parse_variants(self, cr, list, subvariants=False, prev_indent=-1): > """ > - Read and parse lines from file like object until a line with an indent > - level lower than or equal to prev_indent is encountered. > + Read and parse lines from a configreader object until a line with an > + indent level lower than or equal to prev_indent is encountered. > > - @brief: Parse a 'variants' or 'subvariants' block from a file-like > - object. > - @param file: File-like object that will be parsed > - @param list: List of dicts to operate on > + @brief: Parse a 'variants' or 'subvariants' block from a configreader > + object. > + @param cr: configreader object to be parsed. > + @param list: List of arrays to operate on. > @param subvariants: If True, parse in 'subvariants' mode; > - otherwise parse in 'variants' mode > - @param prev_indent: The indent level of the "parent" block > - @return: The resulting list of dicts. > + otherwise parse in 'variants' mode. 
> + @param prev_indent: The indent level of the "parent" block. > + @return: The resulting list of arrays. > """ > new_list = [] > > while True: > - indent = self.get_next_line_indent(file) > + pos = cr.tell() > + (indented_line, line, indent) = cr.get_next_line() > if indent <= prev_indent: > + cr.seek(pos) > break > - indented_line = self.get_next_line(file).rstrip() > - line = indented_line.strip() > > # Get name and dependencies > - temp = line.strip("- ").split(":") > - name = temp[0] > - if len(temp) == 1: > - dep_list = [] > - else: > - dep_list = temp[1].split() > + (name, depend) = map(str.strip, line.lstrip("- ").split(":")) > > # See if name should be added to the 'shortname' field > - add_to_shortname = True > - if name.startswith("@"): > - name = name.strip("@") > - add_to_shortname = False > - > - # Make a deep copy of list > - temp_list = [] > - for dict in list: > - new_dict = dict.copy() > - new_dict["depend"] = dict["depend"][:] > - temp_list.append(new_dict) > + add_to_shortname = not name.startswith("@") > + name = name.lstrip("@") > + > + # Store name and dependencies in cache and get their indices > + n = self._store_str(name) > + d = self._store_str(depend) > + > + # Make a copy of list > + temp_list = [a[:] for a in list] > > if subvariants: > # If we're parsing 'subvariants', first modify the list > - self.__modify_list_subvariants(temp_list, name, dep_list, > - add_to_shortname) > - temp_list = self.parse(file, temp_list, > - restricted=True, prev_indent=indent) > + if add_to_shortname: > + for a in temp_list: > + _array_append_to_name_shortname_depend(a, n, d) > + else: > + for a in temp_list: > + _array_append_to_name_depend(a, n, d) > + temp_list = self.parse(cr, temp_list, restricted=True, > + prev_indent=indent) > else: > # If we're parsing 'variants', parse before modifying the list > if self.debug: > - self.__debug_print(indented_line, > - "Entering variant '%s' " > - "(variant inherits %d dicts)" % > - (name, len(list))) > - 
temp_list = self.parse(file, temp_list, > - restricted=False, prev_indent=indent) > - self.__modify_list_variants(temp_list, name, dep_list, > - add_to_shortname) > + _debug_print(indented_line, > + "Entering variant '%s' " > + "(variant inherits %d dicts)" % > + (name, len(list))) > + temp_list = self.parse(cr, temp_list, restricted=False, > + prev_indent=indent) > + if add_to_shortname: > + for a in temp_list: > + _array_prepend_to_name_shortname_depend(a, n, d) > + else: > + for a in temp_list: > + _array_prepend_to_name_depend(a, n, d) > > new_list += temp_list > > return new_list > > > - def parse(self, file, list, restricted=False, prev_indent=-1): > + def parse(self, cr, list, restricted=False, prev_indent=-1): > """ > - Read and parse lines from file until a line with an indent level lower > - than or equal to prev_indent is encountered. > - > - @brief: Parse a file-like object. > - @param file: A file-like object > - @param list: A list of dicts to operate on (list is modified in > - place and should not be used after the call) > - @param restricted: if True, operate in restricted mode > - (prohibit 'variants') > - @param prev_indent: the indent level of the "parent" block > - @return: Return the resulting list of dicts. > + Read and parse lines from a configreader object until a line with an > + indent level lower than or equal to prev_indent is encountered. > + > + @brief: Parse a configreader object. > + @param cr: A configreader object. > + @param list: A list of arrays to operate on (list is modified in > + place and should not be used after the call). > + @param restricted: If True, operate in restricted mode > + (prohibit 'variants'). > + @param prev_indent: The indent level of the "parent" block. > + @return: The resulting list of arrays. > @note: List is destroyed and should not be used after the call. > - Only the returned list should be used. > + Only the returned list should be used. 
> """ > + current_block = "" > + > while True: > - indent = self.get_next_line_indent(file) > + pos = cr.tell() > + (indented_line, line, indent) = cr.get_next_line() > if indent <= prev_indent: > + cr.seek(pos) > + self._append_content_to_arrays(list, current_block) > break > - indented_line = self.get_next_line(file).rstrip() > - line = indented_line.strip() > - words = line.split() > > len_list = len(list) > > - # Look for a known operator in the line > - operators = ["?+=", "?<=", "?=", "+=", "<=", "="] > - op_found = None > - op_pos = len(line) > - for op in operators: > - pos = line.find(op) > - if pos >= 0 and pos < op_pos: > - op_found = op > - op_pos = pos > - > - # Found an operator? > - if op_found: > + # Parse assignment operators (keep lines in temporary buffer) > + if "=" in line: > if self.debug and not restricted: > - self.__debug_print(indented_line, > - "Parsing operator (%d dicts in current " > - "context)" % len_list) > - (left, value) = self.split_and_strip(line, op_found) > - filters_and_key = self.split_and_strip(left, ":") > - filters = filters_and_key[:-1] > - key = filters_and_key[-1] > - filtered_list = list > - for filter in filters: > - filtered_list = self.filter(filter, filtered_list) > - # Apply the operation to the filtered list > - if op_found == "=": > - for dict in filtered_list: > - dict[key] = value > - elif op_found == "+=": > - for dict in filtered_list: > - dict[key] = dict.get(key, "") + value > - elif op_found == "<=": > - for dict in filtered_list: > - dict[key] = value + dict.get(key, "") > - elif op_found.startswith("?"): > - exp = re.compile("^(%s)$" % key) > - if op_found == "?=": > - for dict in filtered_list: > - for key in dict.keys(): > - if exp.match(key): > - dict[key] = value > - elif op_found == "?+=": > - for dict in filtered_list: > - for key in dict.keys(): > - if exp.match(key): > - dict[key] = dict.get(key, "") + value > - elif op_found == "?<=": > - for dict in filtered_list: > - for key in dict.keys(): 
> - if exp.match(key): > - dict[key] = value + dict.get(key, "") > + _debug_print(indented_line, > + "Parsing operator (%d dicts in current " > + "context)" % len_list) > + current_block += line + "\n" > + continue > + > + # Flush the temporary buffer > + self._append_content_to_arrays(list, current_block) > + current_block = "" > + > + words = line.split() > > # Parse 'no' and 'only' statements > - elif words[0] == "no" or words[0] == "only": > + if words[0] == "no" or words[0] == "only": > if len(words) <= 1: > continue > - filters = words[1:] > + filters = map(self._get_filter_regex, words[1:]) > filtered_list = [] > if words[0] == "no": > - for dict in list: > + for a in list: > + name = _array_get_name(a, self.object_cache) > for filter in filters: > - if self.match(filter, dict): > + if filter.search(name): > break > else: > - filtered_list.append(dict) > + filtered_list.append(a) > if words[0] == "only": > - for dict in list: > + for a in list: > + name = _array_get_name(a, self.object_cache) > for filter in filters: > - if self.match(filter, dict): > - filtered_list.append(dict) > + if filter.search(name): > + filtered_list.append(a) > break > list = filtered_list > if self.debug and not restricted: > - self.__debug_print(indented_line, > - "Parsing no/only (%d dicts in current " > - "context, %d remain)" % > - (len_list, len(list))) > + _debug_print(indented_line, > + "Parsing no/only (%d dicts in current " > + "context, %d remain)" % > + (len_list, len(list))) > + continue > > # Parse 'variants' > - elif line == "variants:": > + if line == "variants:": > # 'variants' not allowed in restricted mode > # (inside an exception or inside subvariants) > if restricted: > e_msg = "Using variants in this context is not allowed" > raise error.AutotestError(e_msg) > if self.debug and not restricted: > - self.__debug_print(indented_line, > - "Entering variants block (%d dicts in " > - "current context)" % len_list) > - list = self.parse_variants(file, list, 
subvariants=False, > + _debug_print(indented_line, > + "Entering variants block (%d dicts in " > + "current context)" % len_list) > + list = self.parse_variants(cr, list, subvariants=False, > prev_indent=indent) > + continue > > # Parse 'subvariants' (the block is parsed for each dict > # separately) > - elif line == "subvariants:": > + if line == "subvariants:": > if self.debug and not restricted: > - self.__debug_print(indented_line, > - "Entering subvariants block (%d dicts in " > - "current context)" % len_list) > + _debug_print(indented_line, > + "Entering subvariants block (%d dicts in " > + "current context)" % len_list) > new_list = [] > - # Remember current file position > - pos = file.tell() > + # Remember current position > + pos = cr.tell() > # Read the lines in any case > - self.parse_variants(file, [], subvariants=True, > + self.parse_variants(cr, [], subvariants=True, > prev_indent=indent) > # Iterate over the list... > - for index in range(len(list)): > - # Revert to initial file position in this 'subvariants' > - # block > - file.seek(pos) > + for index in xrange(len(list)): > + # Revert to initial position in this 'subvariants' block > + cr.seek(pos) > # Everything inside 'subvariants' should be parsed in > # restricted mode > - new_list += self.parse_variants(file, list[index:index+1], > + new_list += self.parse_variants(cr, list[index:index+1], > subvariants=True, > prev_indent=indent) > list = new_list > + continue > > # Parse 'include' statements > - elif words[0] == "include": > + if words[0] == "include": > if len(words) <= 1: > continue > if self.debug and not restricted: > - self.__debug_print(indented_line, > - "Entering file %s" % words[1]) > + _debug_print(indented_line, "Entering file %s" % words[1]) > if self.filename: > filename = os.path.join(os.path.dirname(self.filename), > words[1]) > if os.path.exists(filename): > - new_file = open(filename, "r") > - list = self.parse(new_file, list, restricted) > - new_file.close() > + str = 
open(filename).read() > + list = self.parse(configreader(str), list, restricted) > if self.debug and not restricted: > - self.__debug_print("", "Leaving file %s" % words[1]) > + _debug_print("", "Leaving file %s" % words[1]) > else: > logging.warning("Cannot include %s -- file not found", > filename) > else: > logging.warning("Cannot include %s because no file is " > "currently open", words[1]) > + continue > > # Parse multi-line exceptions > # (the block is parsed for each dict separately) > - elif line.endswith(":"): > + if line.endswith(":"): > if self.debug and not restricted: > - self.__debug_print(indented_line, > - "Entering multi-line exception block " > - "(%d dicts in current context outside " > - "exception)" % len_list) > - line = line.strip(":") > + _debug_print(indented_line, > + "Entering multi-line exception block " > + "(%d dicts in current context outside " > + "exception)" % len_list) > + line = line[:-1] > new_list = [] > - # Remember current file position > - pos = file.tell() > + # Remember current position > + pos = cr.tell() > # Read the lines in any case > - self.parse(file, [], restricted=True, prev_indent=indent) > + self.parse(cr, [], restricted=True, prev_indent=indent) > # Iterate over the list... 
> - for index in range(len(list)): > - if self.match(line, list[index]): > - # Revert to initial file position in this > - # exception block > - file.seek(pos) > + exp = self._get_filter_regex(line) > + for index in xrange(len(list)): > + name = _array_get_name(list[index], self.object_cache) > + if exp.search(name): > + # Revert to initial position in this exception block > + cr.seek(pos) > # Everything inside an exception should be parsed in > # restricted mode > - new_list += self.parse(file, list[index:index+1], > + new_list += self.parse(cr, list[index:index+1], > restricted=True, > prev_indent=indent) > else: > - new_list += list[index:index+1] > + new_list.append(list[index]) > list = new_list > + continue > > return list > > > - def __debug_print(self, str1, str2=""): > + def _get_filter_regex(self, filter): > """ > - Nicely print two strings and an arrow. > + Return a regex object corresponding to a given filter string. > > - @param str1: First string > - @param str2: Second string > + All regular expressions given to the parser are passed through this > + function first. Its purpose is to make them more specific and better > + suited to match dictionary names: it forces simple expressions to match > + only between dots or at the beginning or end of a string. For example, > + the filter 'foo' will match 'foo.bar' but not 'foobar'. > """ > - if str2: > - str = "%-50s ---> %s" % (str1, str2) > - else: > - str = str1 > - logging.debug(str) > - > - > - def __modify_list_variants(self, list, name, dep_list, add_to_shortname): > - """ > - Make some modifications to list, as part of parsing a 'variants' block. 
> - > - @param list: List to be processed > - @param name: Name to be prepended to the dictionary's 'name' key > - @param dep_list: List of dependencies to be added to the dictionary's > - 'depend' key > - @param add_to_shortname: Boolean indicating whether name should be > - prepended to the dictionary's 'shortname' key as well > - """ > - for dict in list: > - # Prepend name to the dict's 'name' field > - dict["name"] = self.add_name(dict["name"], name) > - # Prepend name to the dict's 'shortname' field > - if add_to_shortname: > - dict["shortname"] = self.add_name(dict["shortname"], name) > - # Prepend name to each of the dict's dependencies > - for i in range(len(dict["depend"])): > - dict["depend"][i] = self.add_name(dict["depend"][i], name) > - # Add new dependencies > - dict["depend"] += dep_list > - > - > - def __modify_list_subvariants(self, list, name, dep_list, add_to_shortname): > - """ > - Make some modifications to list, as part of parsing a 'subvariants' > - block. > - > - @param list: List to be processed > - @param name: Name to be appended to the dictionary's 'name' key > - @param dep_list: List of dependencies to be added to the dictionary's > - 'depend' key > - @param add_to_shortname: Boolean indicating whether name should be > - appended to the dictionary's 'shortname' as well > - """ > - for dict in list: > - # Add new dependencies > - for dep in dep_list: > - dep_name = self.add_name(dict["name"], dep, append=True) > - dict["depend"].append(dep_name) > - # Append name to the dict's 'name' field > - dict["name"] = self.add_name(dict["name"], name, append=True) > - # Append name to the dict's 'shortname' field > - if add_to_shortname: > - dict["shortname"] = self.add_name(dict["shortname"], name, > - append=True) > + try: > + return self.regex_cache[filter] > + except KeyError: > + exp = re.compile(r"(\.|^)(%s)(\.|$)" % filter) > + self.regex_cache[filter] = exp > + return exp > + > + > + def _store_str(self, str): > + """ > + Store str in the 
internal object cache, if it isn't already there, and > + return its identifying index. > + > + @param str: String to store. > + @return: The index of str in the object cache. > + """ > + try: > + return self.object_cache_indices[str] > + except KeyError: > + self.object_cache.append(str) > + index = len(self.object_cache) - 1 > + self.object_cache_indices[str] = index > + return index > + > + > + def _append_content_to_arrays(self, list, content): > + """ > + Append content (config code containing assignment operations) to a list > + of arrays. > + > + @param list: List of arrays to operate on. > + @param content: String containing assignment operations. > + """ > + if content: > + str_index = self._store_str(content) > + for a in list: > + _array_append_to_content(a, str_index) > + > + > + def _apply_content_to_dict(self, dict, content): > + """ > + Apply the operations in content (config code containing assignment > + operations) to a dict. > + > + @param dict: Dictionary to operate on. Must have 'name' key. > + @param content: String containing assignment operations. 
> + """ > + for line in content.splitlines(): > + op_found = None > + op_pos = len(line) > + for op in ops: > + pos = line.find(op) > + if pos >= 0 and pos < op_pos: > + op_found = op > + op_pos = pos > + if not op_found: > + continue > + (left, value) = map(str.strip, line.split(op_found, 1)) > + if value and ((value[0] == '"' and value[-1] == '"') or > + (value[0] == "'" and value[-1] == "'")): > + value = value[1:-1] > + filters_and_key = map(str.strip, left.split(":")) > + filters = filters_and_key[:-1] > + key = filters_and_key[-1] > + for filter in filters: > + exp = self._get_filter_regex(filter) > + if not exp.search(dict["name"]): > + break > + else: > + ops[op_found](dict, key, value) > + > + > +# Assignment operators > + > +def _op_set(dict, key, value): > + dict[key] = value > + > + > +def _op_append(dict, key, value): > + dict[key] = dict.get(key, "") + value > + > + > +def _op_prepend(dict, key, value): > + dict[key] = value + dict.get(key, "") > + > + > +def _op_regex_set(dict, exp, value): > + exp = re.compile("^(%s)$" % exp) > + for key in dict: > + if exp.match(key): > + dict[key] = value > + > + > +def _op_regex_append(dict, exp, value): > + exp = re.compile("^(%s)$" % exp) > + for key in dict: > + if exp.match(key): > + dict[key] += value > + > + > +def _op_regex_prepend(dict, exp, value): > + exp = re.compile("^(%s)$" % exp) > + for key in dict: > + if exp.match(key): > + dict[key] = value + dict[key] > + > + > +ops = { > + "=": _op_set, > + "+=": _op_append, > + "<=": _op_prepend, > + "?=": _op_regex_set, > + "?+=": _op_regex_append, > + "?<=": _op_regex_prepend, > +} > + > + > +# Misc functions > + > +def _debug_print(str1, str2=""): > + """ > + Nicely print two strings and an arrow. > + > + @param str1: First string. > + @param str2: Second string. 
> + """ > + if str2: > + str = "%-50s ---> %s" % (str1, str2) > + else: > + str = str1 > + logging.debug(str) > + > + > +# configreader > + > +class configreader: > + """ > + Preprocess an input string and provide file-like services. > + This is intended as a replacement for the file and StringIO classes, > + whose readline() and/or seek() methods seem to be slow. > + """ > + > + def __init__(self, str): > + """ > + Initialize the reader. > + > + @param str: The string to parse. > + """ > + self.line_index = 0 > + self.lines = [] > + for line in str.splitlines(): > + line = line.rstrip().expandtabs() > + stripped_line = line.strip() > + indent = len(line) - len(stripped_line) > + if (not stripped_line > + or stripped_line.startswith("#") > + or stripped_line.startswith("//")): > + continue > + self.lines.append((line, stripped_line, indent)) > + > + > + def get_next_line(self): > + """ > + Get the next non-empty, non-comment line in the string. > + > + @param file: File like object. > + @return: (line, stripped_line, indent), where indent is the line's > + indent level or -1 if no line is available. > + """ > + try: > + if self.line_index < len(self.lines): > + return self.lines[self.line_index] > + else: > + return (None, None, -1) > + finally: > + self.line_index += 1 > + > + > + def tell(self): > + """ > + Return the current line index. > + """ > + return self.line_index > + > + > + def seek(self, index): > + """ > + Set the current line index. > + """ > + self.line_index = index > + > + > +# Array structure: > +# ---------------- > +# The first 4 elements contain the indices of the 4 segments. > +# a[0] -- Index of beginning of 'name' segment (always 4). > +# a[1] -- Index of beginning of 'shortname' segment. > +# a[2] -- Index of beginning of 'depend' segment. > +# a[3] -- Index of beginning of 'content' segment. > +# The next elements in the array comprise the aforementioned segments: > +# The 'name' segment begins with a[a[0]] and ends with a[a[1]-1]. 
> +# The 'shortname' segment begins with a[a[1]] and ends with a[a[2]-1].
> +# The 'depend' segment begins with a[a[2]] and ends with a[a[3]-1].
> +# The 'content' segment begins with a[a[3]] and ends at the end of the array.
> +
> +# The following functions append/prepend to various segments of an array.
> +
> +def _array_append_to_name_shortname_depend(a, name, depend):
> +    a.insert(a[1], name)
> +    a.insert(a[2] + 1, name)
> +    a.insert(a[3] + 2, depend)
> +    a[1] += 1
> +    a[2] += 2
> +    a[3] += 3
> +
> +
> +def _array_prepend_to_name_shortname_depend(a, name, depend):
> +    a[1] += 1
> +    a[2] += 2
> +    a[3] += 3
> +    a.insert(a[0], name)
> +    a.insert(a[1], name)
> +    a.insert(a[2], depend)
> +
> +
> +def _array_append_to_name_depend(a, name, depend):
> +    a.insert(a[1], name)
> +    a.insert(a[3] + 1, depend)
> +    a[1] += 1
> +    a[2] += 1
> +    a[3] += 2
> +
> +
> +def _array_prepend_to_name_depend(a, name, depend):
> +    a[1] += 1
> +    a[2] += 1
> +    a[3] += 2
> +    a.insert(a[0], name)
> +    a.insert(a[2], depend)
> +
> +
> +def _array_append_to_content(a, content):
> +    a.append(content)
> +
> +
> +def _array_get_name(a, object_cache):
> +    """
> +    Return the name of a dictionary represented by a given array.
> +
> +    @param a: Array representing a dictionary.
> +    @param object_cache: A list of strings referenced by elements in the array.
> +    """
> +    return ".".join([object_cache[i] for i in a[a[0]:a[1]]])
> +
> +
> +def _array_get_all(a, object_cache):
> +    """
> +    Return a 4-tuple containing all the data stored in a given array, in a
> +    format that is easy to turn into an actual dictionary.
> +
> +    @param a: Array representing a dictionary.
> +    @param object_cache: A list of strings referenced by elements in the array.
> +    @return: A 4-tuple: (name, shortname, depend, content), in which all
> +            members are strings except depend which is a list of strings.
> + """ > + name = ".".join([object_cache[i] for i in a[a[0]:a[1]]]) > + shortname = ".".join([object_cache[i] for i in a[a[1]:a[2]]]) > + content = "".join([object_cache[i] for i in a[a[3]:]]) > + depend = [] > + prefix = "" > + for n, d in zip(a[a[0]:a[1]], a[a[2]:a[3]]): > + for dep in object_cache[d].split(): > + depend.append(prefix + dep) > + prefix += object_cache[n] + "." > + return name, shortname, depend, content > + > > > if __name__ == "__main__": > parser = optparse.OptionParser() > parser.add_option('-f', '--file', dest="filename", action='store', > help='path to a config file that will be parsed. ' > - 'If not specified, will parse kvm_tests.cfg ' > - 'located inside the kvm test dir.') > + 'If not specified, will parse tests.cfg located ' > + 'inside the kvm test dir.') > parser.add_option('--verbose', dest="debug", action='store_true', > help='include debug messages in console output') > > @@ -518,9 +698,9 @@ if __name__ == "__main__": > # Here we configure the stand alone program to use the autotest > # logging system. > logging_manager.configure_logging(KvmLoggingConfig(), verbose=debug) > - list = config(filename, debug=debug).get_list() > + dicts = config(filename, debug=debug).get_generator() > i = 0 > - for dict in list: > + for dict in dicts: > logging.info("Dictionary #%d:", i) > keys = dict.keys() > keys.sort() -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html