Wednesday, June 4, 2008

Parsing YAML files in Ruby - Part 2

I wish I had a clip reel I could roll after a dude with a really deep voice said, "Previously on Parsing YAML files in Ruby".  But I don't.  So here's a link.

Ruby is Narnia.  I spend my real life in C#, a perfectly serviceable language.  I'm comfortable there, I kind of know my way around, and I've come to depend on it to make my living.  But when I have a few spare moments here and there, I get to wander off into this magical fairy-land and have adventures with strange and wonderful creatures.  Like the YAML.  In case you forgot what our YAML looks like, here he is:

---
shared paths:
  build share : \\\Builds

local paths:
  references  : \references

custom assemblies:
  - location: \Dev\Components\Business\Core\Trunk\Latest\Debug
    assemblies:
      - name : MyNamespace.Core
        files:
          - binary    : MyNamespace.Core.dll
          - debug     : MyNamespace.Core.pdb
          - document  : MyNamespace.Core.xml

  - location: \Dev\Components\Framework\Trunk\Debug
    assemblies:
      - name : MyNamespace.Framework.Core
        files:
          - binary    : MyNamespace.Framework.Core.dll
          - debug     : MyNamespace.Framework.Core.pdb
          - document  : MyNamespace.Framework.Core.xml

vendor assemblies:
  - location: \vendor\DotNet Commons\Logging\2.0
    assemblies:
      - name : Dotnet.Commons.Logging
        files:
          - binary    : Dotnet.Commons.Logging.dll

testing assemblies:
  - location: \vendor\Nunit\2.4.3
    assemblies:
      - name : NUnit.Framework
        files:
          - binary    : nunit.framework.dll

  - location: \vendor\Rhino.Mocks\
    assemblies:
      - name : Rhino.Mocks
        files:
          - binary    : Rhino.Mocks.dll
          - document  : Rhino.Mocks.xml
...
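
For anyone who wants to poke at the shape of this data, here's a rough sketch of what Ruby hands back when it loads a piece of the file.  The SAMPLE_YML constant is a made-up name for illustration, and it holds only one trimmed section rather than the whole file.  It also assumes a reasonably recent Ruby, since the squiggly heredoc didn't exist back in 1.8.

```ruby
require 'yaml'

# SAMPLE_YML is a hypothetical, trimmed-down copy of one section of the file.
# The single-quoted heredoc keeps the backslashes exactly as typed
# (the <<~ form needs Ruby 2.3 or later).
SAMPLE_YML = <<~'YML'
  testing assemblies:
    - location: \vendor\Nunit\2.4.3
      assemblies:
        - name : NUnit.Framework
          files:
            - binary    : nunit.framework.dll
YML

refs = YAML.load(SAMPLE_YML)

# The top level is a hash; each section underneath it is an array of hashes.
packing_list = refs['testing assemblies'].first
puts packing_list['location']
puts packing_list['assemblies'].first['files'].first['binary']
```

So a section name gets you an array, each element of that array is a hash, and it's hashes and arrays all the way down from there.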


Last time I told you how easy it was to access data in *.yml files in Ruby.  I've taken that idea a little further, and I cooked up this class:

   1: require 'yaml'
   3: class References
   4:   attr_accessor :debug_mode
   5:   def initialize(references_file_name='references.yml',debug_mode=true)
   6:     @refs = open(references_file_name) {|f| YAML.load(f) }
   7:     @debug_mode = debug_mode
   8:   end
   9:   def shared_root_directory
  10:     @refs['shared paths']['build share']
  11:   end
  12:   def get_filenames(assembly_list_name, *file_types)
  13:     get_node(@refs, assembly_list_name) do |assembly_list|
  14:       assembly_list.each do |packing_list|
  15:         get_node(packing_list, 'assemblies'){|assembly| get_names(assembly, packing_list['location'],file_types){|filename| yield filename}}
  16:       end
  17:     end
  18:   end
  19:   private
  20:   def concatenate(*locators)
  21:     concatenated = ''
  22:     locators.each { |locator| concatenated << (locator =~ /\A(?!\\)/ ? '\\' : '') << locator.sub(/\\\Z/, '') }
  23:     return concatenated
  24:   end
  25:   def get_names(assembly, path,file_types)
  26:     get_node(assembly, 'files') do |file| 
  27:       parse_filenames(file,file_types){|filename| yield concatenate(shared_root_directory,path,filename)}
  28:     end
  29:   end
  30:   def get_node(data_store, find_key)
  31:     yield data_store[find_key] if data_store.kind_of? Hash
  32:     data_store.each{|node| get_node(node, find_key){|subnode| yield subnode}} if data_store.kind_of? Array
  33:   end
  34:   def parse_filenames(file_node,file_types)
  35:     file_node.keys.each {|key| yield file_node[key] unless filter(key,file_types)} if file_node.kind_of? Hash
  36:     file_node.each{|element| parse_filenames(element,file_types){|value| yield value}} if file_node.kind_of? Array
  37:   end
  38:   def filter(key,file_types)
  39:     (!@debug_mode && key=="debug") || (!file_types.include?(key) unless file_types.empty?)
  40:   end
  41: end

With this References class, you can do something like this (pay attention - here's where it starts to get cool):

refs = References.new
refs.get_filenames("custom assemblies","binary"){|filename| puts filename}

And you get something like this:


Let's start with get_node() on line 30.  This method is an iterator.  I don't know why, but it took a long time for the lightbulb to go on in my head over Ruby's usage of the yield keyword.  Turns out, it works just like all the Ruby books say it does.  Really, why would they lie?  In this case, on line 31, we're looking up the value stored under find_key in the data_store hash, and yielding that value back to the calling method's block.  And that calling method better have a code block to execute once it receives a value, or we're gonna get a big ol' runtime exception (a LocalJumpError, to be exact).  For the get_filenames() call in our script, on line 13 we're saying, "Look in the top-most hash in the references.yml file and find me a node with a key called 'custom assemblies'".  Remember: the way our YAML file is laid out, it's just a big, weird hash of arrays and hashes.  We have to write code to ferret out the info we want, and in this case ultimately we want a list of filenames.
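
If yield is still fuzzy, it may help to see the same trick on a tiny scale.  This is only a sketch: find_values is a hypothetical stand-in for get_node(), and the sample data is invented for the occasion.

```ruby
# find_values is a hypothetical, pared-down iterator in the spirit of
# get_node(); it is not part of the References class.
def find_values(data_store, find_key)
  # A hash yields the value stored under find_key to the caller's block.
  yield data_store[find_key] if data_store.kind_of? Hash
  # An array is searched element by element, passing each hit up the chain.
  data_store.each { |node| find_values(node, find_key) { |value| yield value } } if data_store.kind_of? Array
end

sample = [{ 'name' => 'Aslan' }, { 'name' => 'Tumnus' }]

names = []
find_values(sample, 'name') { |name| names << name }
puts names.inspect

# Calling it without a block raises LocalJumpError as soon as yield runs.
begin
  find_values({ 'name' => 'Aslan' }, 'name')
rescue LocalJumpError => e
  puts "No block given: #{e.message}"
end
```

The method never needs to know what the caller plans to do with each value.  It just hands them over one at a time.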

There's another interesting thing happening on line 31.  There's an if statement at the end of the line.  If you tried to get away with something like that in C# land, they'd lock you up and throw away the key.  Ahh, but here in Narnia, animals talk, trees walk, and all sorts of silly things happen.  You can even say "hey, do this thing if this other thing is true", the way people do.  No fussy brackets, or parentheses, or overly strict formatting rules to worry about.  Line 31 takes care of the case when data_store is a hash.  If data_store is not a hash, it's an array of hashes, and we take care of that on line 32, using a little recursion magic to get at the hash in each element of the array.
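
Here's the trailing if side by side with the long-winded form, as a quick sketch.  Both method names are hypothetical and exist only to show the syntax.

```ruby
# kind_of_node is a hypothetical helper used only to show the modifier form.
def kind_of_node(data_store)
  return 'hash' if data_store.kind_of? Hash    # statement first, condition last
  return 'array' if data_store.kind_of? Array
  'something else'
end

# The same check written the long way round.
def kind_of_node_verbose(data_store)
  if data_store.kind_of?(Hash)
    'hash'
  elsif data_store.kind_of?(Array)
    'array'
  else
    'something else'
  end
end

puts kind_of_node({ 'location' => '\vendor' })
puts kind_of_node(['a', 'b'])
```

Both versions give the same answers.  The modifier form just reads more like the sentence you'd say out loud.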

I think of the contents of 'custom assemblies' as an assembly_list, and each assembly_list contains packing_lists, each with a location and a list of assemblies at that location.  Each assembly can have more than one file associated with it - in this case I've listed the binary dll, the debug symbol pdb file, and the xml document associated with our custom assemblies.  I'm asking References to get just the binary files in 'custom assemblies'.  Now that I've got the 'custom assemblies' node, I already know that the assembly_list is an array, so I can just iterate through it with .each to get each packing_list. (That's the reason I wrote get_node() in the first place - as I was learning about YAML and Ruby, I wasn't sure what object types I was dealing with as I drilled down through the YAML file.  I could probably simplify the get_node() iterator now that I understand the structure of the file better, but I'll save refactoring for another time.)   A packing_list is a hash containing a 'location' that can have multiple 'assemblies'.  Line 15 says, "Get me the filenames for every assembly in this packing_list, and I really only want the ones that match this list of file_types".
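
To make that nesting concrete, here's one way the walk might look if you lean on the known structure and skip get_node() entirely.  Treat it as a sketch rather than the refactoring itself: each_file_entry is a hypothetical name, and the hash literal is a hand-typed copy of a single packing_list.

```ruby
# each_file_entry is a hypothetical simplified walk; it assumes the exact
# nesting described above (packing_list -> assemblies -> files).
def each_file_entry(assembly_list)
  assembly_list.each do |packing_list|
    packing_list['assemblies'].each do |assembly|
      assembly['files'].each do |file|
        # Each file entry is a one-pair hash such as {'binary' => 'X.dll'}.
        file.each { |file_type, filename| yield packing_list['location'], file_type, filename }
      end
    end
  end
end

# Hand-typed stand-in for one packing_list from 'custom assemblies'.
custom_assemblies = [
  { 'location' => '\Dev\Components\Framework\Trunk\Debug',
    'assemblies' => [
      { 'name' => 'MyNamespace.Framework.Core',
        'files' => [
          { 'binary' => 'MyNamespace.Framework.Core.dll' },
          { 'debug' => 'MyNamespace.Framework.Core.pdb' }
        ] }
    ] }
]

each_file_entry(custom_assemblies) do |location, file_type, filename|
  puts "#{file_type}: #{location}\\#{filename}"
end
```

The trade-off is that this version falls over if the file ever stops matching that shape, which is exactly the uncertainty get_node() was written to absorb.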

The * in front of file_types in the get_filenames() signature is a splat: it soaks up any extra arguments into an array, which makes the file types optional.  You don't have to specify the type of file you're looking for, and if you don't then file_types is an empty array and you get back everything.  But if you do specify a list of file_types, that list is used in the filter() method, which is called on line 35.  Another cool Ruby-ish way of saying something: using the unless keyword.  Line 35 says, "Take a look at all the keys in the file_node hash, and give me back the file_node value for each key unless the key should be filtered out."
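
Here's the splat and the unless working together in miniature.  It's a sketch under a couple of assumptions: wanted_files and filtered_out? are hypothetical names, and the debug_mode half of the real filter() is left out to keep things short.

```ruby
# filtered_out? is a hypothetical, cut-down version of filter(): a key is
# dropped only when some file types were requested and it isn't one of them.
def filtered_out?(key, file_types)
  !file_types.empty? && !file_types.include?(key)
end

# The splat gathers zero or more trailing arguments into the file_types array.
def wanted_files(file_node, *file_types)
  names = []
  file_node.each { |key, filename| names << filename unless filtered_out?(key, file_types) }
  names
end

files = { 'binary' => 'Rhino.Mocks.dll', 'document' => 'Rhino.Mocks.xml' }

puts wanted_files(files).inspect                        # no types given, so nothing is filtered
puts wanted_files(files, 'binary').inspect              # just the binary
puts wanted_files(files, 'binary', 'document').inspect  # both types requested
```

Inside the method, file_types is always an array, so an empty? check is all it takes to tell "give me everything" apart from "give me only these".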

I'm hoping to put plain old Ruby classes and YAML together with Rake so that I can sweep angle brackets out of my life forever, and I'll post my progress as I learn.  Now I'm sure that there are better ways of expressing these things in Ruby.  But I'm new here.  I'm still enjoying my Turkish Delight and hot tea.  I still have a lot to learn about Narnia, but for now it's back to the real world. 

1 comment:

dalesmithtx said...

There's a very, very good explanation of how the yield keyword works here: