I wish I had a clip reel I could roll after a dude with a really deep voice said, "Previously on Parsing YAML files in Ruby". But I don't. So here's a link.
Ruby is Narnia. I spend my real life in C#, a perfectly serviceable language. I'm comfortable there, I kind of know my way around, and I've come to depend on it to make my living. But when I have a few spare moments here and there, I get to wander off into this magical fairy-land and have adventures with strange and wonderful creatures. Like the YAML. In case you forgot what our YAML looks like, here he is:
---
shared paths:
  build share : \\buildshare.mydomain.com\Builds

local paths:
  references : \references

custom assemblies:
  - location: \Dev\Components\Business\Core\Trunk\Latest\Debug
    assemblies:
      - name : MyNamespace.Core
        files:
          - binary : MyNamespace.Core.dll
          - debug : MyNamespace.Core.pdb
          - document : MyNamespace.Core.xml

  - location: \Dev\Components\Framework\Trunk\Debug
    assemblies:
      - name : MyNamespace.Framework.Core
        files:
          - binary : MyNamespace.Framework.Core.dll
          - debug : MyNamespace.Framework.Core.pdb
          - document : MyNamespace.Framework.Core.xml

vendor assemblies:
  - location: \vendor\DotNet Commons\Logging\2.0
    assemblies:
      - name : Dotnet.Commons.Logging
        files:
          - binary : Dotnet.Commons.Logging.dll

testing assemblies:
  - location: \vendor\Nunit\2.4.3
    assemblies:
      - name : NUnit.Framework
        files:
          - binary : nunit.framework.dll

  - location: \vendor\Rhino.Mocks\3.3.0.906
    assemblies:
      - name : Rhino.Mocks
        files:
          - binary : Rhino.Mocks.dll
          - document : Rhino.Mocks.xml
...
Last time I told you how easy it was to access data in *.yml files in Ruby. I've taken that idea a little further, and I cooked up this class:
 1: require 'yaml'
 2:
 3: class References
 4:   attr_accessor :debug_mode
 5:   def initialize(references_file_name='references.yml',debug_mode=true)
 6:     @refs = open(references_file_name) {|f| YAML.load(f) }
 7:     @debug_mode = debug_mode
 8:   end
 9:   def shared_root_directory
10:     @refs['shared paths']['build share']
11:   end
12:   def get_filenames(assembly_list_name, *file_types)
13:     get_node(@refs, assembly_list_name) do |assembly_list|
14:       assembly_list.each do |packing_list|
15:         get_node(packing_list, 'assemblies'){|assembly| get_names(assembly, packing_list['location'],file_types){|filename| yield filename}}
16:       end
17:     end
18:   end
19:   private
20:   def concatenate(*locators)
21:     concatenated = String.new
22:     locators.each { |locator| concatenated << (locator =~ /\A(?!\\)/ ? '\\' : '') << locator.sub(/\\\Z/, '') }
23:     return concatenated
24:   end
25:   def get_names(assembly, path,file_types)
26:     get_node(assembly, 'files') do |file|
27:       parse_filenames(file,file_types){|filename| yield concatenate(shared_root_directory,path,filename)}
28:     end
29:   end
30:   def get_node(data_store, find_key)
31:     yield data_store[find_key] if data_store.kind_of? Hash
32:     data_store.each{|node| get_node(node, find_key){|subnode| yield subnode}} if data_store.kind_of? Array
33:   end
34:   def parse_filenames(file_node,file_types)
35:     file_node.keys.each {|key| yield file_node[key] unless filter(key,file_types)} if file_node.kind_of? Hash
36:     file_node.each{|element| parse_filenames(element,file_types){|value| yield value}} if file_node.kind_of? Array
37:   end
38:   def filter(key,file_types)
39:     (!@debug_mode && key=="debug") || (!file_types.include?(key) unless file_types.empty?)
40:   end
41: end
With this References class, you can do something like this (pay attention - here's where it starts to get cool):
refs = References.new
refs.get_filenames("custom assemblies","binary"){|filename| puts filename}
And you get something like this:
\\buildshare.mydomain.com\Builds\Dev\Components\Business\Core\Trunk\Latest\Debug\MyNamespace.Core.dll
\\buildshare.mydomain.com\Builds\Dev\Components\Framework\Trunk\Debug\MyNamespace.Framework.Core.dll
Let's start with get_node() on line 30. This method is an iterator. I don't know why, but it took a long time for the lightbulb to go on in my head over Ruby's usage of the yield keyword. Turns out, it works just like all the Ruby books say it does. Really, why would they lie? In this case, on line 31, we're getting the value stored in the data_store hash under the find_key key, and yielding that value back to the calling method. And that calling method better have a code block to execute once it receives a value, or we're gonna get a big ol' runtime exception (a LocalJumpError, to be exact). For the get_filenames() call in our script, on line 13 we're saying, "Look in the top-most hash in the references.yml file and find me a node with a key called 'custom assemblies'". Remember: the way our YAML file is laid out, it's just a big, weird hash of arrays and hashes. We have to write code to ferret out the info we want, and in this case ultimately we want a list of filenames.
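Here's a little sketch of that yield behavior on its own, outside the References class. The method name fetch_node and the sample hash are made up for illustration; only the one-line body is borrowed from get_node().

```ruby
# A tiny iterator in the style of get_node(): it yields the value stored
# under find_key back to whatever block the caller supplied.
# (fetch_node is a hypothetical name, not part of the References class.)
def fetch_node(data_store, find_key)
  yield data_store[find_key] if data_store.kind_of? Hash
end

# Made-up sample data standing in for the parsed YAML file.
refs = { 'custom assemblies' => ['packing list 1', 'packing list 2'] }

# With a block: the block receives the yielded value.
yielded = nil
fetch_node(refs, 'custom assemblies') { |node| yielded = node }
puts yielded.inspect

# Without a block: yield has nowhere to send the value,
# so Ruby raises LocalJumpError.
error_message = nil
begin
  fetch_node(refs, 'custom assemblies')
rescue LocalJumpError => e
  error_message = e.message
  puts "No block given: #{e.class}"
end
```

If you'd rather not blow up when the caller forgets the block, Ruby's block_given? lets a method check before it yields.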
There's another interesting thing happening on line 31. There's an if statement at the end of the line. If you tried to get away with something like that in C# land, they'd lock you up and throw away the key. Ahh, but here in Narnia, animals talk, trees walk, and all sorts of silly things happen. You can even say "hey, do this thing if this other thing is true", the way people do. No fussy brackets, or parentheses, or overly strict formatting rules to worry about. Line 31 takes care of the case when data_store is a hash. If data_store is not a hash, it's an array of hashes, and we take care of that on line 32, using a little recursion magic to get at the hash in each element of the array.
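To watch the hash case and the array case work together, here's a standalone copy of the same two lines. The name find_nodes and the sample data are my own stand-ins, so treat this as a sketch rather than the class itself.

```ruby
# Standalone version of the get_node() idea (find_nodes is a hypothetical name).
def find_nodes(data_store, find_key)
  # Hash: hand back the value stored under find_key.
  yield data_store[find_key] if data_store.kind_of? Hash
  # Array: recurse into each element and pass along whatever it yields.
  data_store.each { |node| find_nodes(node, find_key) { |subnode| yield subnode } } if data_store.kind_of? Array
end

# Made-up data shaped like an array of packing lists.
packing_lists = [
  { 'location' => 'first/place',  'assemblies' => ['A.Core'] },
  { 'location' => 'second/place', 'assemblies' => ['B.Core'] }
]

# Array in: the block runs once per hash in the array.
locations = []
find_nodes(packing_lists, 'location') { |location| locations << location }
puts locations.inspect

# Hash in: the block runs exactly once.
single = []
find_nodes(packing_lists.first, 'location') { |location| single << location }
puts single.inspect
```

One thing to keep in mind: a hash that doesn't have the key still yields, and what it yields is nil.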
I think of the contents of 'custom assemblies' as an assembly_list, and each assembly_list contains packing_lists, each with a location and a list of assemblies at that location. Each assembly can have more than one file associated with it - in this case I've listed the binary dll, the debug symbol pdb file, and the xml document associated with our custom assemblies. I'm asking References to get just the binary files in 'custom assemblies'. Now that I've got the 'custom assemblies' node, I already know that the assembly_list is an array, so I can just iterate through it with .each to get each packing_list. (That's the reason I wrote get_node() in the first place - as I was learning about YAML and Ruby, I wasn't sure what object types I was dealing with as I drilled down through the YAML file. I could probably simplify the get_node() iterator now that I understand the structure of the file better, but I'll save refactoring for another time.) A packing_list is a hash containing a 'location' that can have multiple 'assemblies'. Line 15 says, "Get me the filenames for every assembly in this packing_list, and I really only want the ones that match this list of file_types".
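If you're not sure what object types you're dealing with either, you can just ask. This sketch parses a cut-down snippet (one packing list, a shortened location I made up) and pokes at the classes that come back; it assumes a Ruby new enough for the squiggly heredoc.

```ruby
require 'yaml'

# A trimmed-down, illustrative snippet in the same shape as references.yml.
snippet = <<~'YAML'
  custom assemblies:
    - location: \Dev\Core
      assemblies:
        - name : MyNamespace.Core
          files:
            - binary : MyNamespace.Core.dll
            - debug : MyNamespace.Core.pdb
YAML

refs = YAML.load(snippet)

assembly_list = refs['custom assemblies']   # the whole list of packing lists
packing_list  = assembly_list.first         # one location plus its assemblies
files         = packing_list['assemblies'].first['files']

puts assembly_list.class      # Array
puts packing_list.class       # Hash
puts packing_list['location']
puts files.inspect            # an array of one-key hashes
```

So it really is hashes and arrays all the way down, which is why a small recursive iterator like get_node() can get by without knowing the layout in advance.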
The * in front of file_types in the get_filenames() signature is Ruby's splat: it scoops up any extra arguments into an array, which makes file_types effectively optional. You don't have to specify the type of file you're looking for, and if you don't then you get back everything. But if you do specify a list of file_types, that list is used in the filter() method, which is called on line 35. Another cool Ruby-ish way of saying something: using the unless keyword. Line 35 says, "Take a look at all the keys in the file_node hash, and give me back the file_node value for each key unless the key should be filtered out."
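Here's a rough sketch of the splat and the unless working together. The names select_files and filtered_out? are invented for this example, and I've left out the debug_mode half of filter() to keep it short.

```ruby
# Hypothetical, simplified stand-in for filter(): a key is filtered out only
# when the caller named some file types and this key isn't one of them.
def filtered_out?(key, file_types)
  !file_types.empty? && !file_types.include?(key)
end

# *file_types collects zero or more trailing arguments into an array.
def select_files(file_node, *file_types)
  file_node.keys.each { |key| yield file_node[key] unless filtered_out?(key, file_types) }
end

# Sample file node; the names come from the YAML file above.
files = {
  'binary'   => 'MyNamespace.Core.dll',
  'debug'    => 'MyNamespace.Core.pdb',
  'document' => 'MyNamespace.Core.xml'
}

# No file types given: file_types is an empty array and nothing is filtered.
everything = []
select_files(files) { |name| everything << name }

# One file type: only the matching value comes back.
binaries = []
select_files(files, 'binary') { |name| binaries << name }

# Several file types: the splat gathers them all.
binaries_and_docs = []
select_files(files, 'binary', 'document') { |name| binaries_and_docs << name }

puts everything.inspect
puts binaries.inspect
puts binaries_and_docs.inspect
```

Notice that the callers never build an array themselves; they just list the types they want and the splat does the gathering.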
I'm hoping to put plain old Ruby classes and YAML together with Rake so that I can sweep angle brackets out of my life forever, and I'll post my progress as I learn. Now I'm sure that there are better ways of expressing these things in Ruby. But I'm new here. I'm still enjoying my Turkish Delight and hot tea. I still have a lot to learn about Narnia, but for now it's back to the real world.