The third step in flay’s process is to run the s-expressions through the #analyze method.
The Code
    def analyze
      self.prune

      self.hashes.each do |hash, nodes|
        identical[hash] = nodes[1..-1].all? { |n| n == nodes.first }
        masses[hash] = nodes.first.mass * nodes.size
        masses[hash] *= (nodes.size) if identical[hash]
        self.total += masses[hash]
      end
    end
Review
Flay#analyze is performing four steps. First it prunes the trees to remove data that isn’t needed, and then for each tree it will:
- Track identical trees
- Track the masses of each tree
- Track the total mass of all of the trees
1. Prune the trees to remove data that isn’t needed
    self.prune
flay’s #prune method walks all of the trees and removes ones that are too small or don’t contain any duplication. Since flay is only concerned with code duplication, throwing away data that isn’t needed is a good optimization.
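Here is a rough sketch of the "don’t contain any duplication" half of that pruning. It assumes the hashes are stored as a Hash that maps a hash value to an array of nodes, which is what the `hashes.each do |hash, nodes|` loop in #analyze suggests. The `prune_singletons` name and the sample data are made up for illustration, and flay’s real #prune does more than this.

```ruby
# Hypothetical helper (not flay's API): drop buckets that hold a single
# node, since one occurrence cannot be a duplicate of anything.
def prune_singletons(hashes)
  hashes.reject { |_hash, nodes| nodes.size < 2 }
end

# Made-up data: keys stand in for hash values, symbols stand in for nodes.
hashes = {
  101 => [:node_a, :node_b],
  202 => [:node_c],
  303 => [:node_d, :node_e, :node_f],
}

pruned = prune_singletons(hashes)
p pruned.keys
```

Only the buckets with two or more nodes survive, so the later steps never spend time on code that appears once.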
2. Track identical trees
    identical[hash] = nodes[1..-1].all? { |n| n == nodes.first }
Here flay is iterating over every node after the first one (nodes[1..-1]) and checking if it is the same as the first node. For reference, in an s-expression the type is the first element and the body is the remainder, like a Lisp cons cell. When all of the nodes match, the hash is marked as true in the identical data structure; otherwise it is marked as false.
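To see the check in isolation, here is a small sketch that uses plain Ruby arrays as stand-ins for s-expressions. The `all_identical?` helper and the sample nodes are my own assumptions, not part of flay.

```ruby
# Each node is a plain array standing in for an s-expression:
# the first element is the type, the rest is the body.
same  = [[:call, :foo, 1], [:call, :foo, 1], [:call, :foo, 1]]
mixed = [[:call, :foo, 1], [:call, :foo, 2]]

# Hypothetical helper wrapping the same expression used in #analyze.
def all_identical?(nodes)
  nodes[1..-1].all? { |n| n == nodes.first }
end

p all_identical?(same)   # true
p all_identical?(mixed)  # false
```

Skipping the first element with `nodes[1..-1]` just avoids comparing the first node to itself.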
3. Track the masses of each tree
    masses[hash] = nodes.first.mass * nodes.size
    masses[hash] *= (nodes.size) if identical[hash]
Flay is doing two things here. First it’s calculating the mass of the tree by multiplying the first node’s mass by the number of nodes. Second, if the tree was identical the total mass is multiplied again by the number of nodes. I think this is done to give a larger score to identical duplication so they appear higher in the report.
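The arithmetic is easier to see with numbers. This sketch pulls the two lines into a hypothetical `bucket_mass` helper (my name, not flay’s) and feeds it made-up values.

```ruby
# Hypothetical helper: score one group of duplicated nodes.
#   node_mass - mass of a single node
#   count     - how many nodes share the hash
#   identical - whether every node matched the first one
def bucket_mass(node_mass, count, identical)
  mass = node_mass * count
  mass *= count if identical
  mass
end

p bucket_mass(10, 3, false)  # 30
p bucket_mass(10, 3, true)   # 90
```

With three copies of a node of mass 10, similar code scores 30 while identical code scores 90, so the identical case grows with the square of the number of copies.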
4. Track the total mass of all of the trees
    self.total += masses[hash]
Finally, the total mass for the tree is added to the total counter. This counter is used in the report for the overall score.
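Putting the loop together, here is a self-contained sketch that runs the same four lines over made-up data. `FakeNode` is a stand-in I invented so the example runs without flay; it only provides the equality check and the `mass` value that the loop relies on.

```ruby
# Stand-in for a parsed node (an assumption, not flay's Sexp class).
FakeNode = Struct.new(:body, :mass)

hashes = {
  1 => [FakeNode.new([:call, :foo], 4), FakeNode.new([:call, :foo], 4)],
  2 => [FakeNode.new([:call, :bar], 5), FakeNode.new([:call, :baz], 5)],
}

identical = {}
masses = {}
total = 0

hashes.each do |hash, nodes|
  identical[hash] = nodes[1..-1].all? { |n| n == nodes.first }
  masses[hash] = nodes.first.mass * nodes.size
  masses[hash] *= nodes.size if identical[hash]
  total += masses[hash]
end

p masses  # {1=>16, 2=>10}
p total   # 26
```

The first group is identical, so it scores 4 * 2 * 2 = 16. The second group only shares a hash, so it scores 5 * 2 = 10, and the total is the sum of the two.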
With this method, I’ve completed my code reading of flay. It was very educational to stumble through how flay works, since it’s using some advanced programming concepts and libraries I’ve never used before. If reading flay’s code was interesting, there are a few methods that I didn’t cover that would be useful to read through.
Next week I’m going to start reading the code for Capistrano and see what I can find under its hood.