Source code odyssey - Rake

Why?

Recently I had the chance to work on a large number of rake tasks in our code base. During that work I found Rake somewhat confusing, but also an interesting framework. So I would like to talk about some of the good and bad practices in Rake.

History and purpose of rake

Rake was originally developed by Jim Weirich, who passed away in 2014 (you can check his last commit here), and it later became the main task runner for Ruby projects. It was designed as a Ruby version of make, so it inherits some of make's flavor: the dependency syntax, and the file task, which is mainly meant for compiling and is rarely used in Ruby projects. As a result, there are some legacy practices that you only find in early Ruby projects, and a rather implicit DSL that is sometimes confusing. Before we talk about them, let's start with how Rake invokes tasks. You can open the Rake source code to follow along.

How Rake loads when you call it

# lib/rake/application.rb
module Rake
  class Application
    def run(argv = ARGV)
      standard_exception_handling do
        init "rake", argv
        load_rakefile
        top_level
      end
    end
  end
end

When we call rake on the command line, it starts by parsing the application name and the command-line params. Then it loads the Rakefile in the current directory along with the *.rake files from the library directory (rakelib by default). In Rails, the Rakefile calls Rails.application.load_tasks, which is why the tasks we put under lib/tasks/ are available in rake. Finally it runs the tasks named in the params, or the default task if none is given. For example, rake db:migrate db:test:prepare collects both task names and invokes them in order.
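
The three steps of run can be walked through by hand. This is only a sketch: it assumes a recent Rake in which init accepts an argv array, it skips load_rakefile and defines two made-up tasks inline instead, and the order array is a hypothetical log used to observe what ran.

```ruby
require "rake"

# A fresh application keeps this sketch isolated from any real Rakefile.
app = Rake::Application.new
Rake.application = app

order = [] # hypothetical log, only used to observe the run order

# Step 1: parse the command line. The array stands in for
# `rake first second` typed in a shell.
app.init("rake", ["first", "second"])

# Step 2 would be load_rakefile; the tasks are defined inline here instead.
Rake::Task.define_task(:first)  { order << :first }
Rake::Task.define_task(:second) { order << :second }

# Step 3: invoke every task collected from the command line, in order.
app.top_level

queued = app.top_level_tasks
```

After this runs, queued holds the task names taken from the argv array and order records the tasks in the same sequence.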

But how does Rake find the task we want to invoke? That responsibility goes to the task manager:

# lib/rake/task_manager.rb
module Rake
  module TaskManager
    def define_task(task_class, *args, &block)
      task_name, arg_names, deps = resolve_args(args)

      ...

      task_name = task_class.scope_name(@scope, task_name)
      ...
      task = find_or_create_task_by_class_and_name(task_class, task_name)
      ...
      task.enhance(deps, &block)
      ...
    end

    def find_or_create_task_by_class_and_name(task_class, task_name)
      @tasks[task_name.to_s] ||= task_class.new(task_name, self)
    end
  end
end

The task manager is included by the application, which is accessible through the Rake.application singleton. All the tasks are available from Rake.application.tasks, so you can look one up and invoke it directly in a test. That makes Rake tasks pretty testable. The enhance call appends the task block to the task's actions, a list of procs that are called when we invoke the task.
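
To make that concrete, here is a small sketch of defining the same task twice and invoking it without the command line. The task name greet and the calls array are made up for illustration; Rake::Task.define_task is the class-level entry point that ends up in the task manager.

```ruby
require "rake"

# Start from an empty application so no other tasks interfere.
Rake.application = Rake::Application.new

calls = [] # hypothetical log for this illustration

# The first definition creates the task and stores one action.
Rake::Task.define_task(:greet) { calls << :first }

# Defining the same name again reuses the task and appends a second action.
Rake::Task.define_task(:greet) { calls << :second }

task = Rake.application[:greet]
action_count = task.actions.size

# Invoking the task calls each stored action in definition order.
task.invoke
```

In a test suite the same lookup and invoke calls can sit inside a test case, with the assertions made on whatever the task changed.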

Other than Task, Rake also has the Rule, FileTask and Scope concepts; we will revisit them later.

DSL for task

Now we understand how a task is loaded in Rake, but this looks different from the Rakefile or *.rake files we usually see. That is because we define those tasks with the Rake DSL:

# lib/rake/dsl_definition.rb
module Rake
  module DSL
    def task(*args, &block) # :doc:
      # Rake::Task.define_task forwards to Rake.application.define_task
      Rake::Task.define_task(*args, &block)
    end

    def desc(description)
      Rake.application.last_description = description
    end

    def namespace(name=nil, &block)
      ...
      Rake.application.in_namespace(name, &block)
    end

    ...
  end
end

self.extend Rake::DSL

At the end of the DSL file, Rake extends the top-level object with the DSL, which is one of the problems of Rake because it pollutes the top-level namespace implicitly. Apart from that, the DSL simply forwards the calls that create tasks and namespaces to the Rake.application instance.
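
One way to avoid leaning on the top-level methods is to extend a single module with Rake::DSL and define tasks inside it. The sketch below assumes that approach; DemoTasks, LOG and the demo:hello task are hypothetical names chosen for the example.

```ruby
require "rake"

Rake.application = Rake::Application.new

# Hypothetical module: the DSL methods are available on it after the
# extend, so the definitions below do not rely on the top-level object.
module DemoTasks
  extend Rake::DSL

  LOG = [] # collects output so the result can be inspected afterwards

  namespace :demo do
    task :hello do
      LOG << "hello from demo:hello"
    end
  end
end

# The namespace becomes a prefix of the registered task name.
Rake::Task["demo:hello"].invoke
names = Rake.application.tasks.map(&:name)
```

The task is still registered on the shared Rake.application instance, so only the definition side is scoped; lookup and invocation work exactly as before.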

How tasks, params, and prerequisites work

A rake task holds a list of prerequisites, the actions to execute and the scope of the task. We can invoke a task by calling Rake.application[:task_name].invoke(params) or Rake::Task[:task_name].invoke(params); having two entry points for the same lookup is confusing.

# lib/rake/task.rb
module Rake
  class Task
    def invoke(*args)
      task_args = TaskArguments.new(arg_names, args)
      invoke_with_call_chain(task_args, InvocationChain::EMPTY)
    end

    def invoke_with_call_chain(task_args, invocation_chain)
      new_chain = Rake::InvocationChain.append(self, invocation_chain)

      return if @already_invoked
      @already_invoked = true

      invoke_prerequisites(task_args, new_chain)
      execute(task_args)
    end

    def invoke_prerequisites(task_args, invocation_chain)
      prerequisite_tasks.each { |p|
        prereq_args = task_args.new_scope(p.arg_names)
        p.invoke_with_call_chain(prereq_args, invocation_chain)
      }
    end

    def execute(args)
      @actions.each { |act| act.call(self, args) }
    end
  end
end

The process of invoking a task is:

  1. load the arguments
  2. extend the invocation chain, which detects circular dependencies and reports the call path when a task fails
  3. mark the task as already invoked
  4. invoke the prerequisite tasks with the invocation chain
  5. execute the actions with the arguments

It is pretty straightforward, isn’t it? However, one of the confusing parts for me is the DSL syntax for arguments and prerequisites.

Usually the DSL is task :task_name. When we want to declare a prerequisite, we pass it as a hash in the last argument: task task_name: :prerequisite. But it becomes complicated once we introduce arguments: task :task_name, [:arg1, :arg2] => :prerequisite. This is confusing enough that I wondered why it was designed this way. Let’s check the source:

# lib/rake/task_manager.rb
def resolve_args(args)
  if args.last.is_a?(Hash)
    deps = args.pop
    resolve_args_with_dependencies(args, deps)
  else
    resolve_args_without_dependencies(args)
  end
end

Basically, we check whether the last argument is a Hash. If it is, we take the hash value as the dependencies and the hash key as either the task name or the argument names. This way we don’t have to declare the kind of each argument; the presence of a hash is enough to signal prerequisites. Without this trick, the task API would look like task :task_name, nil, [:dep1, :dep2] or task :task_name, dependencies: [:dep1]. Still, I feel either of those would be better than the implicit hash.
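
Here is the argument-plus-prerequisite form in use, as a small sketch. The task names (prepare, greet), the argument names and the seen hash are invented for the example.

```ruby
require "rake"

Rake.application = Rake::Application.new

seen = {} # hypothetical store for what the tasks observed

Rake::Task.define_task(:prepare) { seen[:prepared] = true }

# Hash key: the argument names. Hash value: the prerequisite.
Rake::Task.define_task(:greet, [:name, :greeting] => :prepare) do |_task, args|
  seen[:message] = "#{args[:greeting]}, #{args[:name]}!"
end

# Positional values are matched to the declared argument names in order.
Rake::Task[:greet].invoke("Rake", "Hello")
```

On the command line the same call would be written as rake "greet[Rake,Hello]".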

Other stuff: Null object pattern, LinkedList, Scope and File task

One of the interesting patterns in Rake is the use of the Null Object pattern: EMPTY_TASK_ARGS, EmptyScope and EmptyInvocationChain are used in the code base to stand in for nil and empty values. This is better than a nil check because nil might represent multiple conditions, and with a real object the code is less likely to blow up when the object or argument is empty.

Another is the use of a linked list: Rake has its own LinkedList implementation for Scope and InvocationChain. I think Array, Hash and Set would now meet all the performance requirements for them, but it is interesting that Rake uses a custom data structure.
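
Both ideas fit in a few lines. The sketch below is not Rake's code: Chain, Chain::Empty and Chain::EMPTY are hypothetical names for an immutable linked list whose tail is a null object, loosely modeled on how an invocation chain can spot a circular dependency.

```ruby
# Hypothetical Chain class: an immutable linked list with a null-object tail.
class Chain
  include Enumerable

  attr_reader :head, :tail

  def initialize(head, tail)
    @head = head
    @tail = tail
  end

  def empty?
    false
  end

  # Walk from the newest entry back to the oldest.
  def each(&block)
    block.call(head)
    tail.each(&block)
  end

  # Return a new chain; the receiver is never modified.
  def append(item)
    if include?(item)
      path = (to_a.reverse + [item]).join(" => ")
      raise "Circular dependency detected: #{path}"
    end
    Chain.new(item, self)
  end

  # Null object: same interface as Chain, but holds nothing.
  class Empty < Chain
    def initialize; end

    def empty?
      true
    end

    def each; end
  end

  EMPTY = Empty.new
end

chain = Chain::EMPTY.append(:deploy).append(:build)

error =
  begin
    chain.append(:deploy)
    nil
  rescue RuntimeError => e
    e.message
  end
```

Because the empty chain responds to the same methods, each and append never need a nil check for the end of the list.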

The file task is a kind of task that implements a timestamp comparison in its needed? call. It checks whether the target file is missing or any prerequisite has a newer timestamp; if so, it reruns the task to bring the file up to date.

The usage is often like this:

file "index.html" => "index.md" do
  generate_html("index.md")
end

It goes through all the file tasks in the prerequisite list and checks whether any timestamp is newer than the target's.

# lib/rake/file_task.rb
def out_of_date?(stamp)
  all_prerequisite_tasks.any? { |prereq|
    prereq_task = application[prereq, @scope]
    if prereq_task.instance_of?(Rake::FileTask)
      prereq_task.timestamp > stamp || @application.options.build_all
    else
      true
    end
  }
end
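
The timestamp check can be tried out in a temporary directory. This is a sketch under a few assumptions: the file names are made up, File.write stands in for a real converter, and File.utime is used to set modification times explicitly so the result does not depend on filesystem timestamp resolution.

```ruby
require "rake"
require "tmpdir"

Rake.application = Rake::Application.new

results = {} # hypothetical store for the three needed? checks

Dir.mktmpdir do |dir|
  source = File.join(dir, "index.md")
  target = File.join(dir, "index.html")

  # Create the source and date it one hour in the past.
  File.write(source, "# Hello")
  an_hour_ago = Time.now - 3600
  File.utime(an_hour_ago, an_hour_ago, source)

  # The block stands in for a real Markdown-to-HTML step.
  page = Rake::FileTask.define_task(target => source) do |t|
    File.write(t.name, "<h1>Hello</h1>")
  end

  # The target does not exist yet, so the task is needed.
  results[:target_missing] = !!page.needed?

  page.invoke

  # The target is now newer than the source.
  results[:just_built] = !!page.needed?

  # Date the source in the future; the target is out of date again.
  in_an_hour = Time.now + 3600
  File.utime(in_an_hour, in_an_hour, source)
  results[:source_newer] = !!page.needed?
end
```

The double negation only normalizes the return value of needed? to true or false for the comparison.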

When a prerequisite has no task defined, the lookup will automatically define a file task for it, as long as the file exists, so its timestamp can be tracked:

# lib/rake/task_manager.rb
def [](task_name, scopes=nil)
  task_name = task_name.to_s
  self.lookup(task_name, scopes) or
    enhance_with_matching_rule(task_name) or
    synthesize_file_task(task_name) or
    fail generate_message_for_undefined_task(task_name)
end

def synthesize_file_task(task_name) # :nodoc:
  return nil unless File.exist?(task_name) # only create a file task when the file exists
  define_task(Rake::FileTask, task_name)
end

Conclusion

After spending some time in the source code of Rake, I find it is actually a pretty simple and minimalist framework for running tasks. And it is surprisingly testable. However, because it is an old framework, it did inherit some bad practices, like polluting the global namespace, overriding operators, creating helper methods everywhere and using a hash to decide the type of arguments. Other than those, it is a good task runner for most of the things we need.

There are replacements like Thor that solve those problems, but I still recommend Rake because it covers most use cases, is more widely used, and is pretty much as testable as Thor. The exceptions are when you want to use the generators provided by Thor, or also want to invoke the tasks from within the codebase.
