Source code odyssey - Rake
Recently I had a chance to work on a large number of rake tasks in our code base. Along the way I found Rake somewhat confusing but also an interesting framework, so I would like to talk about some of the good and bad practices in Rake.
History and purpose of rake
Rake was originally developed by Jim Weirich, who passed away in 2014 (you can check his last commit here), as a Ruby take on make, and it later became the major task runner for Ruby projects. Because of that origin, it inherits some of make's flavor in its syntax, along with the file task, which is mainly meant for compiling and is rarely used in Ruby projects. As a result, there are some legacy practices that you will only find in early Ruby projects, and a more implicit DSL approach that is sometimes confusing. Before we talk about them, let’s start with how Rake invokes tasks. You can open the Rake source code to follow along.
How Rake loads when you call it
When we call rake on the command line, it starts by parsing the params and the application name. Then it loads the Rakefile in the current directory and the *.rake task files from the library directory (rakelib by default). Rails loads lib/tasks/*.rake in a similar way from its own Rakefile, which is why the tasks we put under lib/tasks/ in Rails are available in rake. Finally, it runs the tasks named in the params, or the default task when none are given. For example, rake db:migrate db:test:prepare pushes those tasks into the queue and invokes them in order.
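The loading steps can be reproduced by hand. The following is a minimal sketch, assuming a recent Rake where Rake::Application#init accepts an argv array. The temporary Rakefile and its hello task are made up for illustration and are not part of Rake.

```ruby
require "rake"
require "tmpdir"

original_dir = Dir.pwd
task_names = nil
hello_ran = false

Dir.mktmpdir do |dir|
  # A throwaway Rakefile with one hypothetical task.
  File.write(File.join(dir, "Rakefile"), <<~RAKEFILE)
    task :hello do
      File.write("hello.txt", "hi")
    end
  RAKEFILE

  begin
    Dir.chdir(dir)
    app = Rake::Application.new
    Rake.application = app        # the DSL defines tasks on this instance
    app.init("rake", ["hello"])   # parse options and collect the task names
    app.load_rakefile             # find and load the Rakefile (and rakelib/*.rake)
    app.top_level                 # invoke the collected top level tasks
    task_names = app.tasks.map(&:name)
    hello_ran = File.exist?(File.join(dir, "hello.txt"))
  ensure
    Dir.chdir(original_dir)
  end
end
```

These are the same three calls that Rake::Application#run makes, only without the surrounding exception handling.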
But how does Rake find the task we want to invoke? That responsibility goes to the task manager.
The task manager is included by the application, which is accessible through the Rake.application singleton instance. Basically, all the tasks are available from Rake.application.tasks, and you can invoke them directly in a test, so Rake tasks are actually pretty testable. The enhance call appends the task block to the list of lambdas in task#actions, which are called when we invoke the task.
Besides Task, Rake also has a Scope concept, which we can revisit later.
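As a rough illustration of the lookup and of enhance, here is a small sketch. The db:migrate task below is a stand-in defined for this example, not the Rails one, and a fresh Rake::Application is installed so the example does not depend on any loaded Rakefile.

```ruby
require "rake"

# Start from an empty application so only the tasks below exist.
Rake.application = Rake::Application.new

log = []

namespace :db do
  task :migrate do
    log << "migrate"
  end
end

# Both lookups go through the task manager mixed into the application.
migrate = Rake::Task["db:migrate"]
same_task = Rake.application.lookup("db:migrate")

# enhance appends one more block to the task's list of actions.
migrate.enhance { log << "after migrate" }
action_count = migrate.actions.size

migrate.invoke
```

Because the task is an ordinary object, a test can look it up, invoke it, and assert on the side effects in the same way.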
DSL for task
Now we understand how a task is loaded in Rake, but that looks different from the Rakefile or *.rake files we usually see. That is because we define those tasks with the Rake DSL.
At the end of the DSL definition, Rake extends the syntax to the top level, which is one of the problems of Rake because it pollutes the namespace implicitly. However, the DSL basically forwards the calls that create tasks and namespaces into the
How tasks, params, and prerequisites work
A rake task holds a list of prerequisites, the actions to execute, and the scope of the task. We can invoke the task by calling Rake::Task[:task_name].invoke(params) (which is confusing).
The process of invoking a task is:
- load arguments
- create an invocation chain to detect circular dependencies and to report the task path when something fails
- mark the task as already invoked
- invoke prerequisite tasks with the invocation chain
- execute actions with arguments
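The steps above can be observed from the outside with a small sketch. The task names setup, build, and deploy are hypothetical and exist only for this example.

```ruby
require "rake"

Rake.application = Rake::Application.new

order = []

task :setup do
  order << :setup
end

task build: :setup do
  order << :build
end

task deploy: [:setup, :build] do
  order << :deploy
end

deploy = Rake::Task[:deploy]

deploy.invoke               # prerequisites run first, each only once
first_run = order.dup

deploy.invoke               # already marked as invoked, so nothing happens
second_run = order.dup

deploy.reenable             # clears the invoked flag on deploy only
deploy.invoke               # setup and build are still marked, deploy runs again
third_run = order.dup
```

The second invoke being a no-op is the "mark the task as already invoked" step at work, and reenable is the way to reset it, which is handy in tests.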
It is pretty straightforward, isn’t it? However, one of the confusing parts for me is the DSL syntax for arguments and prerequisites.
Usually the DSL is task :task_name. When we want to pass a prerequisite, we pass it in the last argument as a hash: task task_name: :prerequisite. But it becomes complicated once we introduce arguments: task :task_name, [:arg1, :arg2] => :prerequisite. This is so confusing that I wonder why it was designed this way. Let’s check the source.
Basically, Rake checks whether the arguments include a Hash. If one exists, it extracts the hash value as the dependencies and the hash key as the task name or the argument names. This way, we don’t have to specify the type of each argument; the presence of a hash alone signals the prerequisites. Without this trick, the task API would look like task :task_name, nil, [:dep1, :dep2] or task :task_name, dependencies: [:dep1]. Even so, I feel either of those would be better than the implicit hash.
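To make the argument form concrete, here is a short sketch of how the pieces are split up after definition. The greet and prepare tasks and their argument names are invented for this example.

```ruby
require "rake"

Rake.application = Rake::Application.new

received = nil

task :prepare

# The hash key holds the argument names, the hash value holds the prerequisite.
task :greet, [:name, :greeting] => :prepare do |_task, args|
  received = [args[:name], args[:greeting]]
end

greet = Rake::Task[:greet]
arg_names = greet.arg_names            # the names taken from the hash key
prerequisites = greet.prerequisites    # the names taken from the hash value

# Positional values are matched to the argument names in order.
greet.invoke("Jim", "Hello")
```

On the command line the same call would be written as rake "greet[Jim,Hello]".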
Other stuff: Null object pattern, LinkedList, Scope and File task
One of the interesting patterns in Rake is the use of the Null object pattern, for example EmptyInvocationChain, which is used in the code base in place of nil and empty values. This is better than a nil check because nil might represent multiple conditions, and with an object the code is less likely to blow up when the object or argument is empty.
Another is the use of LinkedList: Rake has its own LinkedList implementation for Scope and InvocationChain. I think Array, Hash, and Set could now fulfill all the performance requirements for them, but it is interesting that it uses a custom data structure.
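The two ideas combine naturally. The following is a simplified sketch of an immutable linked list whose end is a null object. Chain and EmptyChain are hypothetical names loosely modeled on the idea, not Rake's actual classes.

```ruby
# A minimal immutable linked list with a null object as its terminator.
class Chain
  include Enumerable

  attr_reader :head, :tail

  def initialize(head, tail = EMPTY)
    @head = head
    @tail = tail
  end

  def empty?
    false
  end

  # Prepend an item and return a new list; the receiver is not modified.
  def conj(item)
    Chain.new(item, self)
  end

  def each
    current = self
    until current.empty?
      yield current.head
      current = current.tail
    end
    self
  end

  # Null object: answers the same messages, so callers never check for nil.
  class EmptyChain
    include Enumerable

    def empty?
      true
    end

    def conj(item)
      Chain.new(item, self)
    end

    def each
      self
    end
  end

  EMPTY = EmptyChain.new
end

chain = Chain::EMPTY.conj(:setup).conj(:build)
items = chain.to_a
empty_items = Chain::EMPTY.to_a
```

Since every list shares its tail with the list it was built from, each nested invocation can extend the chain without copying it or affecting its siblings.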
The file task is a kind of task that implements timestamp comparison in its needed? call. It compares the file's timestamp against those of all its dependencies; if any dependency is newer, it runs the task again to keep the file up to date.
The usage is often like this:
file "index.html" => "index.md" do
It will go through all the file tasks in the prerequisite list and check whether their timestamps are newer.
When a prerequisite has no file task defined, Rake will automatically define one during lookup to track the timestamp.
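Here is a small sketch of that behaviour in a temporary directory. The Markdown to HTML "build" is faked with a plain File.write, and the timestamps are set explicitly with File.utime so the example does not depend on the clock resolution of the file system.

```ruby
require "rake"
require "tmpdir"

Rake.application = Rake::Application.new

builds = 0
needed_after_build = nil
needed_after_edit = nil
source_task_class = nil

Dir.mktmpdir do |dir|
  source = File.join(dir, "index.md")
  target = File.join(dir, "index.html")

  File.write(source, "# Hello")
  past = Time.now - 60
  File.utime(past, past, source)       # make the source clearly older

  file target => source do |t|
    File.write(t.name, "<h1>Hello</h1>")   # stand-in for a real compile step
    builds += 1
  end

  html = Rake::Task[target]
  html.invoke                          # target is missing, so it gets built
  needed_after_build = html.needed?    # target is now newer than the source

  future = Time.now + 60
  File.utime(future, future, source)   # simulate editing the source
  needed_after_edit = html.needed?

  # No task was defined for index.md; Rake creates one because the file exists.
  source_task_class = Rake::Task[source].class
end
```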
After spending some time in the source code of Rake, I find it is actually a pretty simple and minimalist framework for running tasks, and it is surprisingly testable. However, because it is an old framework, it did inherit some bad practices such as polluting the global namespace, overriding operators, creating helper methods everywhere, and using a hash to decide the type of arguments. Other than those, it is a good task runner for most of the things we need.
There are other replacements like Thor that solve those problems, but I still recommend Rake because it fulfills most use cases, is more widely used, and is pretty much as testable as Thor, unless you want to use the generators provided by Thor or also want to invoke the tasks from within the codebase.