Source code odyssey - Rake
Recently I had a chance to work on a massive amount of Rake tasks in our code base. During that work I found Rake somewhat confusing, but also an interesting framework. I would like to talk about some of the good and bad practices I found in Rake.
History and purpose of Rake
Rake was originally developed by Jim Weirich, who passed away in 2014 (you can check his last commit here), and it is the de facto task runner for Ruby projects. Its syntax inherits some of the flavor of the build tool “make”, such as the “file” task, which is mainly used for compiling and is rarely seen in Ruby projects. It also carries some legacy practices that are only found in early Ruby projects, along with an implicit DSL approach that is sometimes confusing.
Before we talk about them, let’s start with how Rake invokes tasks. You can open the Rake source code to follow along with the details and references.
How Rake loads when you call it
When we call rake on the command line, it starts by parsing the params and the application name. It then loads the Rakefile under the current directory and the *.rake task files from its library directory (rakelib by default). In Rails, the Rakefile also loads the application's tasks, which is why the tasks we put under lib/tasks/ are available in rake. Finally it runs the tasks named in the params, or the default task if none is given. For example, rake db:migrate db:test:prepare pushes those two tasks into the queue and invokes them in order.
But how does Rake find the task we want to invoke? That responsibility goes to the task manager.
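As a rough illustration of that lookup, the sketch below defines a task and then finds it again through the application. The demo:build task is hypothetical and exists only for this example; Rake::Task[] and Rake.application[] are two entry points to the same task manager lookup.

```ruby
require "rake"

# Hypothetical task, defined only for this illustration.
namespace :demo do
  task :build do
    puts "building"
  end
end

app = Rake.application                # the singleton Rake::Application

# Rake::Task[] delegates to the application, which includes the task manager.
found = Rake::Task["demo:build"]
same  = app["demo:build"]

names   = app.tasks.map(&:name)       # every task the manager knows about
missing = Rake::Task.task_defined?("demo:missing")
```

Both lookups return the same task object, and task_defined? lets you check for a name without raising an error when the task does not exist.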
The task manager is included in the application, which is accessible through the Rake.application singleton instance. All tasks are available from Rake.application.tasks, so you can look one up and invoke it directly in a test. That makes Rake tasks pretty testable. The enhance call appends the given block to the list of lambdas in task#actions, which are called when we invoke the task.
Besides Task, we also have the Scope concept in Rake; we can revisit it later.
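To make the testability point concrete, here is a minimal sketch of defining a task, adding a second action with enhance, and invoking it directly. The :greet task and the log array are made up for this example.

```ruby
require "rake"

log = []

# Hypothetical task used only for this example.
task :greet do
  log << "hello"
end

greet = Rake::Task[:greet]

# enhance adds another block to the task's list of actions.
greet.enhance do
  log << "hello again"
end

action_count = greet.actions.size   # one block from `task`, one from `enhance`

greet.invoke                        # runs the actions in the order they were added
```

In a real test suite you would typically load the *.rake file once, call invoke, and then assert on the side effects in the same way the log array is used here.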
DSL for task
Now we understand how a task is loaded in Rake, but that looks different from the Rakefile or *.rake file we usually see. That is because we define those tasks with the Rake DSL. At the end of the DSL definition, Rake extends the top-level object with the DSL methods, which is one of the problems of Rake because it pollutes the namespace implicitly. Underneath, however, the DSL basically forwards the calls that create tasks and namespaces to the task manager.
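The forwarding can be seen by writing the same definition twice, once with the DSL and once without it. This is only a sketch of the equivalence, with hypothetical task names; the real DSL methods do some extra argument checking before delegating.

```ruby
require "rake"

ran = []

# DSL form, as it would appear in a Rakefile.
namespace :dsl do
  task :hello do
    ran << "dsl:hello"
  end
end

# Roughly equivalent calls without the top-level DSL methods.
Rake.application.in_namespace("plain") do
  Rake::Task.define_task(:hello) do
    ran << "plain:hello"
  end
end

Rake::Task["dsl:hello"].invoke
Rake::Task["plain:hello"].invoke
```

Both tasks end up registered in the same task manager, so the second form is an option when you want to define tasks from inside a class without relying on the top-level methods.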
How tasks, params, and prerequisites work
A Rake task holds a list of prerequisites, the actions to execute, and the scope of the task. We can invoke the task by calling Rake::Task[:task_name].invoke(params) (which is confusing).
The process of invoking a task is:
- load arguments
- create an invocation chain to report errors on failure
- mark the task as already invoked
- invoke prerequisite tasks with the invocation chain
- execute actions with arguments
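The steps above can be observed from the outside with a small experiment. The task names below are hypothetical; the sketch records the order in which actions run and shows the effect of the "already invoked" mark.

```ruby
require "rake"

order = []

# Hypothetical tasks: :deploy depends on :compile and :test,
# and :test depends on :compile as well.
task(:compile) { order << :compile }
task(test: :compile) { order << :test }
task(deploy: [:compile, :test]) { order << :deploy }

Rake::Task[:deploy].invoke
first_run = order.dup    # prerequisites first, each task only once

Rake::Task[:deploy].invoke
second_run = order.dup   # unchanged: every task is already marked as invoked

Rake::Task[:deploy].reenable
Rake::Task[:deploy].invoke
third_run = order.dup    # only :deploy runs again; its prerequisites stay marked
```

The reenable call matters in tests: without it, a second invoke of the same task in the same process silently does nothing.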
It is pretty straightforward, isn’t it? But one of the confusing parts for me is the DSL syntax for arguments and prerequisites.
Usually the DSL is task :task_name. When we want to declare prerequisites, we pass them as the last argument in hash form: task task_name: :prerequisite. It becomes more complicated once we introduce task arguments: task :task_name, [:arg1, :arg2] => :prerequisite. This is confusing enough that I did not understand why it was designed this way, so let’s check the source.
Basically, Rake checks whether the arguments include a Hash. If one exists, its value is extracted as the dependencies, and its key is used as the task name or the argument names. This way we don’t have to specify the type of each argument; the presence of a hash alone signals prerequisites. Without this trick, the task API would look like this:
task :task_name, nil, [:dep1, :dep2] or task :task_name, dependencies: [:dep1]. I feel these are less concise but more readable than the implicit hash.
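For reference, here are the existing forms side by side in a runnable sketch. The task names, argument names, and values are hypothetical; the point is where the hash key ends up in each form.

```ruby
require "rake"

seen = []

# Plain task: no hash at all.
task :setup do
  seen << "setup"
end

# Name plus prerequisite: the hash key is the task name.
task migrate: :setup do
  seen << "migrate"
end

# Name, arguments, and prerequisite: the hash key is the argument list.
task :seed, [:env, :count] => :migrate do |_task, args|
  seen << "seed #{args[:env]} x#{args[:count]}"
end

seed = Rake::Task[:seed]
seed.invoke("staging", "3")   # same as `rake "seed[staging,3]"` on the command line
```

Arguments passed to invoke are matched to the declared names by position, and they arrive as strings when they come from the command line.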
Other stuff: Null object pattern, LinkedList, Scope and File task
One of the interesting patterns in Rake is the use of the Null Object pattern: objects such as EmptyInvocationChain are used in the code base to stand in for nil and empty values. This is better than a nil check because nil might represent multiple conditions, and with a real object the code is less likely to blow up when the object or argument is empty.
Another is the use of a linked list. Rake has its own LinkedList implementation for Scope and InvocationChain. I think Array, Hash, and Set could now fulfill all the performance requirements for them, but it is interesting that Rake uses a custom data structure.
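The two ideas combine naturally, which a small standalone sketch can show. The Chain and EmptyChain classes below are hypothetical and written only for this example; Rake's own LinkedList and InvocationChain differ in detail.

```ruby
# A minimal immutable linked list whose end marker is a null object.
class Chain
  include Enumerable

  attr_reader :head, :tail

  def initialize(head, tail = EMPTY)
    @head = head
    @tail = tail
  end

  # Prepending returns a new chain and shares the old one as its tail.
  def prepend_item(item)
    Chain.new(item, self)
  end

  def empty?
    false
  end

  def each(&block)
    node = self
    until node.empty?
      block.call(node.head)
      node = node.tail
    end
    self
  end

  # Null object: answers the same messages as Chain, so callers never see nil.
  class EmptyChain < Chain
    def initialize; end

    def empty?
      true
    end
  end

  EMPTY = EmptyChain.new
end

chain = Chain::EMPTY.prepend_item(:compile).prepend_item(:deploy)
items = chain.to_a                 # [:deploy, :compile]
none  = Chain::EMPTY.to_a          # []
```

Because the empty chain responds to each, empty?, and prepend_item like any other chain, the calling code needs no nil checks, and because prepending never mutates, an outer chain can be shared safely by nested callers.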
The file task is a kind of task that implements a timestamp comparison in its needed? call. It checks whether any dependency has a newer timestamp than the file itself; if so, it runs the task again to bring the file up to date.
The usage is often like this:
file "index.html" => "index.md" do
It will go through all the file tasks in the prerequisite list and check whether their timestamps are newer. When a prerequisite has no “file” task defined, Rake automatically defines one during lookup to track its timestamp.
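The behavior described above can be tried out in a scratch directory. In this sketch the file contents and the build block are placeholders for a real Markdown-to-HTML conversion, and File.utime is used to simulate a later edit of the source.

```ruby
require "rake"
require "tmpdir"
require "fileutils"

dir    = Dir.mktmpdir                 # scratch directory for the example
source = File.join(dir, "index.md")
target = File.join(dir, "index.html")
File.write(source, "# Hello")

builds = 0

# The block stands in for a real conversion step.
file(target => source) do |t|
  builds += 1
  File.write(t.name, "<h1>Hello</h1>")
end

html = Rake::Task[target]

needed_before = html.needed?          # the target does not exist yet
html.invoke                           # builds the target

needed_after = html.needed?           # the target is not older than its source

# Pretend the source was edited one minute from now.
later = Time.now + 60
File.utime(later, later, source)
needed_after_edit = html.needed?      # the source is now newer than the target

# No task was declared for the source; Rake created one for it on lookup.
source_task_class = Rake::Task[source].class

FileUtils.remove_entry(dir)           # clean up the scratch directory
```

Note that no file task was ever written for index.md; the lookup during invoke created it because the file exists on disk.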
After spending some time in the source code of Rake, I find it a pretty simple and minimalist framework for running tasks, and it is surprisingly testable. However, as an old framework it carries some bad practices, like polluting the global namespace, overriding operators, creating helper methods everywhere, and using a hash to decide the type of arguments. Other than those, it is still a good task runner that gets the job done well.
There are replacement frameworks like Thor that solve those problems, but I still recommend Rake because it fulfills most use cases, is the de facto standard for Ruby projects, and is as testable as Thor. The exceptions are when you want the template generator syntax provided by Thor or want to invoke the task methods from your codebase.