Taming Erlang's New Slim Releases for Nitrogen

2013-06-08

Introduction

With the release of Erlang R15B02, the ability to generate what are called "slim releases" was added to the Erlang reltool application. Basically, a normal release as generated by Erlang's reltool application contains not only your application's code, but also a complete Erlang installation and all dependency applications, while a slim release does not contain the full Erlang installation.

A Quick TL;DR

Note: For a TL;DR, scroll down to the conclusion. If you want to play with a slim release version of Nitrogen, check out the "slim-release" branch of my nitrogen fork. Then you can run make slim_backend where backend is any of yaws, mochiweb, cowboy, etc.

Before Slim Releases...

While converting Nitrogen for what would eventually become the 2.1.0 release, Rusty, the original creator of Nitrogen, knew that users probably wouldn't want to add an entire Erlang installation to their source control system. So he added a site/ directory to the root of a generated release to hold site assets (page code, templates, static resources like images and CSS, etc.), and instructed users to add just that site directory to source control.

For the most part, this works, but it does fall down if you also wish to include configuration files (like those found in etc/), or if you wish to be able to do a simple git clone of your application and be ready to go.

The solution has been to either manually build an Erlang application and specify Nitrogen and its dependencies as rebar dependencies, or do some tricks with symlinks (making lib/, erts/, etc into symlinks) which get included in your source control. This works well, and honestly, has been how I've worked with them for a while. It's admittedly hacky, and a little fragile, but it works.

A full release complete with ERTS ("Erlang Run Time System") is great as a downloadable archive on the website, so that folks without Erlang can just download Nitrogen and get started, but for anyone else, it was a mess.

Thankfully, the Erlang team decided to add slim releases, eliminating the need to include an entire ERTS installation into each generated release.

Theory vs Practice

Theoretically speaking, this is as simple as adding {excl_lib, otp_root} to your reltool.config file, generating a release, and adding a handful of command-line flags (per documentation):

-sasl releases_dir [target-dir]/releases
-boot "[target-dir]/releases/[vsn]/[RelName]"
-boot_var RELTOOL_EXT_LIB [target-dir]/lib
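
As a rough sketch of how a launch script might put those three flags together, here is a small shell helper that only prints the resulting command line. The function name and its arguments are hypothetical. Application parameters given on the command line are parsed as Erlang terms, so the sketch wraps the releases path in an Erlang string; treat that extra quoting as an assumption to verify against your own setup.

```shell
# Hypothetical helper: print the erl command line for a slim release.
# Arguments: <target-dir> <release-version> <release-name>
slim_erl_cmd() {
    target_dir=$1
    rel_vsn=$2
    rel_name=$3
    # releases_dir is passed as a quoted Erlang string (assumption, see above)
    printf "erl -sasl releases_dir '\"%s/releases\"' -boot %s/releases/%s/%s -boot_var RELTOOL_EXT_LIB %s/lib\n" \
        "$target_dir" "$target_dir" "$rel_vsn" "$rel_name" "$target_dir"
}

# Example paths are made up for illustration.
slim_erl_cmd /opt/nitrogen 2.1.0 nitrogen
```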

According to the documentation, this is all that should be necessary.

Unfortunately for me, this was not the case.

With a little help from rebar

Conveniently, there is an as-yet-unmerged pull request that adds slim releases to Basho's amazing rebar tool, which is used by basically everyone who does Erlang development. It definitely helped me out.

So to begin working with this, I built rebar from that unmerged pull request in the hope that everything would work out.

The bin/nitrogen script is based on an old version of rebar's simplenode.runner, and all those command-line options had already been added to the simplenode.runner script. So I copied it into Nitrogen's overlay/common/bin directory to be used as the new Nitrogen launch script.

I also added a make slim_mochiweb rule to the Makefile and gave it a shot.

Unlike my previous attempts at building a slim release, this one actually compiled!

"I'm almost there," I said.

I was wrong. Running bin/nitrogen console from a newly generated release threw a SASL undef error. I found that odd: SASL is part of the standard Erlang installation and should have been available by default.

Attempts to fix by tweaking reltool.config

It turns out that most of the problem was just my lack of expertise in Erlang releases burning me. I just didn't really know what all those little options in a reltool.config file do (and to an extent, I still don't).

Removing some possibly unnecessary stuff

So I started by editing the reltool.config file and simply removing the {app, sasl, ...} and {app, eunit, ...} options from it. We didn't need to tell reltool to copy them, because the machine's full Erlang installation would already have them.

At that point, bin/nitrogen console would run, but throw an error about being unable to find the nitrogen_app file while trying to launch the nitrogen app, which was confusing, since the .beam files and .app file were right there in site/ebin.

Throwing everything and the kitchen sink at the sys/rel list

So I thought, "Hey, maybe with slim releases, we need to include the nitrogen app itself into the rel portion of sys." And then I thought "Hey, since the nitrogen app is probably needed, we probably need to include the other dependencies in here, too".

So I added nitrogen_core, simple_bridge, nprocreg, sync, nitrogen to the list of apps in sys/rel.

Also, since the instructions in the pull request mention adding lib_dir to the sys/app entry for the main app, I decided to add a nitrogen app entry to sys/app and include the lib_dir entry, per the instructions.

And I tried again.

Everything compiled, and bin/nitrogen console launched without errors!

But I noticed the lack of a notification that comes with launching mochiweb saying Starting Mochiweb Server (nitrogen) on 0.0.0.0:8000, root: './site/static'. So I checked. And sure enough, there was nothing listening on localhost port 8000.

I wondered what was wrong.

So I did wf:module_info() to see if the nitrogen_core app was at least loaded, and it sure was! So what the heck was going on? Then I tried:

1> index:module_info().
** exception error: undefined function index:module_info/0

Well there it is, it's not loading any of the main application code.

But why? And if it wasn't loading any of the main application's code, why was it not throwing an error about not being able to find the nitrogen application?

The answer revealed itself when I inspected the lib directory.

$ ls lib
mimetypes      nitrogen_core            pmod_transform           sync-0.1
mimetypes-1.0  nitrogen_core-2.2.0-pre  simple_bridge
mochiweb       nprocreg                 simple_bridge-1.3.0-pre
nitrogen       nprocreg-0.2.0           sync

Oh, so including an app in sys/rel means each non-OTP app will be copied and given a version number in its directory name. I did not realize that. And since the rebar.config in the newly generated release tells rebar to download and build each dependency in the lib directory here, each of those apps was essentially duplicated, but this time without the version suffix.

Also, since we included nitrogen in that list, it seemed to insert a nitrogen entry here as well. But our nitrogen app should be in site/ebin and not here. Inside the nitrogen directory here was a single ebin directory, and within that, a single generated .app file, which listed no modules or anything else.

No wonder we weren't getting any errors. An essentially empty .app file is telling Erlang not to load any modules or do anything when initializing the app. That answers that problem.

So obviously, we do not want to include nitrogen in the sys/rel list.

After removing nitrogen from sys/rel and regenerating yet again, the previous error about nitrogen_app being undefined returned.

I have no memory of this place

It was here that I was truly stumped.

I felt so close. Reltool was copying the dependencies, copying the default Nitrogen code into site/src, fully compiling, and running, and yet, Erlang was throwing an initialization error for code I knew was there.

Since the dependency code was being loaded, I decided to at least tackle something that was bothering me: the duplication of directories. I preferred the dependency directories to not have their version numbers included in the directory name. That way, they can be more easily upgraded with a quick rebar update-deps without too much tinkering with things or having to worry about duplicate directories (which becomes a problem with the sync application).

So I took the nitrogen_core, simple_bridge, nprocreg, and sync out of the sys/rel application list and regenerated. This time, the lib directory was as expected:

$ ls
mimetypes  mochiweb  nitrogen_core  nprocreg  pmod_transform  simple_bridge  sync

This at least looked better. Unfortunately, when I tried wf:module_info(), I found that wf was no longer a valid module. This meant that none of the dependency applications were being loaded.

I tried loading the modules to make sure that the paths were properly being loaded.

1> l(index).
{module,index}
2> l(wf).
{error,nofile}

Interesting. So it appears that Erlang does realize that our application's index page is actually in the path, but it's not loading it by default. Meanwhile, after removing our dependencies from sys/rel, Erlang no longer recognizes our dependency apps, even though the directories are there.

The vm.args file has a line -pa deps/*/ebin which should load dependencies, but is relying on deps instead of lib, so let's just change it to lib.

After killing Erlang and restarting it, it still did not recognize our dependencies.

The solution to this mystery came when running:

1> code:get_path().
["./lib/*/ebin","./site/ebin",
 "/usr/local/bin/../lib/erlang/lib/kernel-2.16.1/ebin",
 "/usr/local/bin/../lib/erlang/lib/stdlib-1.19.1/ebin",
 "/usr/local/bin/../lib/erlang/lib/sasl-2.3.1/ebin",
 "/usr/local/bin/../lib/erlang/lib/inets-5.9.4/ebin",
 "/usr/local/bin/../lib/erlang/lib/crypto-2.3/ebin",
 "/usr/local/bin/../lib/erlang/lib/runtime_tools-1.8.10/ebin",
 ...]

Ah, so the * in the -pa ./lib/*/ebin argument inside vm.args isn't actually expanded to anything; it is treated as a literal directory name.

So that solves that mystery. * does get expanded when used on the erl command line, so let's just modify the nitrogen launch script to include -pa lib/*/ebin there for the time being, and see if that helps. This would just be temporary, because we don't want something so project-specific to be hard-coded.

After doing this and trying to launch nitrogen, the modules still weren't being loaded, but at least their paths were being added correctly, and calling l(wf) actually worked to load the module into the VM.
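
To see the difference for yourself without involving Erlang at all, here is a small sketch using a throwaway directory (the app names are just examples). The shell expands the glob when it appears unquoted on a command line, while the same text read back from a file stays exactly as written, which is what happens to a line in vm.args.

```shell
# Build a throwaway directory tree that mimics lib/<app>/ebin.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/lib/mochiweb/ebin" "$demo_dir/lib/nitrogen_core/ebin"
cd "$demo_dir" || exit 1

# Unquoted on a command line: the shell expands the glob before the
# program being launched ever sees it.
expanded=$(echo lib/*/ebin)

# Read from a file: the text arrives untouched, asterisk and all.
echo '-pa lib/*/ebin' > vm.args
from_file=$(cat vm.args)

echo "shell:   $expanded"
echo "vm.args: $from_file"
```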

Almost there, for real this time!

So, at this point, Nitrogen was able to load all the code with l(module_name), but it still wasn't able to launch anything from the start, so I began picking apart the launch script some more.

I tried making a full release, and attempted to launch it with the new launch script just to see what would happen. Theoretically, the new launch script should work just as well as the old one, only with the numerous improvements made to rebar's simplenode.runner over the years, while Nitrogen's launch script had remained largely unchanged.

It turned out, the same problems were occurring here with the full releases. But if I copied the old nitrogen script into it, it worked.

Well, isn't that interesting!?

So that means that the problem is either with the actual erlexec command that's being executed or with the exported environment variables.

When comparing the exported variables before the erlexec call in each script (the new and old), the exported variables were effectively the same.

This left the erlexec command line itself as the only place the difference could be.

So I compared the erlexec lines from each, which is conveniently easy since both scripts print to the screen their respective lines.

The old (working) exec line:

$ nitrogen console
Exec: /home/gumm/code/nitrogen/rel/nitrogen/erts-5.10.1/bin/erlexec -boot \
/home/gumm/code/nitrogen/rel/nitrogen/releases/2.1.0/nitrogen -embedded  -config \
/home/gumm/code/nitrogen/rel/nitrogen/etc/app.config -config \
/home/gumm/code/nitrogen/rel/nitrogen/etc/mochiweb.config -args_file \
/home/gumm/code/nitrogen/rel/nitrogen/etc/vm.args -- console
Root: /home/gumm/code/nitrogen/rel/nitrogen

The new (non-working) exec line: (Note: this still includes the temporary -pa line added for loading libraries)

$ nitrogen console
Exec: /home/gumm/code/nitrogen/rel/nitrogen/erts-5.10.1/bin/erlexec -pa lib/*/ebin -boot \
/home/gumm/code/nitrogen/rel/nitrogen/releases/2.1.0/nitrogen -mode embedded  -config \
/home/gumm/code/nitrogen/rel/nitrogen/etc/app.config -config \
/home/gumm/code/nitrogen/rel/nitrogen/etc/mochiweb.config -args_file \
/home/gumm/code/nitrogen/rel/nitrogen/etc/vm.args -- console
Root: /home/gumm/code/nitrogen/rel/nitrogen

They're almost completely the same, but did you notice the big difference?

YUP, the new one used the flag -mode embedded, while the old one used just -embedded.

It turns out that while the intention of the old script was to launch in embedded mode, it had the syntax wrong. When no -mode argument is provided (and none is in the old version), Erlang runs in interactive mode instead of embedded. After reading the docs about the difference between the two modes, I learned that embedded mode will only load modules it is expressly told to load through the .boot, .rel, and .script files found in the releases/VERSION directory.

And since no version of Nitrogen has ever listed its own application code explicitly in the reltool.config files, none of it was ever being loaded explicitly. But because the old script was using -embedded instead of -mode embedded, Erlang was simply discarding the useless -embedded flag and starting in interactive mode, which automatically loaded all the compiled .beam files it could find in the code paths.
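
The behaviour described above can be captured in a toy model. This is not how erl actually parses its arguments; it is just a sketch with a hypothetical effective_mode function that reports which mode a given set of flags would end up in, assuming only a literal -mode flag changes it.

```shell
# Toy model, NOT erl's real argument parser: report the code-loading mode
# implied by a list of erl flags. Anything other than "-mode <value>"
# (including a bare -embedded) leaves the default in place.
effective_mode() {
    mode=interactive            # the default when no -mode flag is given
    while [ $# -gt 0 ]; do
        if [ "$1" = "-mode" ] && [ $# -ge 2 ]; then
            mode=$2
            shift               # skip the flag; the shift below skips its value
        fi
        shift
    done
    echo "$mode"
}

# Old launch script: bare -embedded is not a mode switch.
effective_mode -boot releases/2.1.0/nitrogen -embedded -config etc/app.config
# New launch script: -mode embedded is.
effective_mode -boot releases/2.1.0/nitrogen -mode embedded -config etc/app.config
```

The first call prints interactive and the second prints embedded, which matches what the two exec lines ended up doing.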

What an easy solution, and I'm so dumb for missing it

So while it worked with the full releases, it was time to test this change with the slim release. After generating another slim release with the newly updated script (modified to use -mode interactive, since for this, we simply don't want to deal with the stringent requirements to use -mode embedded), we were there. Nitrogen was booting, running, and doing everything properly.

UNFORTUNATELY, there was one more thing we needed to do before I would be happy with this working properly.

One Last Change

Remember how we hardcoded that extra -pa argument into the new launcher script? Yeah, that needs to go, or at least be inserted into the vm.args file in a way that works.

I considered scanning the vm.args file for any instances containing -pa something/*/something and expanding them in the new nitrogen script, but that seemed like too much of a hassle. There had to be a better way!

The better way finally revealed itself while reading more in the code server docs. The ERL_LIBS environment variable seemed promising.

In order to ensure I wasn't crippling myself in some unforeseen way, I launched nitrogen without the extra -pa lib/*/ebin line in the launcher script, and did a quick:

> os:getenv("ERL_LIBS").
false

Ah, ha! It's not defined. So let's replace that old -pa ./lib/*/ebin in the vm.args file with -env ERL_LIBS ./lib and see what happens.

BINGO!

We got it.
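
If you have an existing vm.args to convert, the swap can be scripted. The following is only a sketch: the use_erl_libs function is hypothetical, it handles just the two spellings of the -pa line that appear in this post, and it prints the result instead of editing the file in place.

```shell
# Rewrite a glob-based -pa line in vm.args into an ERL_LIBS setting.
# Reads the file named by $1 and prints the result to stdout.
use_erl_libs() {
    sed -e 's|^-pa deps/\*/ebin$|-env ERL_LIBS ./lib|' \
        -e 's|^-pa \./lib/\*/ebin$|-env ERL_LIBS ./lib|' "$1"
}

# Demo on a throwaway file; the -name line is only an illustrative extra.
tmp_args=$(mktemp)
printf '%s\n' '-name nitrogen@127.0.0.1' '-pa ./lib/*/ebin' > "$tmp_args"
use_erl_libs "$tmp_args"
```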

From here, it was smooth sailing: cleaning things up, making the normal changes to the nitrogen script (like making it load all .config files in etc instead of loading just app.config), and making a simple script that can "slim down" a reltool.config file for us, so that we don't have to have a million slightly different reltool.config files littering the rel directory.
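
To give an idea of what such a "slim-down" step could look like, here is a minimal sketch. It is not the script that ships with Nitrogen; the slim_down function is hypothetical, and it assumes the sys tuple opens on a line that starts with "{sys," and that the list already has at least one entry, so that the trailing comma is valid.

```shell
# Hypothetical "slim-down" filter: print a copy of the reltool.config named
# by $1 with {excl_lib, otp_root} added as the first entry of the sys list.
slim_down() {
    awk '{ print }
         index($0, "{sys,") == 1 { print "       {excl_lib, otp_root}," }' "$1"
}

# Demo on a made-up, heavily shortened config.
tmp_cfg=$(mktemp)
printf '%s\n' '{sys, [' '       {lib_dirs, ["../deps"]}' '      ]}.' > "$tmp_cfg"
slim_down "$tmp_cfg"
```

Because it writes to stdout, a single full reltool.config can stay in the repository and the slim variant can be generated on demand.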

Conclusion

This whole process helped to illuminate just how rudimentary my knowledge of Erlang releases is and was. I have a lot more to learn about in order to tame these mysterious beasts, but I feel like I'm starting to get them a bit better.

I assembled this little story from memory, and I know I missed some things, but frankly, it's already way too long. Ultimately, it was about 3 days of tinkering with command-line options, exploring the code server, changing environment variables, and lots of recompiling and regenerating things. I even briefly dove into the Erlang/OTP code to see if anything stood out that might help me.

This was one of those projects where I thought "Hey, this should take a few hours and I'll be ready", which ended up taking so much longer simply due to my lack of knowledge.

In the end, the only changes that needed to happen were the following:

  • Use the pull request version of rebar and simplenode.runner for Nitrogen
  • Add {excl_lib, otp_root} to reltool.config in the sys section
  • Copy nodetool into the releases/VERSION directory instead of the erts directory (in the rebar overlay)
  • Don't copy the erl script at all in the rebar overlay
  • Modify the nitrogen launcher (simplenode.runner) to handle the multiple configs in etc/
  • Modify the nitrogen launcher to use -mode interactive
  • Change -pa deps/*/ebin to -env ERL_LIBS ./lib in vm.args
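
As an illustration of the "multiple configs in etc/" item, a launcher could build its -config arguments with a loop along these lines. This is a sketch rather than the code in the actual nitrogen script; the config_args function name is hypothetical, and paths containing spaces are not handled.

```shell
# Print "-config <file>" for every .config file in the directory $1.
config_args() {
    args=""
    for f in "$1"/*.config; do
        [ -e "$f" ] || continue      # the glob matched nothing
        args="$args -config $f"
    done
    echo "${args# }"                 # drop the leading space
}

# Demo with a throwaway etc directory; vm.args is ignored by the glob.
etc_dir=$(mktemp -d)
touch "$etc_dir/app.config" "$etc_dir/mochiweb.config" "$etc_dir/vm.args"
config_args "$etc_dir"
```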

These changes can literally be done in minutes, when you know exactly what changes to make. Unfortunately, the newness (and experimental nature) of this slim releases feature means there are very few documents about it (my google-fu failed me if there are indeed other instructions out there that aren't involved with the rebar pull request and associated issue). That, combined with my admitted ignorance of the intricacies and nuances of the reltool application and its respective config files, led to an epic journey, which I'm happy was at last solved.

Second Conclusion

Ultimately, it was completely worth it. For me, I got to learn a lot about the release process. And for Nitrogen users, Nitrogen will, as of the upcoming 2.2.0 release, support slim releases out of the box, and as a result, users will finally be able to add an entire generated release into their source control, without worrying about adding an entire Erlang installation or screwing around with symlinks.

Try it out

If you want to try it out, just clone my nitrogen repository (which I use for experimental development stuff off mainline).

Then do make slim_backend where backend is any of the following: cowboy, yaws, mochiweb, webmachine, or inets.

It will create a slim release for you, and you'll notice that the newly generated rel/nitrogen directory is only about 2.5 MB (if you ignore the contents of lib/ which should not be added to source control).