Hot code reloading in Erlang without using an OTP release

Erlang supports change of code in a running system.

However, the details are a bit fiddly. Here’s a cheat-sheet I used recently for a simple TCP service written in Erlang.

My program was a single module, running outside of any OTP application context. For a program with several modules, the instructions here need a minor amendment: either explicitly list the modules to purge and reload, or discover all the modules within a single application. See the places in server-reload below that mention the atom my_server.

I did not use the -on_load() directive, because I wanted to be able to use multiple nodes rather than controlling reloads from a single node’s shell, and I couldn’t figure out how to make the two play nicely together.

The Erlang

I exported a code_change/0 from my module, to be called after loading a new version of the module into a node. It sends a message code_change to each “global” actor in my program (in this case, there was only one).

-export([code_change/0]).

code_change() ->
    io:format("+ code_change~n"),
    %% name registered previously with `global:register_name/2`:
    global:send(name_of_my_global_actor, code_change),
    ok.
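
With more than one global actor, the same function could fan the notification out over a list of names. The following is only a sketch: the module name notify_sketch, the helper notify/1, and the second name another_global_actor are hypothetical and not part of the original program. It looks each name up with global:whereis_name/1 first, because global:send/2 exits the caller when the name is not registered.

```erlang
-module(notify_sketch).
-export([code_change/0, notify/1]).

%% Hypothetical list of globally registered names; only
%% name_of_my_global_actor appears in the original program.
-define(GLOBAL_ACTORS, [name_of_my_global_actor, another_global_actor]).

code_change() ->
    io:format("+ code_change~n"),
    lists:foreach(fun notify/1, ?GLOBAL_ACTORS),
    ok.

%% Send code_change to one global actor, skipping names that are not
%% currently registered instead of crashing the caller.
notify(Name) ->
    case global:whereis_name(Name) of
        undefined ->
            io:format("- ~p is not registered, skipping~n", [Name]),
            skipped;
        Pid when is_pid(Pid) ->
            Pid ! code_change,
            sent
    end.
```

Skipping unregistered names is a design choice; failing loudly instead may be preferable if a missing actor should abort the reload.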

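That actor passes the notification on to any subordinate actors it is managing, and then does an “MFA” self-call to upgrade its own codebase. The self-call is fully qualified (?MODULE:index(...) rather than plain index(...)), which is what moves the process onto the newly loaded version of the module.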

index(Connected) ->
    receive
        code_change ->
            [P ! code_change || {_Peer, P} <- Connected],
            ?MODULE:index(Connected);
        ...
    end.

Similarly, all other notified actors perform “MFA” self-calls.

connection(Sock, Username, IndexPid) ->
    receive
        code_change ->
            ?MODULE:connection(Sock, Username, IndexPid);
        ...
    end.

Actors need to take care to upgrade their state at the same time as they do the “MFA” self-calls, since the new code may expect it in a different shape.
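
One way to handle this, sketched here under assumptions, is to let the function reached by the fully qualified call normalize whatever state it is handed, since only the new code knows both shapes. In this hypothetical example the old version kept a bare list of {Peer, Pid} pairs and the new version wants a map. The module name index_sketch, the map keys, and upgrade_state/1 are all invented for illustration.

```erlang
-module(index_sketch).
-export([index/1, upgrade_state/1]).

%% Reached via ?MODULE:index(State) from the previous version, so the
%% argument may still have the old shape. Normalize it before use.
index(State0) ->
    loop(upgrade_state(State0)).

loop(#{connected := Connected} = State) ->
    receive
        code_change ->
            [P ! code_change || {_Peer, P} <- Connected],
            %% Fully qualified call: continues in the newest loaded version.
            ?MODULE:index(State);
        stop ->
            ok
    end.

%% Old shape (hypothetical): a bare list of {Peer, Pid} pairs.
upgrade_state(Connected) when is_list(Connected) ->
    #{connected => Connected, banned => []};
%% New shape (hypothetical): already a map, so pass it through unchanged.
upgrade_state(#{connected := _} = State) ->
    State.
```

Because the conversion lives in the new module, the old code only has to keep calling the same exported entry point with whatever state it has.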

Starting the program

I wanted daemontools to run it, so I created the following shell script, called run, which daemontools picks up to start the service:

#!/bin/sh
set -e
erlc -o ebin my_server.erl
exec erl \
     -noshell \
     -pa ebin \
     -sname mainnode \
     -setcookie f98b3a1e-80ec-11ef-b752-0b638e4de31c \
     -s my_server

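Pick a fresh random cookie for the -setcookie argument (I used uuid(1)). Every node that needs to talk to the server must be started with the same cookie, or the connection will be refused.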

Then, I created this script, server-reload:

#!/bin/sh
set -e
erlc -o ebin my_server.erl
exec erl \
     -noshell \
     -pa ebin \
     -setcookie f98b3a1e-80ec-11ef-b752-0b638e4de31c \
     -sname undefined \
     -eval "
           ServerNode = 'mainnode@$(hostname -s)',
           io:format(\"ServerNode: ~p~n\", [ServerNode]),
           true = net_kernel:connect_node(ServerNode),
           spawn(ServerNode, fun () ->
               code:purge(my_server),
               code:load_file(my_server),
               ok = my_server:code_change()
           end),
           init:stop()"

Running server-reload causes the source code to be compiled and hot-loaded into the running server.
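
For a program with more than one module, the purge-and-load step could be moved into a small helper along the following lines. This is a sketch rather than part of the setup described here: the module name reload_sketch and its function names are hypothetical. It assumes the application has already been loaded (application:get_key/2 returns undefined otherwise), and it swaps code:purge/1 for code:soft_purge/1, which refuses to purge while some process is still running the old code instead of killing that process.

```erlang
-module(reload_sketch).
-export([reload_app/1, reload_modules/1]).

%% Reload every module listed under the modules key of a loaded
%% application's .app file.
reload_app(App) ->
    {ok, Modules} = application:get_key(App, modules),
    reload_modules(Modules).

%% Purge and reload an explicit list of modules, returning one
%% {Module, Result} pair per module.
reload_modules(Modules) ->
    [{M, reload(M)} || M <- Modules].

reload(Module) ->
    %% soft_purge returns false if a process still runs the old code.
    case code:soft_purge(Module) of
        true  -> code:load_file(Module);
        false -> {error, old_code_in_use}
    end.
```

In the script, the spawned fun could then call something like reload_sketch:reload_modules([my_server, my_other_module]) (the second module name being made up) before calling code_change/0.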

Grace notes

I also used a git post-receive hook to automatically recompile and reload the code on push to live:

#!/bin/sh
set -e
unset GIT_DIR
cd "$HOME/location-of-checkout-of-server-repository"
git pull --ff-only
./server-reload

That’s it

The end result worked well: I used it to deploy a hotfix to my TCP service while it had dozens of live, active connections, and not one of them noticed a thing.