Automatic rebuilds

This week while working on (re)Bloggy (my static site generator and the app that powers this blog) I posted on how I dealt with rebuilding the site while making changes.

I realised this was no way to live my life, and while it was enough to get started with migrating from Jekyll to (re)Bloggy (read about that here), it was starting to get annoying.

The approach

Given some early explorations in threaded and async rust I had a rough idea what I wanted / needed to do.

Finally it would be good to start to tidy up my ~600 LOC main.rs.

watcher

This was pretty straightforward with notify-rs, which is the same crate used by rust-analyzer and cargo watch.

Taking their sample code was enough to get started:

use std::path::Path;

use notify::{RecursiveMode, Result, Watcher};

fn main() -> Result<()> {
    // Automatically select the best implementation for your platform.
    let mut watcher = notify::recommended_watcher(|res| {
        match res {
            Ok(event) => println!("event: {:?}", event),
            Err(e) => println!("watch error: {:?}", e),
        }
    })?;

    // Add a path to be watched. All files and directories at that path and
    // below will be monitored for changes.
    watcher.watch(Path::new("."), RecursiveMode::Recursive)?;

    Ok(())
}

I needed to tweak this slightly to take a list of paths to watch as a Vec<&str> (luckily they're already static definitions in the main.rs so I can pass around references).

The other thing I needed to do was filter down to specific file changes. Ideally I only want to build the site again if there's been a modification, and I can ignore everything else. Here I can leverage event.kind and some match statements until I reach ModifyKind::Data(_).

Once I had only modify events I could log the file name and trigger a rebuild of the app.
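
As a rough sketch of that filtering step: the Event, EventKind and ModifyKind types below are simplified local stand-ins for notify's types (the real ones have more variants and carry extra detail, e.g. ModifyKind::Data(_)), and modified_file is a hypothetical helper name, not something from the crate.

```rust
use std::path::PathBuf;

// Simplified stand-ins for notify's event types (assumption: the real
// enums have more variants and some of them carry extra data).
#[allow(dead_code)]
#[derive(Debug)]
enum ModifyKind {
    Data,
    Metadata,
    Name,
}

#[allow(dead_code)]
#[derive(Debug)]
enum EventKind {
    Create,
    Modify(ModifyKind),
    Remove,
}

#[derive(Debug)]
struct Event {
    kind: EventKind,
    paths: Vec<PathBuf>,
}

// Hypothetical helper: returns the file name only for data modifications,
// and None for every other kind of event.
fn modified_file(event: &Event) -> Option<String> {
    match event.kind {
        EventKind::Modify(ModifyKind::Data) => event
            .paths
            .first()
            .and_then(|path| path.file_name())
            .map(|name| name.to_string_lossy().into_owned()),
        _ => None,
    }
}

fn main() {
    let events = vec![
        Event {
            kind: EventKind::Create,
            paths: vec![PathBuf::from("_posts/new-post.md")],
        },
        Event {
            kind: EventKind::Modify(ModifyKind::Data),
            paths: vec![PathBuf::from("_posts/2023-02-11-automatic-rebuilds.md")],
        },
    ];

    for event in &events {
        if let Some(name) = modified_file(event) {
            // This is where the rebuild would be triggered.
            println!("File modified {:?}", name);
        }
    }
}
```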

Don't you point your function at me...

As I mentioned above I wanted to start tidying up main.rs and this meant that I created a new mod watcher for this new code (same for the server). That also means I need a way to pass a callback or something to the watcher that it can call once a file has changed.

My first thought was a closure wrapping the main build function in main. This looked like this:

pub fn watch<F>(dirs: Vec<&str>, callback: F)
where F: Fn()->() {...}

And I could then invoke it like this:

watcher::watch(
  vec![PAGES, POSTS...],
  || build()
);

This worked completely fine when I was testing the callback in the watch function, but when I tried to pass it into the notify::recommended_watcher handler I ran into problems.

'F' cannot be sent between threads safely

What? Why? Who is creating a thread? I guess I'm not, but behind the scenes that must be what notify is up to (and checking FsEventWatcher in the notify codebase confirms as much).

I've run into this previously and solved it with a number of Arc / Box etc. to make Rust happy that I'm #ThreadSafe.

However, for my automatic rebuilds I wanted to avoid that complexity as I'm only ever going to call the same function (for now). To get around the thread safety issue I can remove the closure || build() and instead pass a function pointer fn() -> () (a plain function pointer is always Send, whereas the generic F above carried no such bound) e.g.

watcher::watch(
  vec![PAGES, POSTS...],
  build
);

This works! I'm still getting my head around threading and the gotchas (coming from a non-CS, mostly, self-taught JavaScript background) but passing around functions is something I'm used to so I'll roll with it. (Though I suspect I have some homework to do.)
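
For the homework pile, here is a minimal sketch of the two options, using std::thread::spawn as a stand-in for the thread that notify creates behind the scenes. The names watch_with_fn_pointer, watch_with_closure and the BUILDS counter are made up for illustration. The second variant suggests the generic closure version should also compile once F is bounded with Send + 'static, though I haven't checked that against notify's own handler bounds.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times build() ran, so the effect is observable.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

fn build() {
    BUILDS.fetch_add(1, Ordering::SeqCst);
}

// Option 1: a plain function pointer. `fn()` is Send + 'static, so it can
// be moved into another thread without any extra bounds.
fn watch_with_fn_pointer(callback: fn()) -> thread::JoinHandle<()> {
    thread::spawn(move || callback())
}

// Option 2: keep the generic closure, but promise the compiler that it is
// safe to send to another thread and does not borrow short-lived data.
fn watch_with_closure<F>(callback: F) -> thread::JoinHandle<()>
where
    F: Fn() + Send + 'static,
{
    thread::spawn(move || callback())
}

fn main() {
    watch_with_fn_pointer(build).join().unwrap();
    watch_with_closure(|| build()).join().unwrap();
    println!("builds: {}", BUILDS.load(Ordering::SeqCst));
}
```

Dropping the Send + 'static bound from watch_with_closure brings back the "cannot be sent between threads safely" error.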

You can't park there mate

In testing this all worked great and I can see the page being rebuilt (and the dist directory being emptied) each time I changed something on the site. So far so good.

But sometimes I would run the app and it would end, not watching no nothing. I guess the watcher is being triggered from the main thread and if the main thread ends so does the watcher. To get around this I threw in a loop {} after starting the notify watcher and this solved the issue - the thread wouldn't end buuut it would also use 100% CPU... not ideal.

The good news is that I was planning on having a webserver running on the main thread, so I could set up the watcher and then, while the webserver was listening for requests, the main thread would stay alive.

Unfortunately that doesn't work either...

There's a couple of things at play here. First off, nothing here is async and everything is in the same thread. This means that at any point if the thread is blocked no work can continue. This is an issue due to the server.recv that I was using in my server. Let's take a closer look at that before we continue.

server

The main aim, as ever, was to get something functional without too much of a headache. The webserver is for local dev only and I won't be running it in production as this is a static site. I like tiny_http for a webserver, sure there are others (and I really should roll my own one day, yanno, for fun) but this does the job.

Once again we can borrow the tiny_http tutorial to get something that will start listening to requests:

use tiny_http::{Server, Response};

let server = Server::http("0.0.0.0:8000").unwrap();

for request in server.incoming_requests() {
    println!("received request! method: {:?}, url: {:?}, headers: {:?}",
        request.method(),
        request.url(),
        request.headers()
    );

    let response = Response::from_string("hello world");
    request.respond(response);
}

This looks good and works well, we can build out the file path from the URL + the destination location:

let url = if rq.url().ends_with('/') {
    format!("{}{}index.html", destination, rq.url())
} else {
    format!("{}{}", destination, rq.url())
};

We do have to check whether there's a file extension, as by default we link to "indexes", so something like /2023/02/01/post-title as opposed to /2023/02/01/post-title.html (or the actual path /2023/02/01/post-title/index.html). But not in all cases, as there are a few file paths which are .xml and of course the .ttf, .css etc.
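
A minimal sketch of that extension check, using only the standard library. resolve_path is a hypothetical helper name, "dist" is assumed as the destination, and stripping the query string is my own addition rather than something the real server is known to do.

```rust
use std::path::Path;

// Hypothetical helper: maps a request URL onto a file inside `destination`.
fn resolve_path(destination: &str, url: &str) -> String {
    // Ignore any query string (assumption: the dev server doesn't need it).
    let path = url.split('?').next().unwrap_or(url);

    if path.ends_with('/') {
        // "/about/" -> "<destination>/about/index.html"
        format!("{destination}{path}index.html")
    } else if Path::new(path).extension().is_some() {
        // "/feed.xml" or "/css/site.css" are real files, serve them as-is.
        format!("{destination}{path}")
    } else {
        // "/2023/02/01/post-title" -> ".../post-title/index.html"
        format!("{destination}{path}/index.html")
    }
}

fn main() {
    for url in ["/", "/2023/02/01/post-title", "/feed.xml"] {
        println!("{url} -> {}", resolve_path("dist", url));
    }
}
```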

The final form of the "server" is slightly different as I need to do some additional work to set the Content-Type headers as appropriate.
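
Picking the Content-Type could look roughly like this. It is a sketch rather than the actual (re)Bloggy code: content_type is a hypothetical helper and the list of extensions is an assumption based on the file types mentioned above.

```rust
use std::path::Path;

// Hypothetical helper: picks a Content-Type value from the file extension.
fn content_type(path: &str) -> &'static str {
    match Path::new(path).extension().and_then(|ext| ext.to_str()) {
        Some("html") => "text/html; charset=utf-8",
        Some("css") => "text/css; charset=utf-8",
        Some("xml") => "application/xml",
        Some("js") => "text/javascript",
        Some("ttf") => "font/ttf",
        Some("png") => "image/png",
        // Unknown or missing extension: fall back to a generic binary type.
        _ => "application/octet-stream",
    }
}

fn main() {
    for file in ["dist/index.html", "dist/feed.xml", "dist/fonts/body.ttf"] {
        println!("{file}: {}", content_type(file));
    }
}
```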

Mistakes and workarounds

Knowing that the watcher and server both work separately, we can now begin combining them. As I mentioned above, there's an issue with the way we're listening to requests.

loop {
    let rq = match server.recv() {
        Ok(rq) => rq,
        Err(_) => break,
    };

    //... 
}

The basic loop listens for requests via server.recv; however, this is a blocking call that will prevent further work from happening on the thread while it's listening.

I missed this though, and couldn't understand what was going on when I tried to run both these new features together. Either:

server::start(destination);
watcher::watch(dirs, callback);

or

watcher::watch(dirs, callback);
server::start(destination);

would give me a file system watcher or a server but not both.

In writing this blog post I actually understand where I went wrong here and can probably rearchitect what I've done but I'm actually fairly happy with the solution I came up with.

Workarounds

I had several ideas: first was an async runtime, second was a separate app for the webserver, third was a subprocess (calling node-http-server) and finally I could bung the new features in separate threads.

I tried all of them to varying degrees of success and settled on a combination: separating out the webserver and running it in a new thread. As the webserver is largely disconnected from the main building of the app, it makes sense that it could just live in its own thread. Additionally, if I move the watcher I might have to worry about passing messages between the threads to trigger rebuilds etc.
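
The shape of that solution, sketched with the standard library only. An mpsc channel stands in for the webserver here: its recv() blocks in the same way server.recv() does, but only on the spawned thread. start_server is a hypothetical name and the returned count is just there to make the behaviour visible.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

// Stand-in for the webserver: the blocking recv() loop lives in its own
// thread, so it can no longer stall the thread that spawned it.
fn start_server(requests: Receiver<String>) -> JoinHandle<usize> {
    thread::spawn(move || {
        let mut handled = 0;
        // recv() blocks until a request arrives or every sender is dropped.
        while let Ok(url) = requests.recv() {
            println!("[server] GET {url}");
            handled += 1;
        }
        handled
    })
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let server = start_server(rx);

    // The main thread is still free to set up the watcher at this point.
    println!("[watcher] Starting");

    tx.send("/".to_string()).unwrap();
    tx.send("/feed.xml".to_string()).unwrap();
    drop(tx); // closing the channel lets the server loop finish

    let handled = server.join().unwrap();
    println!("[server] handled {handled} requests");
}
```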

I still had one problem though: after starting the server thread and watchers my app would still end. I needed to "wait" for new server events, but I wasn't giving the main thread anything to do, so it ended. I tried, again, the naive empty loop {} but, again noticing the CPU usage, I opted for thread::park instead. Essentially I could intentionally block the main thread, and this would allow the watcher to continue to work.
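
A small sketch of parking compared with spinning. In the real app nothing ever unparks the main thread, so it simply sleeps until the process is killed. For the sake of a demo that terminates, the hypothetical park_until_done below has a worker thread wake the main thread up again after a short sleep.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Parks the calling thread until a background worker reports it is done.
// Unlike an empty `loop {}`, a parked thread uses no CPU while it waits.
fn park_until_done() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let waiting_thread = thread::current();

    let worker_done = Arc::clone(&done);
    let worker = thread::spawn(move || {
        // Pretend to do some work (watching files, serving requests...).
        thread::sleep(Duration::from_millis(50));
        worker_done.store(true, Ordering::SeqCst);
        waiting_thread.unpark();
    });

    // park() may wake up spuriously, so re-check the flag each time.
    while !done.load(Ordering::SeqCst) {
        thread::park();
    }

    worker.join().unwrap();
    done.load(Ordering::SeqCst)
}

fn main() {
    println!("worker finished: {}", park_until_done());
}
```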

Automatic rebuilds - but only sometimes

The final thing to do was to add a new argument to the app so that I could do something like cargo run -- watch and it would know to start the webserver and file watchers. As an opt-in it meant the build scripts (GitHub Actions) would continue to work without interruption.
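
Checking for that argument doesn't need an argument-parsing crate. A minimal sketch with std::env::args, where watch_requested is a hypothetical helper name:

```rust
// Hypothetical helper: true when "watch" was passed as an argument.
// The first item is the program name, so it is skipped.
fn watch_requested<I: IntoIterator<Item = String>>(args: I) -> bool {
    args.into_iter().skip(1).any(|arg| arg == "watch")
}

fn main() {
    println!("Building...");

    if watch_requested(std::env::args()) {
        // This is where the watcher and the server thread would be started.
        println!("[watcher] Starting");
    }
}
```

Run without the argument (as the build scripts would), this only builds and exits.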

The output of this looks like:

(re)Bloggy v0.2.0

Building...
  Completed in 258.300959ms

[watcher] Starting
[watcher] watching: _pages, _posts, _layouts, _includes, _config.yml
[server] listening on port 8080

File modified "2023-02-11-automatic-rebuilds.md"
Building...
  Completed in 259.064625ms

File modified "2023-02-11-automatic-rebuilds.md"
Building...
  Completed in 289.917375ms

I really like this too as it keeps me in my editor writing away and allows me to iterate much quicker now on the site.

Gotchas and improvements

Some things I haven't thought about:

Some things I am thinking about:

I really like how flexible Jekyll is when it comes to structuring a static site - basically anything goes and that means there's not a prescribed set of folders that are needed. So while (re)Bloggy right now has _posts, _layouts etc, there's a world where it's a lot more flexible.

This was fun though and I really enjoyed playing around with "automatic building"... though, what about hot-reloading of the core Rust code?



First appeared on Trusty Interior, last update 30 Oct 2024