Soupault 1.13.0 release

Soupault 1.13.0 is available for download. This release solves two long-standing problems with the plugin API: the lack of an easy way to generate a custom table of contents, and the lack of a way to pass arbitrary values to plugins from the config file.

HTML-compliant tables of contents and the headings tree data structure

Recently I noticed that the HTML standard still doesn’t allow a <ul> or <ol> element to have another <ul> or <ol> as a direct child. They can only have <li> elements for children, and nested lists must go inside those <li> elements.

Then I noticed that a lot of tables of contents (and nested lists in general) around the web are non-compliant as well. And then I realized tables of contents generated by soupault’s toc widget are also non-compliant!

Correctly nested lists are quite hard to produce if you work with a flat list of headings, which may be the reason why so many ToCs use invalid HTML. It’s easy when you have a tree of the document headings. In fact, a lot of things are easier when you have a tree rather than a flat list.

So, I came up with a general algorithm for converting a flat list of headings to a tree. It’s a somewhat non-trivial task: in essence, it’s a parser that you can’t build with existing parser libraries because the tokens aren’t strings. Still, it works now, and you can finally have standards-compliant nested lists in your ToC.
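To give an idea of what such a conversion involves, here is a minimal sketch in Python. It is not soupault’s actual implementation (soupault is written in OCaml), just one common stack-based approach. Representing headings as (level, title) pairs and the name build_headings_forest are assumptions made for illustration.

```python
def build_headings_forest(headings):
    """Convert a flat, document-ordered list of (level, title) pairs
    into a list of trees (a forest).

    Hypothetical helper for illustration; not part of soupault.
    """
    roots = []
    stack = []  # path from a root down to the most recently added heading
    for level, title in headings:
        node = {"level": level, "heading": title, "children": []}
        # Drop headings of the same or deeper level: they cannot be
        # ancestors of the new heading.
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        if stack:
            # The nearest shallower heading becomes the parent.
            stack[-1]["children"].append(node)
        else:
            # No shallower heading precedes it, so it starts a new tree.
            roots.append(node)
        stack.append(node)
    return roots


flat = [
    (1, "Chapter one"),
    (2, "Section one"),
    (3, "Detail"),
    (2, "Section two"),
    (1, "Chapter two"),
]
forest = build_headings_forest(flat)
```

The stack always holds the chain of ancestors of the last heading seen, so a heading that skips a level (say, an <h3> directly under an <h1>) simply attaches to the nearest shallower heading.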

Since the change of the HTML layout may break someone’s CSS styling, the new behaviour is disabled by default, and can be enabled with a valid_html = true option inside the toc widget config:

[widgets.table-of-contents]
  widget = "toc"
  valid_html = true
  ...

Why bother at all? It’s not just to keep HTML validators happy. HTML-compliant nesting in fact has advantages for automated rewriting since you don’t need to track nesting depth by hand. For example, if a document has a huge ToC, wouldn’t it be cool to display it as a tree with collapsible sections?

I wrote a plugin that can convert any nested list to a tree, using the HTML5 <details> elements. That plugin will work with any correctly nested list, whether it’s generated by the toc widget or anything else.
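The plugin itself rewrites existing list markup. As a rough illustration of the target structure only, here is a Python sketch that renders a small tree of plain-string headings into a correctly nested list with collapsible sections. The render_list function and the sample data are hypothetical and not taken from the plugin.

```python
from html import escape


def render_list(nodes):
    """Render a list of trees as a <ul> in which every nested list
    lives inside an <li>, never directly inside another <ul>."""
    items = []
    for node in nodes:
        title = escape(node["heading"])
        if node["children"]:
            # A section with subsections becomes collapsible: the nested
            # <ul> goes inside <details>, which goes inside the <li>.
            body = "<details><summary>{}</summary>{}</details>".format(
                title, render_list(node["children"]))
        else:
            body = title
        items.append("<li>{}</li>".format(body))
    return "<ul>{}</ul>".format("".join(items))


toc = [
    {"heading": "Chapter one", "children": [
        {"heading": "Section one", "children": []},
    ]},
    {"heading": "Chapter two", "children": []},
]
markup = render_list(toc)
```

Because the nesting of the data mirrors the nesting of the markup, the renderer never has to track depth by hand: recursion does it.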

You can see it in action in my infamous iproute2 user guide. For a long time it has been using JavaScript for the expand/collapse sections functionality. Not anymore! Now every section can be collapsed or expanded individually.

[Image: collapsible ToC section]

If anyone is interested in built-in foldable ToC functionality, that’s also doable and can be added to a future release.

Accessing the ToC tree from Lua

But wait, there’s more! The headings tree isn’t just an internal thing, it’s accessible to plugins so that everyone can render a ToC as they want.

I’ve seen people request ToC data structure access from maintainers of other site generators. Many developers aren’t in a position to provide it because they don’t even have access to it themselves: often it’s an opaque feature of a library that converts Markdown to HTML. However, I believe it’s a nice thing to have, and for me it was much easier to do, so here we are.

It’s done with a new HTML.get_headings_tree function. It actually returns a list of trees, since in HTML there’s no mandatory document title element that could serve as a natural root for the entire tree.

This is what the data structure looks like:

[
  {
    "heading": "<h1>Chapter one</h1>",
    "children": [
      {"heading": "<h2>Section one</h2>", "children": []}
    ]
  },
  {"heading": "<h1>Chapter two</h1>", "children": []}
]

The values of heading fields are HTML element tree references of course, not strings. There’s also an HTML.get_heading_level function that returns the level of a heading for elements like <h1>, or zero if an element isn’t a heading.

Here’s sample code that just prints document headings to the build log:

function print_tree(t)
  Log.warning(format("Level %d heading: %s",
    HTML.get_heading_level(t["heading"]), tostring(t["heading"])))
  
  if not t["children"] then
    -- Leaf node was reached
    return nil
  end

  -- Else iterate over subsections
  local n = 1
  local count = size(t["children"])

  while (n <= count) do
    print_tree(t["children"][n])
    n = n + 1
  end
end

ht = HTML.get_headings_tree(page)

i = 1
s = size(ht)
while (i <= s) do
  print_tree(ht[i])
  i = i + 1
end

Passing any values to plugins from the config

TOML is dynamically typed, and so is Lua. Until recently, the fact that there’s a statically typed language in between was a bit too apparent: you could only pass strings to plugins through the config. Then those limitations got relaxed as I worked out more of the necessary impedance matching.

Now that work is almost complete. Finally you can pass embedded tables to plugins. This is now a valid and workable configuration:

[widgets.my-funny-widget]
  widget = "my-plugin"
  options = {
    should_work = true,
    greeting = "hello world",
    lucky_numbers = [3, 7, 1]
  }

Here’s what my-plugin.lua can look like:

if config["options"]["should_work"] then
  Log.info("Plugin should work")
end

if config["options"]["greeting"] then
  Log.info(format("It may greet you with \"%s\"", config["options"]["greeting"]))
end

i = 1
while (i <= size(config["options"]["lucky_numbers"])) do
  Log.info(format("%d is a lucky number", config["options"]["lucky_numbers"][i]))
  i = i + 1
end

As you can see, values are passed from the TOML config to the Lua script transparently. If something was a number or a boolean, it will be passed as one, without the unnecessary conversion to strings that early soupault versions performed.

Some things that still don’t work are heterogeneous lists and lists of embedded tables. It’s a limitation of the to.ml library that I should also fix some day.

Other improvements

The title widget now has a force option. When it’s true, the widget will create a <title> element even if it wasn’t in the page. This is useful in HTML processor mode, if you want to improve consistency of a bunch of handwritten pages.

Example:

[widgets.page-title]
  widget = "title"
  selector = ["h1", "#title", "#post-title"]
  default = "soupault"
  append = " &mdash; soupault"
  force = true

Custom fields now allow lists of selectors in the selector option, just like widgets.

Example:

[index.custom_fields]
  reading_time = {
    selector = ["span#reading-time", ".reading-time"]
  }

There’s now a String.to_number function for converting strings to numbers.

An empty site_dir option is not allowed anymore. Originally it was interpreted as “look for pages in the current dir”.

Towards soupault 2.0

In a sense, this is an anniversary release. I released the first public beta on July 15th last year, at the point when it could replace a bunch of custom scripts for building my own website.

Some ideas did pass the test of time, but others clearly didn’t. I quite intentionally named the first release 1.0.0 and maintained compatibility with it since then.

I’m not fond of software that stays at 0.x.x and keeps making incompatible changes despite knowingly being used in production by people other than its maintainers. We all make wrong assumptions though, and now it may be time to fix some of those design issues. Unfortunately some of those fixes may require breaking changes, so there’s going to be a 2.0.0 release, though I’ll try to make the migration as easy as possible.

Some things I clearly would do in a very different way right now. The breadcrumbs widget is a good example. I might not even think of a built-in breadcrumbs widget now that it can be done with a Lua plugin, but at the time it wasn’t clear if extensibility was possible at all. However, the breadcrumb_template option is also an artifact of my idea of not using a template processor for anything, and the way it works is confusing and limiting.

So, here are some ideas for the 2.0 release.

Switching to Jingoo for templates

Soupault started as a response to static site generators that rely on Turing-complete, overgrown template processing languages as their main or even the only extensibility mechanism. HTML as a first class citizen and “DOM without a browser” are still the key ideas, but for some tasks, template processors really work best—to the point that people use them in client-side JavaScript code sometimes.

Right now soupault uses Mustache templates for the built-in index page generators. The ability to call a custom index metadata processor and include its output in a page is a key feature that isn’t going anywhere. However, right now it’s a compromise between using a simplistic built-in and adding a whole bunch of moving parts to the site build process.

Switching to a template processor that supports logic and filters can help people create reasonably powerful setups without additional moving parts. Being an SSG that doesn’t break is a goal of soupault—that is why I make statically linked executables in the first place.

So far Jingoo looks like a good candidate. It’s mostly syntax-compatible with popular Jinja2 so it should be easy to learn. It also should be compatible with the basic syntax of Mustache to allow reuse of the old index templates.

Apart from the index generators, it may also be used in other widgets, e.g. for the breadcrumb templates. I also hope I can make it possible to use from within Lua plugins, but whether it’s practical remains to be seen. I’m definitely going to experiment with it.

Index fields

Originally I added a bunch of options for the most common metadata: title, date, author, and excerpt. That was just so that soupault could be used for simple blogs, and to free webmasters from having to maintain lists of pages by hand.

Then the “static microformat” idea turned out to be more fruitful than I thought, and I added support for custom fields. The irony is that custom fields are much more functional than built-ins by now: you can choose whether to extract element content or an attribute, and whether to remove HTML tags from that content or not. You cannot easily do that for built-in fields, but you also cannot replace built-in fields with custom fields for the purpose of automatic sorting.

For 2.0.0, I have an idea to deprecate built-in fields altogether. The [index.custom_fields] table can be renamed to just [index.fields].

Misc ideas

Actual transition process

The exact plan isn’t ready yet. If you have any ideas about what else can be improved at the cost of breaking compatibility, or if you have any feedback regarding the ideas I stated here, feel free to write to the mailing list.

I’m also planning to make a converter for configs to help people migrate old deprecated options to the new format.