Soupault 1.9.0 release

Soupault 1.9.0 is available for download or installation from the opam repository. It now offers a --index-only option for people who want to extract site metadata to JSON and stop there. There are also multiple improvements to the plugin API and the preprocess_element widget, as well as bug fixes.

Verifying release integrity

Starting with 1.9.0, soupault uses minisign rather than PGP for release signing. If you are new to signify/minisign, you may want to read the paper "signify: Securing OpenBSD From Us To You" by Ted Unangst. There's much less overhead compared to PGP, and the keys are much shorter because they embed less metadata and use newer elliptic curve algorithms.

You can verify the releases using this key: RWRfW+gkhk/+iA7dOUtTio6G6KeJCiAEp4Zfozw7eqv2shN90+5z20Cy.

For example (minisign expects the signature file, soupault-1.9.0-win32.zip.minisig, to be next to the archive):

minisign -Vm soupault-1.9.0-win32.zip -P RWRfW+gkhk/+iA7dOUtTio6G6KeJCiAEp4Zfozw7eqv2shN90+5z20Cy

If you have any doubts about the authenticity of the key, feel free to contact me directly.

New features

--index-only

There's now a --index-only option that makes soupault stop at metadata extraction. It just dumps the index data to a JSON file specified in the dump_json option, but doesn't generate any pages.

However, if you've configured the extract_after_widgets option, it will still run the widgets that are supposed to run before index extraction. The reading time plugin on this site is a good example of why this may be needed: that widget must run before metadata extraction so that the reading time can appear on the blog index page.

There are two use cases for this. First, it may be useful for people who want to generate an index page or an RSS, Atom, or JSON Feed file for a handwritten website. Second, it can be a step in a TeX-like workflow. Since soupault doesn't create page files on principle, the intended way to generate a blog archive or a list of all pages is to export the metadata to JSON and run it through a script that makes pages, then run soupault again to assemble a complete website. With --index-only, the first pass skips page generation, so that process becomes faster.
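The "script that makes pages" step could look roughly like the sketch below. It is only an illustration, not part of soupault: it assumes the dump is a JSON array of objects with url, title, and date keys (check your own dump, since the fields depend on your index configuration), and the function names and the index.json and site/archive.html paths are made up for the example.

```python
import html
import json
from pathlib import Path


def render_archive(entries):
    """Return an HTML list of index entries, newest first.

    Assumes each entry is a dict with a "url" key and optional
    "title" and "date" keys (an assumption about the dump format).
    """
    # ISO 8601 dates sort correctly as plain strings.
    dated = sorted(
        (e for e in entries if e.get("date")),
        key=lambda e: e["date"],
        reverse=True,
    )
    # Entries without a date go to the end, in their original order.
    undated = [e for e in entries if not e.get("date")]

    items = []
    for entry in dated + undated:
        url = html.escape(entry["url"], quote=True)
        # Fall back to the URL when a page has no extracted title.
        title = html.escape(entry.get("title") or entry["url"])
        date = html.escape(entry.get("date") or "")
        prefix = f"<time>{date}</time> " if date else ""
        items.append(f'  <li>{prefix}<a href="{url}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


def write_archive(index_path, out_path):
    """Read the JSON dump and write the archive page body."""
    entries = json.loads(Path(index_path).read_text(encoding="utf-8"))
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(render_archive(entries), encoding="utf-8")
```

With something like this in place, the whole build would be: run soupault --index-only, call write_archive("index.json", "site/archive.html"), then run soupault again so that the generated page is processed like any other page.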

Limiting index extraction to some pages or sections

You could already limit widgets to certain pages, sections, or path regular expressions. Now you can do the same for index extraction, for example, if you want to index only the /blog section.

You can likewise limit index extraction to a specific build profile.

Multiple selectors for preprocess_element widgets

It's now possible to use a list of selectors with preprocess_element widgets, to avoid redundancy in the configs.

[widgets.syntax-highlight]
  widget = "preprocess_element"
  selector = ["code", "pre"]
  ...

New plugin functions

It's now possible to extend soupault with plugins, external programs, or plugins that run external programs.

Specifically, there are now Sys.run_program and Sys.get_program_output functions that you can use in your plugins. They don't add much expressive power, but they can make some things easier. For example, I use them in a plugin that takes the page modification date from git unless that page has a handwritten timestamp in <time id="last-modified">.

There are also functions for easily accessing children, descendants, and siblings of an element, functions for deleting and cloning element content, and a few more convenience functions.

Bug fixes

The title widget now correctly removes all HTML tags from the title string (if there are any). It also no longer adds extra whitespace. Both fixes were made by Thomas Letan.

CSS selector syntax errors are now handled gracefully. The fix required a pull request to lambdasoup.