13th May 2024, 13 min read

Example Theme for Simplified Saaze: Wendt

Original post is here eklausmeier.goip.de/blog/2024/05-13-example-theme-for-simplified-saaze-wendt.

This is another theme for Simplified Saaze, called "Wendt". You can inspect it here.

It offers the following features:

Responsive with media breaks for large and small screens, and for printing.
Top menu with submenus.
Two column using CSS grid, "Holy Grail Layout".
Multiple blogs:
- Each category has its own blog by using filtering.
- Each author has its own blog by using filtering.
- Aggregate blog, i.e., the combination of the above.
Using the <!--more--> tag to showcase the initial content of a blog post.
Sitemap in HTML and XML, RSS feed.
WebAssembly based search using pagefind.
No cookies, therefore no annoying cookie banner required.

The theme looks like this:

This theme is modeled after the blog of Alexander Wendt. That blog is powered by WordPress and hosted on Cloudflare. I have written about this PublicoMag website before: Performance Remarks on PublicoMag Website. Alexander Wendt started this blog in October 2017. The number of posts and comments per year is given in the table below. Year 2024 is not complete; as time passes, it will have more posts.

Year	17	18	19	20	21	22	23	24
#posts	50	237	191	190	179	177	168	43
#comments	721	3999	3211	2973	2480	1300	1115	230

The number of comments was counted like this (varying the year prefix from 2017 to 2024):

perl -ne 'if (/^(\d+) Kommentare <\/h5>/) { $s+=$1; printf("%d\t%d\t%s\n",$1,$s,$ARGV); }' 2017*
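The same tally can be sketched in Python. This is only an illustration of what the one-liner does, not the author's tooling; the function name count_comments and the sample lines are made up.

```python
import re

# Same pattern as in the Perl one-liner: the line starts with "<n> Kommentare </h5>".
COMMENT_RE = re.compile(r"^(\d+) Kommentare </h5>")

def count_comments(lines):
    """Sum the comment counts found in an iterable of HTML lines."""
    total = 0
    for line in lines:
        m = COMMENT_RE.match(line)
        if m:
            total += int(m.group(1))
    return total

# Hypothetical sample lines, not taken from the real blog.
sample = ["<p>text</p>", "12 Kommentare </h5>", "3 Kommentare </h5>"]
print(count_comments(sample))  # → 15
```

Unlike the one-liner, the sketch only returns the grand total and does not print the running sum per file.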

1. Installation

There are two parts in the installation.

1. Install the theme including content and the Simplified Saaze static site generator using composer:

$ composer create-project eklausme/saaze-wendt
Creating a "eklausme/saaze-wendt" project at "./saaze-wendt"
Installing eklausme/saaze-wendt (v1.0)
  - Downloading eklausme/saaze-wendt (v1.0)
  - Installing eklausme/saaze-wendt (v1.0): Extracting archive
Created project in /tmp/T/saaze-wendt
Loading composer repositories with package information
Updating dependencies
Lock file operations: 1 install, 0 updates, 0 removals
  - Locking eklausme/saaze (v2.2)
Writing lock file
Installing dependencies from lock file (including require-dev)
Package operations: 1 install, 0 updates, 0 removals
  - Downloading eklausme/saaze (v2.2)
  - Installing eklausme/saaze (v2.2): Extracting archive
Generating optimized autoload files
No security vulnerability advisories found.
        real 3.08s
        user 0.48s
        sys 0
        swapped 0
        total space 0

2. The Simplified Saaze installation is described in Simplified Saaze. It documents how to check for PHP version, check for yaml-parsing, FFI, MD4C extension, etc.

Once everything is installed, just run php saaze -mor.

Option -m generates the XML sitemap, -o generates the HTML overview file, and -r generates the RSS.

2. Downloading all WordPress content

We need a list of all URLs.

The following approach did not work: using the monthly archive lists in WordPress.

for i in `seq 2018 2023`; do for j in `seq -w 01 12`; do curl https://www.publicomag.com/$i/$j/ > m$i-$j.html; done; done

Special cases for 2017 and 2024:

curl https://www.publicomag.com/2017/10/ -o m2017-10.html
curl https://www.publicomag.com/2017/11/ -o m2017-11.html
curl https://www.publicomag.com/2017/12/ -o m2017-12.html
...
curl https://www.publicomag.com/2024/03/ -o m2024-03.html

It turned out that the month lists are incomplete. To be exact: they lack more than 466 URLs.

This approach fetches all links:

$ curl https://www.publicomag.com/ -o wendt-p1.html
$ time ( for i in `seq 2 124`; do
    curl https://www.publicomag.com/page/$i/ -o wendt-p${i}.html;
  done )

This creates 124 files:

$ ls -alFt | head
total 25580
drwxr-xr-x 2 klm klm   4096 Apr  2 11:34 ./
drwxr-xr-x 4 klm klm   4096 Apr  2 11:33 ../
-rw-r--r-- 1 klm klm 208194 Apr  2 11:28 wendt-p1.html
-rw-r--r-- 1 klm klm 187908 Apr  2 11:27 wendt-p124.html
-rw-r--r-- 1 klm klm 203575 Apr  2 11:27 wendt-p123.html
-rw-r--r-- 1 klm klm 206497 Apr  2 11:27 wendt-p122.html
-rw-r--r-- 1 klm klm 207572 Apr  2 11:27 wendt-p121.html
-rw-r--r-- 1 klm klm 207970 Apr  2 11:27 wendt-p120.html
-rw-r--r-- 1 klm klm 206010 Apr  2 11:27 wendt-p119.html
...

List of URLs:

perl -ne 'print $1."\n" if /<h2 class="post-title"><a href="([^"]+)"/' wendt-p*.html > allURL
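As a rough Python equivalent of this extraction step, the following sketch applies the same regular expression line by line. The function name extract_post_urls and the sample HTML are assumptions for illustration only.

```python
import re

# Same pattern as in the Perl one-liner above.
POST_LINK_RE = re.compile(r'<h2 class="post-title"><a href="([^"]+)"')

def extract_post_urls(html):
    """Return the first post link of every line, like perl -ne does."""
    urls = []
    for line in html.splitlines():
        m = POST_LINK_RE.search(line)
        if m:
            urls.append(m.group(1))
    return urls

# Hypothetical snippet of an index page.
page = (
    '<h2 class="post-title"><a href="https://example.org/2024/03/a/">A</a></h2>\n'
    '<p>teaser</p>\n'
    '<h2 class="post-title"><a href="https://example.org/2024/02/b/">B</a></h2>\n'
)
print("\n".join(extract_post_urls(page)))
```

Like the one-liner, this relies on the markup being stable. A real HTML parser would be more robust, but for a one-off migration a regular expression is sufficient.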

Downloading all posts uses the Perl script blogwendtcurl below:

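#!/bin/perl -W
# Download content from www.publicomag.com (Alexander Wendt) given a list of URLs
# Elmar Klausmeier, 05-Mar-2024

use strict;
my $fn;
my @F;

while (<>) {
    chomp;
    @F = split('/');	# https://host/yyyy/mm/slug/ -> $F[3]=yyyy, $F[4]=mm, $F[5]=slug
    $F[5] =~ s/a%cc%88/ä/;	# percent-encoded "a" + combining diaeresis becomes ä
    $fn = $F[3] . '-' . $F[4] . '-' . $F[5] . '.html';
    print $fn . "\n";	# print instead of printf: $fn may contain % characters
    system('curl', $_, '-o', $fn);	# list form: the URL is not interpreted by a shell
}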

This creates a list of HTML files:

$ ls -alFt | head
total 175856
drwxr-xr-x 3 klm klm   4096 Mar  7 19:16 ../
drwxr-xr-x 2 klm klm  69632 Mar  5 19:53 ./
-rw-r--r-- 1 klm klm 203580 Mar  5 19:53 2024-03-18471.html
-rw-r--r-- 1 klm klm 252784 Mar  5 19:53 2024-03-wenn-die-zukunft-ans-fenster-des-gruenen-hauses-klopft.html
-rw-r--r-- 1 klm klm 203765 Mar  5 19:53 2024-03-zeller-der-woche-niedere-gruende.html
-rw-r--r-- 1 klm klm 203337 Mar  5 19:53 2024-02-zeller-der-woche-widerstaendler.html
-rw-r--r-- 1 klm klm 231904 Mar  5 19:52 2024-02-das-nie-wieder-deutschland-und-seine-millionen-fuer-judenhasser.html
...
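The mapping from URL to file name can be sketched in Python as follows. The function name filename_for is hypothetical, and the example URL is an assumption reconstructed from one of the file names in the listing above.

```python
def filename_for(url):
    """Derive 'yyyy-mm-slug.html' from 'https://host/yyyy/mm/slug/'."""
    parts = url.strip().split("/")
    # parts[3] is the year, parts[4] the month, parts[5] the slug.
    # Only the first occurrence is replaced, as in the Perl s/// without /g.
    slug = parts[5].replace("a%cc%88", "ä", 1)
    return f"{parts[3]}-{parts[4]}-{slug}.html"

# Assumed URL, inferred from the file name shown in the listing.
url = "https://www.publicomag.com/2024/03/zeller-der-woche-niedere-gruende/"
print(filename_for(url))  # → 2024-03-zeller-der-woche-niedere-gruende.html
```

Putting year and month in front of the slug keeps the files sorted chronologically in a plain directory listing, which the later shell globs such as 2017* and 2*.html rely on.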

3. Analyzing content types

1. Fonts.

Logo: Shadows Into Light Two, original uses image instead. Another contender could be Croissant One.
Text: Playfair Display

2. Categories. Categories over all posts are as follows:

$ perl -ne 'print $1."\n" if / hentry category-([-\w]+)/' *.html | sort | uniq -c | sort -rn
    595 spreu-weizen
    486 politik-gesellschaft
    122 medien-kritik
     28 fake-news
      3 hausbesuch
      1 film

A single post can have multiple categories. However, the majority of posts have only one category.

In the above list there is no category "alte-weise". I added this category.

We want to convert images in "Alte-Weise" to text. That way loading those pages should be way quicker. Therefore we need to download those images and convert them with tesseract.
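The category tally from step 2 above (sort | uniq -c | sort -rn) can be sketched in Python with a Counter. The function name count_categories and the sample lines are assumptions. Like the one-liner, the sketch only picks up the first category that follows "hentry" on a line.

```python
import re
from collections import Counter

# Same pattern as in the Perl one-liner.
CATEGORY_RE = re.compile(r" hentry category-([-\w]+)")

def count_categories(lines):
    """Return (category, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        m = CATEGORY_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common()

# Hypothetical class attributes as they might appear in the HTML files.
sample = [
    '<article class="post hentry category-spreu-weizen">',
    '<article class="post hentry category-politik-gesellschaft">',
    '<article class="post hentry category-spreu-weizen">',
]
for name, n in count_categories(sample):
    print(f"{n:7d} {name}")
```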

3. URLs. The Perl one-liner below produces a list of URLs for the images.

perl -ne 'print "$1$2\n" if (/^<meta property="og:image"\s+content="(https:\/\/www\.publicomag\.com\/wp-content\/uploads\/\d+\/\d+\/)(Alte-Weis[^"]+|AlteWeise[^"]+|AlteuWeise[^"]+|auw-[^" ]+|aub_[^"]+|auw_[^"]+|AuW_[^"]+|AW_[^"]+|OW[^"]+)"/)' *.html | sort > ../allAlte-WeiseURL

Downloading these images:

perl -ane 'chomp; @F=split(/\//); `curl $_ -o $F[7]`' ../allAlte-WeiseURL
curl https://www.publicomag.com/wp-content/uploads/2023/01/Alte-Weise_C.Wright-Mills-1011x715.jpg -o Alte-Weise_Wright_Mills-scaled.jpg

4. JavaScript. A huge number of JavaScript libraries are loaded. We will get rid of them all.

Google Analytics
JQuery Minimal
JQuery Migrate
WordPress User Avatar
Buzzblog Hercules Likes
Borlabs Cookies Prioritize
WordPress GDPR Compliance
Comment Reply
Contact Form
JQuery Easing for Buzzblog
JQuery MagnificPopup for Buzzblog
JQuery Plugins for Buzzblog
JQuery JustifiedGallery for Buzzblog
Buzzblog Bootstrap
Owl Carousel for Buzzblog
Buzzblog AnimatedHeader
Shariff
MailPoet
Akismet
Borlabs Cookies Minimal

4. Reducing number of images

An easy target is the logo: this was replaced with plain text. This saves one roundtrip to the web-server.

1. For the category "alte-weise" the entire image with text is converted to two elements:

An image
The actual text

The image is scanned with tesseract.

That way the text can be searched via Pagefind. Also, the required bandwidth is reduced.
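How such an OCR step might be called from a script is sketched below. The source does not show how blogwendtmd invokes tesseract, so the language option "deu" (German, which requires the corresponding language data to be installed) and both function names are assumptions.

```python
import subprocess

def tesseract_command(image_path, lang="deu"):
    """Build the tesseract call; the output base 'stdout' prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]

def image_to_text(image_path, lang="deu"):
    """Run tesseract and return the recognized text.

    Requires the tesseract binary; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Only the command is printed here; nothing is executed.
print(" ".join(tesseract_command("Alte-Weise_Wright_Mills-scaled.jpg")))
```

OCR output usually needs manual proofreading, in particular for quotation marks and umlauts.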

Old:

New:

The new approach is to use a blockquote, where the CSS puts an image on top:

blockquote blockquote {
    background: transparent no-repeat top/30% url('/img/Alte-Weise-Kopf.svg');
    text-align:center;
    padding-left:2rem;
    padding-right:2rem;
    padding-top:12rem;
    padding-bottom:1rem;
    background-color:#b6c7c8; border-radius:2.5rem
}

The actual text in Markdown is then:

>> „Zweifel ist nicht das Gegenteil, sondern ein Element des Glaubens.“
>>
>> Paul Tillich

That way the ordinary blockquote in Markdown (single >) is left free to be used for citations.

Obviously, entering the text in >> is way easier than producing an image for each epigram.
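Turning a recognized epigram into this nested blockquote is mechanical. A minimal sketch, with the hypothetical helper name epigram_markdown:

```python
def epigram_markdown(text, author):
    """Format an epigram and its author as a nested Markdown blockquote."""
    lines = [f">> {line}" for line in text.splitlines()]
    # An otherwise empty ">>" line separates the quote from the author.
    return "\n".join(lines + [">>", f">> {author}"])

print(epigram_markdown(
    "„Zweifel ist nicht das Gegenteil, sondern ein Element des Glaubens.“",
    "Paul Tillich",
))
```

The output is exactly the three Markdown lines shown above.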

2. Care was taken to reduce the number of images needed for the social media icons.

Old:

New:

That saves loading eight images. However, you need to load some font glyphs instead.

<a style="background-color:SkyBlue; color:white" href="https://telegram.me/share/url?url=<?=$urlEncoded?>&text=<?=$titleEncoded?>"
   title="Teilen auf Telegram" target=_blank>&nbsp;<span class=symbols>&#x01fbb0;</span>&nbsp;Telegram&nbsp;</a>

In particular this symbol U+1fbb0 is %F0%9F%AE%B0 when URL encoded:

@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+Symbols+2&text=%F0%9F%97%8F%F0%9F%AE%B0%F0%9F%96%82%F0%9F%96%A8');

Similarly, symbol U+1f5cf is %F0%9F%97%8F when URL encoded.
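The percent-encoded form is simply the UTF-8 byte sequence of each code point. The following sketch reproduces the text parameter of the @import URL above; the function name encode_symbols is hypothetical, and the code points U+1F582 and U+1F5A8 were obtained by decoding that URL.

```python
from urllib.parse import quote

def encode_symbols(code_points):
    """Percent-encode the UTF-8 bytes of each Unicode code point."""
    return "".join(quote(chr(cp)) for cp in code_points)

# The four symbols requested in the @import URL, in the same order.
symbols = [0x1F5CF, 0x1FBB0, 0x1F582, 0x1F5A8]
text_param = encode_symbols(symbols)
print(text_param)  # → %F0%9F%97%8F%F0%9F%AE%B0%F0%9F%96%82%F0%9F%96%A8
```

Restricting the font request to these glyphs via text= keeps the download small compared with fetching the complete symbol font.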

5. Converting WordPress HTML to Markdown

Perl script blogwendtmd is used to convert a single HTML file to Markdown.

$ time ( for i in *.html; do blogwendtmd $i; done )
        real 94.95s
        user 136.51s
        sys 0
        swapped 0
        total space 0

The long runtime is caused exclusively by running tesseract, i.e., the conversion from image to text. Once all WordPress posts are converted to Markdown, this script no longer needs to be run.

blogwendtmd is 180 lines of Perl code.

Listing of all authors and their corresponding directories.

$ perl -ne 'print $1."\n" if /\/author\/([^\/]+)\//' 2*.html | sort -u
alexander
archi-bechlenberg
bernd-zeller
cora-stephan
david-berger
hansjoerg-mueller
joerg-friedrich
matthias-matussek
redaktion
samuel-horn
wolfram-ackner

Each of these authors has a separate index beneath /author/.

Generating all yearly overviews:

for i in *; do ( echo $i; cd $i; blogwendtdate -gy$i *.md > index.md ) done

Perl script blogwendtdate generates a Markdown file, which contains all articles for the corresponding year. The script first collects all posts of one year in a list, and then sorts them by the date given in the frontmatter.

my @L;	# list of posts in a year, in the beginning not necessarily sorted

sub markdownfile(@) {
    my $f = $_[0];
    my ($flag,$title,$date,$draft) = (0,"","",0);
    open(F,"<$f") || die("Cannot open $f");
    while (<F>) {
        if (/^\-\-\-\s*$/) {
            last if (++$flag >= 2);
        . . .
    }
    if ($draft == 0  &&  length($title) > 0  &&  length($date) > 0) {
        push(@L, sprintf("%s: [%s](%s%s)",$date,$title,$prefix,substr($f,0,-3)) );
    }
    close(F) || die("Cannot close $f");
}

while (<@ARGV>) {
    #printf("ARGV=|%s|\n",$_);
    next if (substr($_,-8) eq "index.md");
    markdownfile($_);
}

for (sort @L) {
    printf("%d. %s\n",++$cnt,$_);
}
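The overall flow of the excerpt (read the frontmatter, skip drafts and index.md, sort by date, number the entries) can be sketched in Python. This is not a port of blogwendtdate: the elided part of the Perl code is unknown, so the frontmatter keys title, date and draft, the simple "key: value" parsing, and all function names are assumptions.

```python
def read_frontmatter(text):
    """Return key/value pairs found between the first two '---' lines."""
    data, fences = {}, 0
    for line in text.splitlines():
        if line.rstrip() == "---":
            fences += 1
            if fences >= 2:
                break
            continue
        if fences == 1 and ":" in line:
            key, value = line.split(":", 1)
            data[key.strip()] = value.strip().strip('"')
    return data

def yearly_overview(posts, prefix=""):
    """posts maps a file name such as 'foo.md' to its Markdown text."""
    entries = []
    for name, text in posts.items():
        if name.endswith("index.md"):
            continue  # the overview must not list itself
        fm = read_frontmatter(text)
        if fm.get("draft") == "true" or not fm.get("title") or not fm.get("date"):
            continue
        entries.append(f"{fm['date']}: [{fm['title']}]({prefix}{name[:-3]})")
    # Entries start with the date, so a plain string sort orders them by date.
    return [f"{i}. {e}" for i, e in enumerate(sorted(entries), start=1)]

# Hypothetical posts.
posts = {
    "b.md": '---\ntitle: "B"\ndate: "2024-03-05"\n---\ntext',
    "a.md": '---\ntitle: "A"\ndate: "2024-01-02"\n---\ntext',
    "d.md": '---\ntitle: "D"\ndate: "2024-02-02"\ndraft: true\n---\n',
    "index.md": "old overview",
}
print("\n".join(yearly_overview(posts, "/")))
```

Sorting the formatted strings only works because the date comes first and is assumed to be written in year-month-day order.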

Many HTML errors reported by the Nu Html Checker were corrected. See for example das-magische-sprechen-schafft-macht-fuer-den-augenblick.

6. Handling comments

The Publico blog contains comments, where readers have left their thoughts. In Perl script blogwendtmd we detect the beginning of the comments by the <h5 class="comments-h"> tag, and their end by the pinglist list.

if (/^<ul class="pinglist">/) { $flag = 0; next; }
elsif (/<h5 class="comments-h">/) {
    ...
    $flag = 1;
}
next if ($flag == 0);
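The flag logic is a small two-state machine: off until the comment heading appears, on until the pinglist starts. A Python sketch of the same idea follows. The function name comment_lines is hypothetical, and since part of the Perl branch is elided, it is an assumption that the heading line itself is kept.

```python
def comment_lines(html_lines):
    """Return the lines between the comment heading and the pinglist."""
    inside = False
    kept = []
    for line in html_lines:
        if line.startswith('<ul class="pinglist">'):
            inside = False      # end of all comments
            continue
        if '<h5 class="comments-h">' in line:
            inside = True       # beginning of the comment section
        if inside:
            kept.append(line)
    return kept

# Hypothetical page fragment.
page = [
    "<p>post text</p>",
    '<h5 class="comments-h">2 Kommentare </h5>',
    "<li>first comment</li>",
    "<li>second comment</li>",
    '<ul class="pinglist">',
    "<footer>...</footer>",
]
print(comment_lines(page))
```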

We refrained from integrating the commenting system HashOver. It is not difficult, as we have already demonstrated in the Lemire theme. However, for a political blog a comment system is rather "dangerous", as it can attract rather unwelcome contributions. Under German law the host of these comments becomes liable for them. Essentially, you therefore must check every comment manually:

... since all comments have to be reviewed, and the editorial staff still consists of the founder Alexander Wendt and one part-time editor, they cannot go online immediately.

In light of the high volume of comments HashOver should most probably be added.

7. Running static site generator

In serial mode it takes less than 3 seconds to build 19 collections without comments. With comments it takes less than 6 seconds to process 23 thousand pages, see below. This build time can be almost halved by using parallelisation with -p16.

$ time php saaze -morb /tmp/build
Building static site in /tmp/build...
    execute(): filePath=./content/alexander.yml, nSIentries=770, totalPages=39, entries_per_page=20
    execute(): filePath=./content/alte-weise.yml, nSIentries=131, totalPages=7, entries_per_page=20
    execute(): filePath=./content/archi-bechlenberg.yml, nSIentries=5, totalPages=1, entries_per_page=20
    execute(): filePath=./content/bernd-zeller.yml, nSIentries=332, totalPages=17, entries_per_page=20
    execute(): filePath=./content/cora-stephan.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/david-berger.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/fake-news.yml, nSIentries=28, totalPages=2, entries_per_page=20
    execute(): filePath=./content/film.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/hansjoerg-mueller.yml, nSIentries=2, totalPages=1, entries_per_page=20
    execute(): filePath=./content/hausbesuch.yml, nSIentries=2, totalPages=1, entries_per_page=20
    execute(): filePath=./content/joerg-friedrich.yml, nSIentries=2, totalPages=1, entries_per_page=20
    execute(): filePath=./content/mag.yml, nSIentries=1235, totalPages=62, entries_per_page=20
    execute(): filePath=./content/matthias-matussek.yml, nSIentries=1, totalPages=1, entries_per_page=20
    execute(): filePath=./content/medien-kritik.yml, nSIentries=123, totalPages=7, entries_per_page=20
    execute(): filePath=./content/politik-gesellschaft.yml, nSIentries=486, totalPages=25, entries_per_page=20
    execute(): filePath=./content/redaktion.yml, nSIentries=112, totalPages=6, entries_per_page=20
    execute(): filePath=./content/samuel-horn.yml, nSIentries=3, totalPages=1, entries_per_page=20
    execute(): filePath=./content/spreu-weizen.yml, nSIentries=596, totalPages=30, entries_per_page=20
    execute(): filePath=./content/wolfram-ackner.yml, nSIentries=6, totalPages=1, entries_per_page=20
Finished creating 19 collections, 19 with index, and 1248 entries (2.58 secs / 809.47MB)
#collections=19, parseEntry=0.7290/23712-19, md2html=1.1983, toHtml=1.2839/23712, renderEntry=0.1562/1248, renderCollection=0.0403/224, content=23712/0
    real 5.16s
    user 4.36s
    sys 0
    swapped 0
    total space 0
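The totalPages values in this log are consistent with a ceiling division of nSIentries by entries_per_page. This is an observation from the numbers shown, not a statement about the internals of Simplified Saaze. The function name total_pages is made up.

```python
def total_pages(n_entries, entries_per_page=20):
    """Number of index pages needed for n_entries (ceiling division)."""
    return (n_entries + entries_per_page - 1) // entries_per_page

# Values taken from the build log: alexander, alte-weise, mag, cora-stephan.
for n in (770, 131, 1235, 1):
    print(n, total_pages(n))
```

For example, the 1235 entries of mag need 62 index pages: 61 full pages plus one page with the remaining 15 entries.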

Running pagefind, i.e., indexing all keywords for the WebAssembly based search functionality:

$ time pagefind -s . --exclude-selectors aside --exclude-selectors footer --force-language=de

Running Pagefind v1.0.4
Running from: "/tmp/buildwendt"
Source:       ""
Output:       "pagefind"

[Walking source directory]
Found 1473 files matching **/*.{html}

[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.

[Reading languages]
Discovered 1 language: de

[Building search indexes]
Total:
  Indexed 1 language
  Indexed 1473 pages
  Indexed 133261 words
  Indexed 0 filters
  Indexed 0 sorts

Finished in 19.644 seconds
        real 19.87s
        user 18.28s
        sys 0
        swapped 0
        total space 0

It would take 11 seconds without comments, i.e., indexing 77,168 words.

8. Collections

There are quite a number of collections at play in this theme. The most important one is mag (short for magazine). This directory contains all the blog posts. All the other collections are just symbolic links to mag, i.e., they do not contain additional content.

total 96
drwxr-xr-x  4 klm klm 4096 Apr 27 17:11 ./
drwxr-xr-x  7 klm klm 4096 May 13 13:00 ../
lrwxrwxrwx  1 klm klm    3 Mar 26 21:48 alexander -> mag/
-rw-r--r--  1 klm klm  273 Apr  2 18:56 alexander.yml
lrwxrwxrwx  1 klm klm    3 Apr 27 17:11 alte-weise -> mag/
-rw-r--r--  1 klm klm  225 Apr 27 17:10 alte-weise.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 archi-bechlenberg -> mag/
-rw-r--r--  1 klm klm  495 Apr  2 18:58 archi-bechlenberg.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:17 bernd-zeller -> mag/
-rw-r--r--  1 klm klm  213 Apr  2 18:01 bernd-zeller.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 15:18 cora-stephan -> mag/
-rw-r--r--  1 klm klm  707 Apr  2 19:01 cora-stephan.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 15:17 david-berger -> mag/
-rw-r--r--  1 klm klm  761 Apr  2 19:06 david-berger.yml
drwxr-xr-x  2 klm klm 4096 Apr  2 16:24 error/
-rw-r--r--  1 klm klm   88 Apr  2 16:21 error.not_used_yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 fake-news -> mag/
-rw-r--r--  1 klm klm  216 Apr  2 19:42 fake-news.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 film -> mag/
-rw-r--r--  1 klm klm  201 Apr  2 19:43 film.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 hansjoerg-mueller -> mag/
-rw-r--r--  1 klm klm  318 Apr  2 18:56 hansjoerg-mueller.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 hausbesuch -> mag/
-rw-r--r--  1 klm klm  219 Apr  2 19:42 hausbesuch.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 15:18 joerg-friedrich -> mag/
-rw-r--r--  1 klm klm  222 Apr  2 18:01 joerg-friedrich.yml
drwxr-xr-x 10 klm klm 4096 May 12 20:56 mag/
-rw-r--r--  1 klm klm  110 Apr  1 22:25 mag.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 matthias-matussek -> mag/
-rw-r--r--  1 klm klm  228 Apr  2 18:02 matthias-matussek.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 medien-kritik -> mag/
-rw-r--r--  1 klm klm  234 Apr  2 19:27 medien-kritik.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 17:47 politik-gesellschaft -> mag/
-rw-r--r--  1 klm klm  255 Apr  2 17:59 politik-gesellschaft.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:16 redaktion -> mag/
-rw-r--r--  1 klm klm  202 Apr  2 18:03 redaktion.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:21 samuel-horn -> mag/
-rw-r--r--  1 klm klm  259 Apr  2 19:03 samuel-horn.yml
lrwxrwxrwx  1 klm klm    3 Apr  2 19:25 spreu-weizen -> mag/
-rw-r--r--  1 klm klm  231 Apr  2 19:27 spreu-weizen.yml
lrwxrwxrwx  1 klm klm    3 Mar 31 17:22 wolfram-ackner -> mag/
-rw-r--r--  1 klm klm  542 Apr  2 19:05 wolfram-ackner.yml

The collection yaml files look like this. First mag.yml:

title: Publico
sort_field: date
sort_direction: desc
index_route: /
entry_route: /{slug}
more: true
rss: true

Now alexander.yml, which filters for author:

title: Publico - Autor Alexander Wendt
subtitle: "Alexander Wendt ist Herausgeber von Publico."
sort_field: date
sort_direction: desc
index_route: /author/alexander
entry: false
entry_route: /{slug}
more: true
filter: return ($entry->data['author'] === 'Alexander Wendt');

Similarly, alte-weise.yml, which filters for categories:

title: Publico - Alte &amp; Weise
sort_field: date
sort_direction: desc
index_route: /alte-weise
entry: false
entry_route: /{slug}
more: true
filter: return (array_search('alte-weise',$entry->data['categories']) !== false);

Except for mag.yml, all yaml files set rss: false.
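Conceptually, each filter is a predicate that is evaluated for every entry of mag, and only the matching entries appear in that collection's index. The Python sketch below illustrates the idea. It is not how Simplified Saaze evaluates the PHP filter; the entry dictionaries and all function names are assumptions.

```python
def by_author(name):
    """Predicate corresponding to the author filter in alexander.yml."""
    return lambda entry: entry.get("author") == name

def by_category(category):
    """Predicate corresponding to the category filter in alte-weise.yml."""
    return lambda entry: category in entry.get("categories", [])

def select(entries, predicate):
    """Keep only the entries for which the predicate holds."""
    return [e for e in entries if predicate(e)]

# Hypothetical entries of the mag collection.
mag = [
    {"slug": "p1", "author": "Alexander Wendt", "categories": ["politik-gesellschaft"]},
    {"slug": "p2", "author": "Bernd Zeller", "categories": ["spreu-weizen"]},
    {"slug": "p3", "author": "Alexander Wendt", "categories": ["alte-weise", "spreu-weizen"]},
]
print([e["slug"] for e in select(mag, by_author("Alexander Wendt"))])  # → ['p1', 'p3']
print([e["slug"] for e in select(mag, by_category("alte-weise"))])     # → ['p3']
```

Because all filtered collections read the same directory through symbolic links, a post is written once and can show up in several indexes, e.g., under its author and under each of its categories.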

9. Templates

This theme uses the following PHP template files:

bottom-layout.php: commonalities for the bottom part
entry.php: template for the entry, i.e., the usual blog post
error.php: 404 page, or other error conditions
head.php: HTML for the first few lines for all HTML files
index.php: template for the index, i.e., the listing of posts
overview.php: HTML sitemap
rss.php: RSS feed
sitemap.php: XML sitemap
top-layout.php: commonalities for the top part

I use the following hierarchy of PHP files for my entry-template, i.e., the template for a blog post:

entry.php
- top-layout.php
  - head.php
- Actual content: $entry['content']
- bottom-layout.php

The following hierarchy is used for the index-template, i.e., the template for showing a reverse-date sorted list of blog posts:

index.php
- top-layout.php
  - head.php
- for-loop over entry-excerpts
- bottom-layout.php