Why the standard Jekyll plugins are far from ideal.

In this note I will not provide the full code for every component; I will only give snippets and tips on what to pay attention to when building a site with Jekyll.

Jekyll Structured Data and sitemap.xml

File modification time

Every page carries at least three timestamps, spread across different files, page elements and server responses, and all of them must match.

  • ld+json "dateModified": "2025-03-07T15:43:42+00:00"
  • sitemap <lastmod>2025-03-07T15:43:42+00:00</lastmod>
  • HTTP header last-modified: Fri, 07 Mar 2025 15:43:42 GMT
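All three are different renderings of the same instant. As a minimal sketch, assuming GNU coreutils `date` (its `-d` parser is a GNU extension), the ISO 8601 value used in sitemap.xml and ld+json can be converted into the HTTP-date format of the header; `to_http_date` is a hypothetical helper name.

```shell
#!/bin/bash
# Convert an ISO 8601 timestamp (sitemap <lastmod>, ld+json dateModified)
# into the HTTP-date format used by the Last-Modified header.
# Assumes GNU coreutils `date`; to_http_date is a hypothetical name.
to_http_date() {
    # LC_ALL=C forces English day and month names regardless of server locale;
    # -u prints the result in UTC, as HTTP-date requires.
    LC_ALL=C date -u -d "$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

to_http_date "2025-03-07T15:43:42+00:00"   # prints: Fri, 07 Mar 2025 15:43:42 GMT
```

Because the conversion normalizes to UTC, a value with another offset (for example `+02:00`) yields the same header string as long as it denotes the same instant.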

Let’s start with sitemap.xml. In the standard jekyll-sitemap, the modification time of the pages that output collections on this site is taken from the modification time of their source file.
That is, for the page https://webart4.me/en/linux/blog/
the time comes from /categories/en/blog.md:
{{ page.last_modified_at | date_to_xmlschema }}, but this is not entirely correct.

When a publication in this category (collection) is edited, and especially when a new one is added, the page's modification time stays the same even though the page now shows new content.

Therefore, we select all publications for the given locale, keep only those in this category, sort them by last_modified_at and take the newest one. If the category has no posts yet, we fall back to the page's own time.

{% assign newest_post = site.posts | where_exp: "post", "post.locale == 'en'" | where_exp: "post", "post.categories contains 'blog'" | sort: "last_modified_at" | reverse | first | default: "notfound" %}
{% if newest_post != "notfound" %}
    <lastmod>{{ newest_post.last_modified_at | date_to_xmlschema }}</lastmod>
{% else %}
    <lastmod>{{ page.last_modified_at | date_to_xmlschema }}</lastmod>
{% endif %}

The same logic is then used to generate the Structured Data (the ld+json dateModified field).

Server response: Last-Modified

This is where things get more interesting.

When you rebuild the site with jekyll build --destination, the generated files are rewritten and get the build time as their modification time. Even with --incremental you cannot carry the correct timestamps over, and some pages (such as the category pages above) need computed values rather than the time of any single source file anyway. To fix this, we use a simple and elegant script that can be added to the pipeline and run on the server after each deployment.

nano touch_files.sh

#!/bin/bash
# Set the mtime of every file listed in sitemap.xml to its <lastmod> value.

OPTS=$(getopt -o D: --long dir: -n 'parse-options' -- "$@")
if [ $? != 0 ] ; then
  echo "Failed parsing options." >&2
  exit 1
fi
eval set -- "$OPTS"
while true; do
  case "$1" in
    -D | --dir )                DIR="$2"; shift 2 ;;
    -- ) shift; break ;;
    * ) break ;;
  esac
done

if [[ -z "${DIR}" ]]; then
    echo "Usage: $0 --dir <site_root>" >&2
    exit 1
fi

domain="https://webart4.me"
sitemap="${DIR}/sitemap.xml"

if [[ -f "${sitemap}" ]]; then
    # Number of <url> entries (jekyll-sitemap writes one <loc> per line)
    I=$(grep -c "<loc>" "${sitemap}")
    for ((i = 1; i <= I; i++)); do
        location=$(/usr/bin/xmllint --xpath "string(//*[local-name()='url'][${i}]/*[local-name()='loc'])" "${sitemap}")
        ds=$(/usr/bin/xmllint --xpath "string(//*[local-name()='url'][${i}]/*[local-name()='lastmod'])" "${sitemap}")
        # URL -> path relative to the site root
        ifile="${location#"$domain"}"
        # Directory-style URLs (trailing slash) are served from index.html
        if [[ "${ifile}" == */ ]]; then
            ifile="${ifile}index.html"
        fi
        if [[ -z "${ds}" ]]; then
            # An empty --date would silently mean "today 00:00", so skip it
            printf '%s\n' "no lastmod   ${DIR}${ifile}"
        elif [[ -f "${DIR}${ifile}" ]]; then
            printf '%s\n' "${i}   ${DIR}${ifile}"
            touch --date="${ds}" "${DIR}${ifile}"
        else
            printf '%s\n' "file error   ${DIR}${ifile}"
        fi
    done
else
    printf '%s\n' "No sitemap.xml"
    exit 0
fi

/bin/bash ${path_to_script}/touch_files.sh --dir ${site_root_dir_with_sitemapxml}

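The script parses sitemap.xml and sets the modification time of every file listed in <loc> to the value of its <lastmod>. Since nginx takes the Last-Modified header for static files from the file's modification time, the header then matches both the sitemap and the Structured Data.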
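To confirm the result after a deployment, a small check can compare a file's mtime with the <lastmod> it was supposed to receive. This is a sketch assuming GNU coreutils (`stat -c`, `date -d`); `check_mtime` is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash
# Compare a file's mtime with the <lastmod> value it should carry.
# Returns 0 on match, 1 on mismatch, 2 if either value cannot be read.
# Assumes GNU coreutils; check_mtime is a hypothetical name.
check_mtime() {
    local file="$1" lastmod="$2"
    local want have
    want=$(date -d "$lastmod" +%s) || return 2   # expected time, epoch seconds
    have=$(stat -c %Y "$file") || return 2        # actual mtime, epoch seconds
    if [[ "$want" -eq "$have" ]]; then
        printf 'ok         %s\n' "$file"
    else
        printf 'mismatch   %s (%s != %s)\n' "$file" "$have" "$want"
        return 1
    fi
}
```

It can be called with the same `${DIR}${ifile}` and `${ds}` values the loop computes, and the header nginx actually sends can be spot-checked with `curl -sI https://webart4.me/en/linux/blog/ | grep -i last-modified`.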