Original post is here: eklausmeier.goip.de
Pagefind is a JavaScript library that you add to your static site to get complete search functionality. Pagefind has the following advantages over other JavaScript libraries:
- Easy to install, no JavaScript dependency hell.
- Easy to add the CSS and the two lines with the `<script>` tag.
- Creating the index is easy and reasonably quick.
Pagefind was mainly written by Liam Bigelow from New Zealand and is promoted by CloudCannon. It is open source. It is written in Rust and JavaScript.
Language | kLOC | #files |
---|---|---|
Rust | 36 | 63 |
JavaScript | 2 | 20 |
1. One-time installation. Installing Pagefind is just a matter of downloading a single binary from GitHub: select the proper binary for Apple, Linux, or Windows. In my case I used `pagefind-v1.0.3-x86_64-unknown-linux-musl.tar.gz` for Arch Linux. Unpack with
    tar zxf pagefind-v1.0.3-x86_64-unknown-linux-musl.tar.gz
Unpacking the 10 MB archive will create a 22 MB executable, which is statically linked and therefore has no dependencies. That's it.
2. Add CSS and JavaScript to template. Add the CSS and JavaScript references below to your template file, outside of `<body>`:
    <link href="/pagefind/pagefind-ui.css" rel="stylesheet">
    <script src="/pagefind/pagefind-ui.js"></script>
    <script>
        window.addEventListener('DOMContentLoaded', (event) => {
            new PagefindUI({ element: "#search", showSubResults: true });
        });
    </script>
Then add the actual search dialog to your template inside `<body>`, in my case to `top-layout.php`:
    <div id="search"></div>
3. Creating index files. This step must be repeated whenever you have new content or rename files. It does not need to be repeated whenever you regenerate your static HTML files, although you can do just that if you want to play it safe. Index creation uses the above-mentioned executable `pagefind`. Running this command shows all the options:
    $ pagefind -h
    Implement search on any static website.

    Usage: pagefind [OPTIONS]

    Options:
      -s, --site <SITE>
              The location of your built static website
          --output-subdir <OUTPUT_SUBDIR>
              Where to output the search bundle, relative to the processed site
          --output-path <OUTPUT_PATH>
              Where to output the search bundle, relative to the working directory of the command
          --root-selector <ROOT_SELECTOR>
              The element Pagefind should treat as the root of the document. Usually you will want to use the data-pagefind-body attribute instead.
          --exclude-selectors <EXCLUDE_SELECTORS>
              Custom selectors that Pagefind should ignore when indexing. Usually you will want to use the data-pagefind-ignore attribute instead.
          --glob <GLOB>
              The file glob Pagefind uses to find HTML files. Defaults to "**/*.{html}"
          --force-language <FORCE_LANGUAGE>
              Ignore any detected languages and index the whole site as a single language. Expects an ISO 639-1 code.
          --serve
              Serve the source directory after creating the search index
      -v, --verbose
              Print verbose logging while indexing the site. Does not impact the web-facing search.
      -l, --logfile <LOGFILE>
              Path to a logfile to write to. Will replace the file on each run
      -k, --keep-index-url
              Keep "index.html" at the end of search result paths. Defaults to false, stripping "index.html".
      -h, --help
              Print help
      -V, --version
              Print version
This blog uses Simplified Saaze. In the case of Simplified Saaze I generate static files like this:
    php saaze -mortb /tmp/build
This builds all static files in `/tmp/build`, which happens to be on a RAM disk on Arch Linux. Then change to this directory and issue
    $ time pagefind -s . --exclude-selectors aside --exclude-selectors footer --force-language=en

    Running Pagefind v1.0.3
    Running from: "/tmp/build"
    Source: ""
    Output: "pagefind"

    [Walking source directory]
    Found 555 files matching **/*.{html}

    [Parsing files]
    Did not find a data-pagefind-body element on the site.
    ↳ Indexing all <body> elements on the site.

    [Reading languages]
    Discovered 1 language: en

    [Building search indexes]
    Total:
      Indexed 1 language
      Indexed 555 pages
      Indexed 33129 words
      Indexed 0 filters
      Indexed 0 sorts

    Finished in 1.618 seconds
      real 1.65s
      user 1.49s
      sys 0
      swapped 0
      total space 0
The command
    pagefind -s . --force-language=en
would have been enough in many cases. In my special case I want to exclude content that resides between `<aside>` and `</aside>`, and similarly between `<footer>` and `</footer>`.
The option `--force-language=en` is required in my case as I have English and German posts.
Without this option pagefind would create two distinct indexes, and a search would then cover only one language but not the other.
By forcing the language pagefind puts everything into a single index.
See Multilingual search.
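To make the effect of forcing the language concrete, here is a toy model in JavaScript. It is not Pagefind's real code: `buildIndexes`, `forceLanguage` and the page URLs are names I made up for illustration. The sketch only assumes that pages are grouped into one index per detected language unless a language is forced.

```javascript
// Toy model, NOT Pagefind's implementation: one index per language.
// `buildIndexes`, `forceLanguage` and the URLs below are hypothetical.
function buildIndexes(pages, forceLanguage = null) {
  const indexes = new Map();
  for (const page of pages) {
    // With a forced language, the page's own language is ignored.
    const lang = forceLanguage ?? page.lang;
    if (!indexes.has(lang)) indexes.set(lang, []);
    indexes.get(lang).push(page.url);
  }
  return indexes;
}

const samplePages = [
  { url: "/blog/post-a/", lang: "en" },
  { url: "/blog/post-b/", lang: "de" },
  { url: "/blog/post-c/", lang: "en" },
];

const separate = buildIndexes(samplePages);     // languages detected per page
const merged = buildIndexes(samplePages, "en"); // like --force-language=en

console.log([...separate.keys()]); // two indexes: en and de
console.log([...merged.keys()]);   // a single index: en
```

In the first case a search in the English index can never return the German post, which is the behavior described above.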
Indexing creates a directory called `pagefind`. Just copy this directory to your web-server during deployment. This directory looks something like this:
    pagefind
    ├── fragment
    │   ├── en_0933ef4.pf_fragment
    │   ├── en_100be25.pf_fragment
    │   ├── en_10b07a1.pf_fragment
    │   ├── . . .
    │   └── en_fef8cdb.pf_fragment
    ├── index
    │   ├── en_22c87b9.pf_index
    │   ├── en_26afa46.pf_index
    │   ├── en_2a80efb.pf_index
    │   ├── . . .
    │   └── en_fde0a3b.pf_index
    ├── pagefind.en_d6828bd6ef.pf_meta
    ├── pagefind-entry.json
    ├── pagefind.js
    ├── pagefind-modular-ui.css
    ├── pagefind-modular-ui.js
    ├── pagefind-ui.css
    ├── pagefind-ui.js
    ├── wasm.en.pagefind
    └── wasm.unknown.pagefind

    3 directories, 596 files
The files in `index` are usually around 40 KB each, those in `fragment` are usually around 1-10 KB each. The JavaScript totals 100 KB, the CSS is less than 20 KB.
4. Network traffic. Pagefind was specifically designed to load only small amounts of data over the network. This can be seen in the diagram below.
This makes Pagefind particularly attractive performance-wise.
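The idea of loading only a small part of the index can be sketched with a toy model. This is not Pagefind's actual file format or algorithm. The chunk layout, the word lists, and the 5 KB fragment size are assumptions of mine; only the 40 KB per index file is taken from the directory listing above.

```javascript
// Toy model of chunked index loading; NOT Pagefind's real format or algorithm.
const INDEX_CHUNK_KB = 40; // rough size of one index file, see listing above
const FRAGMENT_KB = 5;     // assumed value inside the 1-10 KB range

// Hypothetical chunks, each covering an alphabetical range of words.
// The numbers are made-up page ids.
const chunks = [
  { from: "a", to: "h", words: { arch: [1, 2], build: [2] } },
  { from: "i", to: "q", words: { index: [1, 3], linux: [1] } },
  { from: "r", to: "z", words: { rust: [3], search: [1, 2, 3] } },
];

function search(word, maxResults = 5) {
  // Only the one chunk whose range covers the first letter is "downloaded".
  const chunk = chunks.find((c) => word[0] >= c.from && word[0] <= c.to);
  const pageIds = (chunk?.words[word] ?? []).slice(0, maxResults);
  // One fragment is fetched per result that gets displayed.
  const transferredKB =
    (chunk ? INDEX_CHUNK_KB : 0) + pageIds.length * FRAGMENT_KB;
  return { pageIds, transferredKB };
}

const totalIndexKB = chunks.length * INDEX_CHUNK_KB;
const result = search("rust");
console.log(result.pageIds, result.transferredKB, "KB of", totalIndexKB, "KB index");
```

In this model a query touches one index chunk plus one fragment per displayed hit, so the transfer stays roughly constant while the full index grows with the site.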
5. Using Pagefind as a user. Using Pagefind as a user is intuitive and needs no further explanation. This blog has Pagefind integrated into every page as of now. Just type a word you want to search for, and results will pop up almost instantly. This instant reaction is no surprise as the actual searching is done in the browser.
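Since the searching happens in the browser, the generated `pagefind.js` can also be queried from your own script instead of the prebuilt UI. Below is a minimal sketch: `topResults` and `fakePagefind` are hypothetical names of mine, and the calls `search()` and `data()` follow Pagefind's JavaScript API as I understand it from its documentation, so please verify against the version you use. The stand-in object only exists so that the snippet runs outside a browser.

```javascript
// Sketch: run a query against a Pagefind-like module and collect the top hits.
// `topResults` is a hypothetical helper, not part of Pagefind.
async function topResults(pagefind, query, limit = 5) {
  const search = await pagefind.search(query);
  // Each result is only a handle; data() loads that page's details on demand.
  const hits = await Promise.all(
    search.results.slice(0, limit).map((result) => result.data())
  );
  return hits.map((hit) => ({ url: hit.url, excerpt: hit.excerpt }));
}

// Stand-in for the real module, with made-up data, so this runs in Node.
const fakePagefind = {
  search: async (query) => ({
    results: [
      { data: async () => ({ url: "/a/", excerpt: `... ${query} ...` }) },
      { data: async () => ({ url: "/b/", excerpt: `... ${query} ...` }) },
    ],
  }),
};

topResults(fakePagefind, "rust", 1).then((hits) => console.log(hits));

// In the browser you would load the real module instead (not run here):
//   const pagefind = await import("/pagefind/pagefind.js");
//   console.log(await topResults(pagefind, "rust"));
```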
There is one slight limitation of Pagefind: currently you cannot search for word groups. For example, consider Shakespeare's Hamlet:
To be, or not to be, that is the question
Searching for `to` or `be` would likely give you lots of results, but probably not the ones you are looking for. Clearly not a problem for this blog, as I do not have lyrics here.
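Why single-word matching is not enough for word groups can be shown with a toy word index. This is an illustration only and not how Pagefind stores its data: the index below records which pages contain a word but not where, so adjacency is lost. All names and the sample texts apart from the Hamlet line are made up.

```javascript
// Toy word index, NOT Pagefind's implementation: it records which pages
// contain a word, but not the position of the word within the page.
function buildWordIndex(pages) {
  const index = new Map();
  for (const [id, text] of Object.entries(pages)) {
    for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
    }
  }
  return index;
}

// Pages containing ALL words of the query, in any order and at any distance.
function searchAllWords(index, query) {
  const words = query.toLowerCase().match(/[a-z]+/g) ?? [];
  const sets = words.map((w) => index.get(w) ?? new Set());
  if (sets.length === 0) return [];
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
}

const pages = {
  hamlet: "To be, or not to be, that is the question",
  howto: "It is easy to install and will be fast",
  rust: "Pagefind is written in Rust",
};
const wordIndex = buildWordIndex(pages);
const matches = searchAllWords(wordIndex, "to be");
console.log(matches); // hamlet and howto both match
```

The second page matches although "to" and "be" never appear next to each other, which is exactly the kind of result you are probably not looking for.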