Read the Docs with any static site generator

Jun 19, 2022 Tags: Python Kart

A while back, when I first started writing the docs for my static site generator, I had to decide where to host them. At first it seemed a very simple decision; in the long run it turned into a fun little adventure. Let's jump right into it!

Read the Docs

For Python projects, the first option is Read the Docs, a platform for building and hosting technical documentation. It is mainly used by Python software, but is not limited to it: big projects such as Godot, AMD ROCm and Certbot use Read the Docs to host their documentation. It is a common go-to solution. However, there is a problem. I wanted to build the documentation for my static site generator with the generator itself, and Read the Docs only allows websites built with Sphinx or MkDocs. I don't think it would make a good impression if the Kart documentation were built with another generator. So I decided to use GitLab Pages.

GitLab Pages

GitLab Pages is an awesome tool created by GitLab to host any static site. You simply have to create a repository with the source of your site, then set up an appropriate CI/CD template which builds the site. If your repository is named account_name.gitlab.io, you can access the website at account_name.gitlab.io; otherwise the website is accessible at account_name.gitlab.io/name_of_the_repository. This is useful, for example, if you are an organization with a repository named organization.gitlab.io that holds a landing page, while each project has its own documentation: you can access all of them under the same URL.

However, if the main website is a little complex, this poses some problems. For example, how can I know whether site.gitlab.io/directory belongs to the main website or to some other project? From the outside, you simply can't. This feature of GitLab Pages is really useful if you want to group a number of projects under the same umbrella, but it is an obstacle if you are an individual and want each one of your projects to have a life of its own. It is also a problem for SEO. For example, the Google Search Console really struggles (and I understand why!) if you put two different websites under the same URL.
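To make the ambiguity concrete, here is a minimal sketch of the URL rule described above. The function name `pages_url` and the account and project names (`acme`, `kart`) are hypothetical, chosen only for illustration.

```python
def pages_url(account: str, project: str, path: str = "") -> str:
    """Return the GitLab Pages URL of a file in a project, per the rule above."""
    base = f"https://{account}.gitlab.io"
    if project == f"{account}.gitlab.io":
        # The "main" repository is served from the root of the domain.
        prefix = base
    else:
        # Every other repository is served from a subdirectory.
        prefix = f"{base}/{project}"
    return f"{prefix}/{path.lstrip('/')}"


# A page inside the main site and the home page of a separate project
# end up at exactly the same address.
main_site_page = pages_url("acme", "acme.gitlab.io", "kart/")
project_home = pages_url("acme", "kart")
print(main_site_page == project_home)  # True
```

Both calls produce `https://acme.gitlab.io/kart/`, which is why an external visitor (or a crawler) cannot tell the two apart.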

There are some ways around this. You can buy a domain name and follow this guide to put each one of your projects under a different subdomain. Cool, problem solved, right? Yes, but unfortunately right now I don't have my own domain name, and even though I plan to get one soon, I think I found an alternative solution, which in some ways may be better. Moreover, it doesn't cost anything, so if you really don't want to spend any money this might be the solution for you.

Back to Read the Docs

Yes, the solution is going back to Read the Docs. But we will have to be creative, as what we are going to do is not what the developers of the platform had in mind. But hey, this is Jamstack, crazier things have been done (just look at tools like utterances, giscus and gitalk, which use GitHub issues to add comments to static sites).

Abusing MkDocs

MkDocs by default uses Markdown to generate its content. However, if it finds any other file in the source directory which is not Markdown, it will copy it to the output as-is. We can take advantage of this feature.

To start, we have to create two configuration files, one for Read the Docs and one for MkDocs. In the next paragraphs I will assume that the documentation source is in a subdirectory of the project called docs. First of all, we have to create readthedocs.yml in the root directory:

version: 2

build:
  os: "ubuntu-20.04"
  tools:
    python: "3.10"
  jobs:
    pre_build:
      - bash docs/build.sh

python:
  install:
    - requirements: docs/requirements.txt

mkdocs:
  configuration: docs/mkdocs.yml

and then mkdocs.yml in the docs directory:

site_name: Project name (this can't be omitted but is useless)

docs_dir: "output_dir_of_your_static_site_generator"

OK, now let's explain. In the first file we tell Read the Docs what to do when building the new site. First, we have to tell it to actually build the documentation. To do this we use a hook called pre_build, which runs before MkDocs, to call the build script of the documentation. In my case I wrote a little script called build.sh, but you can call it whatever you like and put it wherever you like, or even ditch the file and write the command directly in the hook if it is simple.

Then we have to tell Read the Docs which packages to install, and we tell it to look at a file called requirements.txt in the docs directory. Unfortunately Read the Docs supports only this format, not pyproject.toml, so we'll have to live with that. You can also install apt packages directly, if you need them. You can find the related documentation here.

We then specify the location of the MkDocs configuration file. The only important thing in it is docs_dir, which must point to the output directory of your static site generator of choice. In my case it is public; in yours it may be different.
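Since MkDocs only copies what it finds in docs_dir, a missing or empty output directory results in an empty site. The following is a minimal sketch of a sanity check that could run at the end of the build script. The helper name `check_docs_dir` is hypothetical, and it assumes that your generator writes an index.html at the root of its output and that a relative docs_dir is resolved against the directory containing mkdocs.yml.

```python
from pathlib import Path


def check_docs_dir(config_file: str, docs_dir: str) -> list[str]:
    """Return a list of problems found with the generator's output directory."""
    # Assumption: a relative docs_dir is resolved against the folder
    # that holds the MkDocs configuration file.
    root = (Path(config_file).parent / docs_dir).resolve()
    problems = []
    if not root.is_dir():
        problems.append(f"{root} does not exist: did the build script run?")
    elif not (root / "index.html").is_file():
        # Assumption: the generator writes an index.html at the root.
        problems.append(f"{root} has no index.html")
    return problems
```

With the layout used in this post, calling `check_docs_dir("docs/mkdocs.yml", "public")` returns an empty list when the generated site is in place.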

Add search functionality

One of the most challenging things to add to a static website is search. On a normal website, search is provided on the server side. Static sites, however, can rely only on JavaScript running on the client. Multiple JavaScript libraries have been published for this purpose, but they all have a problem in common: for the library to search your website for a word, it needs a complete index of your website on the client side. This means that even if you only want to look up a single page, your browser has to download the content of all the pages of your site. If the site is big, this can become quite a burden. You can get around this problem by downloading the index after the page has fully loaded, so that the page feels snappy and responsive even though a big chunk of data still has to be downloaded. The plus side of this option is that once the index is downloaded, and as long as it is not THAT huge, searching can be really fast. MkDocs follows this strategy, using a library called lunrjs.
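To see why the client needs the whole index, here is a toy inverted index in Python. It is only an illustration of the general idea, not how lunrjs is implemented, and the pages and their text are made up.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL -> plain text content.
PAGES = {
    "/install/": "how to install the generator",
    "/themes/": "themes control how the generator renders pages",
    "/deploy/": "deploy the rendered pages to a static host",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the pages containing every word of the query, sorted by URL."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "generator pages"))  # ['/themes/']
```

Answering even this one query requires an index that covers every page, which is exactly the data the browser has to download up front.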

The alternative is to rely on another service, third-party or self-hosted, that indexes your site and that you can call through an API to get the results for a search query. The plus side is that we don't have to download all the data at the start; however, speed can become a problem if the network latency is very high. Fortunately, Read the Docs provides exactly this. Another cool feature of Read the Docs is that it can use the index created for lunrjs. This way we have to create the index only once, and then, based on our requirements, we can decide whether to use the API provided by RTD or lunrjs.
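With a search API, the client only sends the query and receives the matching results. The sketch below just builds such a request URL. The endpoint path and the parameter names (`project`, `version`, `q`) are an assumption about the Read the Docs server side search API and should be checked against its current documentation; the project slug `kart` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: endpoint path and parameter names follow the Read the Docs
# server side search API; verify them against the current documentation.
SEARCH_ENDPOINT = "https://readthedocs.org/api/v2/search/"


def search_url(project: str, version: str, query: str) -> str:
    """Build the URL of a search request for one version of a project."""
    params = {"project": project, "version": version, "q": query}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(search_url("kart", "latest", "static site"))
```

The only payload is a short query string, so nothing has to be downloaded before the first search; the price is one network round trip per query.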

So let's start by creating the index for lunrjs. For this I created a small script. With minor modifications it can also work for your use case. It simply parses every HTML file in a directory using Beautiful Soup, and then creates a JSON file containing the text content of each page.
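The general shape of such a script can be sketched as follows. This is not my actual script: to stay dependency-free it swaps Beautiful Soup for the standard library's `html.parser`, and the names `TextExtractor`, `build_index` and `write_index` are hypothetical. The output layout (a `docs` list with `location`, `title` and `text`, saved as search/search_index.json) is an assumption modelled on the file written by the MkDocs search plugin.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of one HTML page."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._stack = []  # open <title>, <script> or <style> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED or tag == "title":
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] == "title":
            self.title += data
        elif not self._stack and data.strip():
            # Text outside <title>, <script> and <style> is page content.
            self.chunks.append(data.strip())


def build_index(site_dir: str) -> dict:
    """Return one entry (location, title, text) per HTML file in site_dir."""
    root = Path(site_dir)
    docs = []
    for page in sorted(root.rglob("*.html")):
        parser = TextExtractor()
        parser.feed(page.read_text(encoding="utf-8"))
        docs.append({
            "location": page.relative_to(root).as_posix(),
            "title": parser.title.strip(),
            "text": " ".join(parser.chunks),
        })
    return {"docs": docs}


def write_index(site_dir: str) -> Path:
    """Write the index to search/search_index.json inside site_dir."""
    target = Path(site_dir) / "search" / "search_index.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(build_index(site_dir)), encoding="utf-8")
    return target
```

Calling `write_index` on the directory that MkDocs builds into replaces whatever index is already there, so it has to run as the very last step of the build.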

The problem now is that we have to run this script after Read the Docs calls MkDocs, otherwise our file gets overwritten. To overcome this we are going to use a fantastic MkDocs plugin called mkdocs-simple-hooks, which lets us run custom scripts at any point in the MkDocs build process. Now we only have to modify mkdocs.yml like this, and everything should work!

site_name: ...

docs_dir: "..."

plugins:
  - search
  - mkdocs-simple-hooks:
      hooks:
        on_post_build: "docs.lunr:build_search_index"

Also, don't forget to add Beautiful Soup and mkdocs-simple-hooks to the requirements.txt file we created earlier.

Now the only thing left is to add some JavaScript to the frontend. I don't want to drag this post out for too long, so I will just link here the complete source of the file I used. I simply copied the source from readthedocs-sphinx-search and modified it a little bit to make it work with the style of my docs.

Conclusion

In this post I wanted to show how I used some little-known features of Read the Docs and MkDocs to build my custom documentation on a platform which doesn't support it. You can see the results here. While doing this I got a deeper understanding of how these two projects work, discovered some new tools and in general learned a lot. I think that trying to customize, one could say "hack", a tool we use can be a very useful way to understand how it really works under the hood. And as a programmer I find this really interesting, and a lot of fun too!