Web Publishing

Using Obsidian as CMS

Obsidian is an application for writing and organizing notes as a personal knowledge base. Notes are kept as plain Markdown files on the local drive. The application is typically used for everyday note-taking, writing and research, but it can easily be adapted to other purposes, such as a project management tool or, in our case, a content management system.

Why Use Obsidian as a CMS?

While writing and editing website content with Obsidian, we came across a few life-improving features that we love.

How to Publish Web Pages?

Although Obsidian notes are Markdown files, written in a lightweight markup language, they cannot be served directly on a website. The Markdown files must first be converted into the HTML that is displayed as a web page.

One option is to add front matter to a note, which can then be served as a web page by pushing the note to a GitHub repository connected to a website via GitHub Pages. Or you can run Jekyll locally to create static web pages, which can then be hosted on a regular hosting service.

Alternatively, you can rely on the paid note-publishing service offered by Obsidian, which allows you to publish a selection of notes directly from your Obsidian Vault to a website hosted by Obsidian.

Neither approach works for us, because we have a large quantity of structured data and because we need control over web server settings that neither GitHub Pages nor the Obsidian publishing service provides. Instead, we publish the websites using Firebase Hosting. A custom script written in Python converts our notes to HTML files before they are deployed to the hosting platform.

Some Ground Rules

Use of Obsidian shouldn't rely on plugins. Although Obsidian plugins can expand the functionality, we prefer a purist approach using the base functionality of Obsidian only.

Content written in Markdown files should adhere to basic Markdown syntax as closely as possible, avoiding unnecessary complexity.

How to Deal with Structured Data?

One set of structured data covers the attributes of a web page, such as the content of its meta tags (language, title, description, canonical URL, etc.). Declaring these parameters in front matter is the obvious choice, but we took another approach because front matter comes with some disadvantages. Front matter is a section at the top of a Markdown file written in YAML ("YAML is a human-friendly data serialization language for all programming languages."), opened by three dashes and separated from the rest of the content by another three dashes.

Our implementation keeps this data in the published note itself, under an h3 heading named "Meta" followed by a list of items, each with the parameter name before a colon and its value after the colon:

For this page:

### Meta
- website:
- path: /website-development/using-obsidian-as-cms/
- title: Using Obsidian as CMS
- description: How to implement Obsidian as a content management system for websites.
- author: Karl-Heinz Müller
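Reading such a Meta list takes only a few lines of Python. The following is a minimal sketch for illustration, not our production script: the function name `parse_meta` and the rule that the section ends at the next heading are assumptions.

```python
def parse_meta(note_text: str) -> dict:
    """Collect "- key: value" items listed under a "### Meta" heading."""
    meta = {}
    in_meta = False
    for line in note_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading switches the section; only "Meta" is of interest.
            in_meta = stripped.lstrip("#").strip().lower() == "meta"
            continue
        if in_meta and stripped.startswith("- "):
            # Split on the first colon only, so values may contain colons.
            key, _, value = stripped[2:].partition(":")
            meta[key.strip()] = value.strip()
    return meta


note = """Some page content.

### Meta
- website:
- path: /website-development/using-obsidian-as-cms/
- title: Using Obsidian as CMS
- author: Karl-Heinz Müller
"""

meta = parse_meta(note)
print(meta["path"])
```

An item without a value, such as `website:` above, ends up as an empty string, so later steps can decide how to treat missing parameters.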

Another set of structured data is the content itself, or sections of a website. Our website, for example, publishes information about 150+ species, 500+ photos, 50+ videos and 1000+ log entries on in-field species identifications and observations. We store the data for each item in a separate note that follows a predetermined structure of headings, lists and text.

A Python script converts these notes into a pandas DataFrame and then turns the data into web pages. The structure of this type of document is rigid but still leaves enough flexibility to extend the information about an item.

As an example, here is the Markdown of a photo note, a close-up of an American Bullfrog (Rana Catesbeiana):

## khm-20190624-1612-0000104643

![American Bull Frog](,compress&crop=entropy "American Bull Frog")

![American Bull Frog](,compress&crop=entropy "American Bull Frog")

### Title

American Bull Frog (Rana Catesbeiana)

### Caption

American Bull Frog (Rana Catesbeiana) - Close Up

### Meta

- camera: NIKON D500
- lens: 90mm f/2.8
- exposure_time: 1/1000
- f_number: f/3.3
- focal_length: 90mm
- iso: 200
- topic: species
- species: American Bullfrog (Rana Catesbeiana)
- location: Parc Bernard-Landry
- city: LAVAL
- state: QC
- country: CA
- width: 4219
- height: 2808
- presentation: entropy
- tags: amphibians, frogs

The note contains two Markdown lines for the image. They use two different layouts, which lets us verify that the image rendering (presentation) is correct for the photo. Seeing the photo in the note also helps when writing the title and caption.

The Meta section contains structured data about the photo. Most of it has been extracted from the EXIF data of the photo.
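To illustrate how a note with this layout can become one row of tabular data, here is a hedged sketch. The helper names `note_to_record` and `notes_to_frame` are hypothetical, the sample note is shortened, and all values are kept as strings; a real script would probably convert fields such as width and height to numbers.

```python
import re


def note_to_record(note_text: str) -> dict:
    """Flatten one photo note into a single flat dictionary."""
    record = {}
    section = None
    for line in note_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        h3 = re.match(r"###\s+(.+)", stripped)
        h2 = re.match(r"##\s+(.+)", stripped)
        if h3:
            # "### Title", "### Caption" and "### Meta" open a new section.
            section = h3.group(1).strip().lower()
        elif h2:
            # The h2 heading carries the identifier of the photo.
            record["id"] = h2.group(1).strip()
            section = None
        elif section == "meta" and stripped.startswith("- "):
            key, _, value = stripped[2:].partition(":")
            record[key.strip()] = value.strip()
        elif section in ("title", "caption"):
            # Join multi-line text into one string.
            record[section] = f"{record.get(section, '')} {stripped}".strip()
    return record


def notes_to_frame(note_texts):
    """Build a pandas DataFrame with one row per note."""
    # pandas is third-party; imported here so note_to_record works without it.
    import pandas as pd
    return pd.DataFrame([note_to_record(text) for text in note_texts])


sample = """## khm-20190624-1612-0000104643

### Title

American Bull Frog (Rana Catesbeiana)

### Caption

American Bull Frog (Rana Catesbeiana) - Close Up

### Meta

- camera: NIKON D500
- iso: 200
- tags: amphibians, frogs
"""

record = note_to_record(sample)
print(record["camera"])
```

Image lines that appear before the first h3 heading are ignored by this sketch, because they only serve as a visual check inside Obsidian.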

How to Deal with Media Files Like Photos and Videos?

Although an Obsidian Vault can contain media files, optimized publishing of high-resolution media files requires a CDN. We keep only the notes about media files in Obsidian; the media files themselves are stored outside the Obsidian Vault. Content about a media file can then be edited, and a media file linked within a web page, inside Obsidian, while the step that creates the final published web page converts the reference to the media file into the correct HTML snippet.

The script we use for this task has been adjusted to deal with high-resolution photos. The resulting HTML code is customized for the image rendering and CDN service (IMGIX) we are using.
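As a rough idea of what such a conversion can look like, the sketch below builds a responsive image tag from imgix URL parameters (`auto`, `fit`, `crop`, `w`, `h`). It is an assumption-laden example, not our actual snippet: the domain `example.imgix.net`, the `.jpg` file name, the chosen widths and the 3:2 aspect ratio are all hypothetical.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical imgix source domain; replace it with a real one.
IMGIX_BASE = "https://example.imgix.net"


def imgix_url(file_name: str, width: int, height: int, crop: str = "entropy") -> str:
    """Build an imgix URL that crops the photo to width x height."""
    params = {
        "auto": "format,compress",
        "fit": "crop",
        "crop": crop,  # matches the "presentation" value of the photo note
        "w": width,
        "h": height,
    }
    # safe="," keeps the comma in "format,compress" unescaped.
    return f"{IMGIX_BASE}/{file_name}?{urlencode(params, safe=',')}"


def img_tag(file_name: str, alt: str, widths=(480, 960, 1920), ratio=(3, 2)) -> str:
    """Return an <img> element with a srcset entry per width."""
    def url(width: int) -> str:
        return imgix_url(file_name, width, round(width * ratio[1] / ratio[0]))

    srcset = ", ".join(f"{url(width)} {width}w" for width in widths)
    # escape() turns "&" into "&amp;", as required inside HTML attributes.
    return (
        f'<img src="{escape(url(widths[0]))}" srcset="{escape(srcset)}" '
        f'sizes="100vw" alt="{escape(alt)}" loading="lazy">'
    )


# The file extension is an assumption; only the identifier comes from the note.
tag = img_tag("khm-20190624-1612-0000104643.jpg", "American Bull Frog")
print(tag)
```

Keeping the URL construction in one function means a change of CDN parameters only has to be made in a single place.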

Python Script

We developed Python scripts, custom static page generators, to create web pages from Obsidian notes. A script loads the content of all Markdown files within specified folders and converts them from Markdown into fully fledged, state-of-the-art static web pages.

In more detail, the script reads a file, cuts off the metadata section and converts the remaining part into HTML using Python's Markdown library with a few extensions (fenced_code, tables, attr_list). The metadata section is read line by line and converted into structured data.
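The two steps described here can be sketched as follows. The helper names `split_note` and `body_to_html` are hypothetical, and the rule that the metadata section runs from "### Meta" to the next heading is an assumption; only the `markdown.markdown` call with the three named extensions reflects the library usage mentioned above.

```python
def split_note(note_text: str):
    """Return (body, meta_lines): the note without its Meta section, and the Meta list items."""
    body_lines, meta_lines = [], []
    in_meta = False
    for line in note_text.splitlines():
        stripped = line.strip()
        if stripped.lower() == "### meta":
            in_meta = True
        elif in_meta and stripped.startswith("#"):
            # The next heading ends the Meta section and belongs to the body.
            in_meta = False
            body_lines.append(line)
        elif in_meta:
            if stripped:
                meta_lines.append(stripped)
        else:
            body_lines.append(line)
    return "\n".join(body_lines).strip(), meta_lines


def body_to_html(body: str) -> str:
    """Convert the Markdown body to HTML with the extensions named in the text."""
    # Python-Markdown is third-party (pip install markdown); it is imported
    # here so that split_note stays usable without it.
    import markdown
    return markdown.markdown(body, extensions=["fenced_code", "tables", "attr_list"])


note = (
    "## Using Obsidian as CMS\n\n"
    "Some *content*.\n\n"
    "### Meta\n"
    "- title: Using Obsidian as CMS\n"
    "- path: /demo/\n"
)

body, meta_lines = split_note(note)
print(meta_lines)
```

Calling `body_to_html(body)` afterwards returns the HTML fragment for the page content, while `meta_lines` is left for the line-by-line metadata parsing.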

Then, the script combines the metadata and the converted Markdown into a dictionary, which is rendered into HTML using Python's Jinja2 library and templates. The resulting code is saved in a folder containing a local copy of the website, which can be viewed on localhost or deployed to the corresponding Firebase Hosting project.
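A minimal sketch of this rendering step, under stated assumptions: the inline template, the dictionary keys `meta` and `content`, the helper names and the output folder `public` are all hypothetical. A real setup would more likely load template files with `FileSystemLoader` instead of the in-memory `DictLoader` used here to keep the example self-contained.

```python
from pathlib import Path

# Hypothetical template, kept inline so the example needs no template folder.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="{{ meta.language | default('en') }}">
<head>
  <title>{{ meta.title }}</title>
  <meta name="description" content="{{ meta.description }}">
</head>
<body>
{{ content | safe }}
</body>
</html>
"""


def output_path(site_root: Path, page_path: str) -> Path:
    """Map a page path such as "/a/b/" to "<site_root>/a/b/index.html"."""
    return site_root / page_path.strip("/") / "index.html"


def render_page(page: dict) -> str:
    """Render the page dictionary into a complete HTML document."""
    # Jinja2 is third-party; imported here so output_path works without it.
    from jinja2 import DictLoader, Environment, select_autoescape

    env = Environment(
        loader=DictLoader({"page.html": PAGE_TEMPLATE}),
        autoescape=select_autoescape(["html"]),
    )
    # "content" is already HTML, hence the "safe" filter in the template.
    return env.get_template("page.html").render(**page)


def write_page(site_root: Path, page: dict) -> Path:
    """Render a page and save it inside the local copy of the website."""
    target = output_path(site_root, page["meta"]["path"])
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_page(page), encoding="utf-8")
    return target


page = {
    "meta": {
        "path": "/website-development/using-obsidian-as-cms/",
        "title": "Using Obsidian as CMS",
    },
    "content": "<p>Converted Markdown goes here.</p>",
}
print(output_path(Path("public"), page["meta"]["path"]))
```

Writing each page as `index.html` inside a folder named after its path keeps the trailing-slash URLs from the Meta section working on a static host.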

We use Python 3.8.x with the following libraries:

# Standard library
import os
import re

# Third-party
import lxml.etree
import lxml.html
import markdown
from jinja2 import Environment, FileSystemLoader

At the moment, we are still refactoring the Python deployment scripts. We will publish them later in a GitHub repository. If you would like to see the code now, please contact us.

Future Additions and Remarks

We still have to deal with media files (photos, videos, graphs) referenced within a content block of a note.