Introducing JoliMarkdown, for more robust and rigorous Markdown content

This blog post was written in Markdown, a simple text syntax for writing structured documents. Markdown is widely used in the development world (documentation in the form of Markdown README files, adoption by many publishing platforms) and is often also employed for Web publishing. It was, for example, the syntax chosen when the JoliCode website was created in 2012, and it is still used today to structure the various bodies of content (blog posts, customer references, technologies, team sheets, etc.).

Some context and Markdown history

Since its creation in 2004, this syntax has aimed to offer an alternative, faster and more human way of writing HTML documents for Web publishing. Over the ensuing years, Markdown syntax has evolved iteratively, without any formal, perfectly standardized specification. Various variants (or “flavors”) have emerged, but none has become a de facto standard.

One of the most robust alternatives, however, is CommonMark, a Markdown variant that was formally specified in 2014 and has been evolving ever since.

Over the last 12 years, our way of transforming Markdown content into HTML has changed. Back in 2012, we started by writing a few articles in pure HTML, then we began using a client-side JavaScript Markdown pre-processor, and finally, over the last few years, we migrated to the excellent league/commonmark library, which transforms Markdown into HTML on the server side, in PHP. We chose this library because it is particularly complete, well-maintained, extensible and robust.

During the development of league/commonmark, extension mechanisms were added, to support different Markdown “extensions”, i.e. to support syntax elements that are not part of the CommonMark standard, but bring syntactic flexibility to writers. For example, the tables extension makes it possible to write tables in Markdown, with a lighter, more readable syntax, which is not possible in “standard” CommonMark.

A great feature and its downsides

One of the founding features of Markdown is its compatibility with HTML: in Markdown, it’s perfectly valid to insert HTML tags into text, and these will simply be passed on as they are in the final HTML document. For example, you can write:

# A Markdown document

<p>An HTML paragraph.</p>

A paragraph in Markdown.

Such a document will be rendered, in HTML, as follows:

<h1>A Markdown document</h1>
<p>An HTML paragraph.</p>
<p>A paragraph in Markdown.</p>

CommonMark’s extension mechanism is therefore interesting, as it allows syntactic elements to be added that the extension will be able to interpret to generate rich, complex HTML output, without the end user (the editor) having to write HTML. This notion of extension is provided for in CommonMark (the CommonMark specification is itself written in CommonMark and uses an extension to generate side-by-side rendering of Markdown syntax and the corresponding HTML output, as can be seen, for example, in the tabs section).

On the JoliCode site, we've taken advantage of the flexibility of league/commonmark over the years to enrich HTML rendering, so that we can write richer, more expressive, more visual Markdown documents. For example, we've added extensions to write footnotes, HTML tables and strikethrough text, to add HTML attributes to external links, to automatically add attributes to <img> tags, and so on.

In spite of this, over the past twelve years we have frequently written HTML code within Markdown articles, in order to meet certain needs:

  • add CSS classes to HTML elements, to be able to style them differently (centering an image on the page, for example);
  • insert code with CSS classes, to use a syntax highlighting library;
  • create the HTML structure to position two images side by side;
  • etc.

Sometimes HTML code was added because the author of an article was uncomfortable with certain arcana of Markdown and chose the most direct way to publish their content. Using HTML may have been appropriate at the time, but it comes at a cost as HTML itself evolves. For elements written in Markdown, we can update the program in charge of HTML rendering so that it takes advantage of new HTML features. We can't do this for elements written directly in HTML, which remain frozen in the form their author chose.

For example, we’d like to be able to offer images in modern, higher-performance formats (such as webp, which is both smaller and of better quality) than those used just a few years ago. For these images, we also want to move away from the use of the <img> tag, and take advantage of <picture>, <source> tags, and attributes like srcset.

For images that have been inserted into articles using Markdown syntax, we can upgrade the HTML rendering program to support these new formats and tags.

For images that have been inserted in HTML, we can’t do this, and so have to replace them manually – or leave them as they are, with the inconvenience of having to accept that the articles concerned use dated, less efficient technologies, which have an impact on both speed and the comfort offered to site users.
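To illustrate the Markdown case described above, here is a minimal sketch of how a renderer could be upgraded once, for every Markdown-authored image. It is written in Python for brevity and is not how league/commonmark or the JoliCode site actually do it; the function name `render_image` and the assumption that a `.webp` variant sits next to each original file are hypothetical.

```python
from html import escape
from pathlib import PurePosixPath


def render_image(src: str, alt: str) -> str:
    """Render an image as a <picture> element with a WebP source.

    Assumption (not from the article): a .webp file with the same
    base name exists next to the original image.
    """
    webp_src = str(PurePosixPath(src).with_suffix(".webp"))
    return (
        "<picture>"
        f'<source srcset="{escape(webp_src)}" type="image/webp">'
        # The original <img> stays as a fallback for older browsers.
        f'<img src="{escape(src)}" alt="{escape(alt)}">'
        "</picture>"
    )


print(render_image("/path/to/image.jpg", "An image"))
```

Because the image is described in Markdown (a source and an alternative text), changing this single function changes the output for every article. An `<img>` tag typed by hand in an article never goes through it.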

So we’re looking for an approach to correct existing Markdown articles, replacing the HTML elements they contain with equivalent Markdown elements wherever possible without distorting the final HTML rendering.

A CommonMark extension to the rescue

An extension, available in league/commonmark for a few years now, can specifically help us with this task: it’s the Attributes extension, which lets you add HTML attributes to Markdown elements. For example, you can write:

{.block-class}
![An image](/path/to/image.jpg)

![Another image](/path/to/image.jpg){.image-class}

…which will be rendered in HTML as follows:

<p class="block-class"><img src="/path/to/image.jpg" alt="An image"></p>
<p><img src="/path/to/image.jpg" alt="Another image" class="image-class"></p>

With the help of this extension, we have written the JoliMarkdown library, which analyzes Markdown content and outputs a better version of it by replacing unnecessary HTML tags with their Markdown equivalent when possible.

In short, it works in three steps:

  • analyze the Markdown content of the article;
  • identify HTML elements that could be replaced by equivalent Markdown elements;
  • replace these HTML elements with Markdown elements, adding the necessary HTML attributes (using the Attributes extension) so that the final HTML rendering is identical to that of the original article.
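The three steps above can be sketched for a single kind of element, the `<img>` tag. This is only an illustration in Python, under simplifying assumptions, and not JoliMarkdown's actual implementation (which is written in PHP on top of league/commonmark): the names `IMG_TAG`, `img_to_markdown` and `fix_markdown` are hypothetical, tags are located with a regular expression rather than a real Markdown parser, and nothing checks that the final rendering stays identical.

```python
import re
from html.parser import HTMLParser

# Step 2: a naive pattern to spot <img> tags embedded in Markdown.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


class _AttrCollector(HTMLParser):
    """Collects the attributes of the tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.attrs: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        self.attrs = {name: value or "" for name, value in attrs}


def img_to_markdown(tag: str) -> str:
    """Step 3: rewrite one <img> tag as a Markdown image.

    Extra attributes are kept using the Attributes extension syntax.
    """
    collector = _AttrCollector()
    collector.feed(tag)
    attrs = dict(collector.attrs)
    if "src" not in attrs:
        return tag  # nothing safe to do, leave the HTML untouched
    src = attrs.pop("src")
    alt = attrs.pop("alt", "")
    extras = [f".{name}" for name in attrs.pop("class", "").split()]
    extras += [f'{key}="{value}"' for key, value in attrs.items()]
    suffix = "{" + " ".join(extras) + "}" if extras else ""
    return f"![{alt}]({src}){suffix}"


def fix_markdown(markdown: str) -> str:
    """Step 1 to 3: scan the content and replace every <img> found."""
    return IMG_TAG.sub(lambda m: img_to_markdown(m.group(0)), markdown)


print(fix_markdown('<img src="/path/to/image.jpg" alt="An image" class="centered">'))
```

A real fixer has to be far more careful than this sketch: it must skip tags inside code blocks, escape special characters in the alternative text, and compare the HTML rendered before and after the change.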

Writing JoliMarkdown was quite enjoyable and helped us find issues in the ~300 blog posts that we have written since the JoliCode blog began: unclosed HTML tags, malformed HTML sequences, many external links without the nofollow or noopener attributes, etc. We have also transformed many HTML blocks into their pure Markdown equivalent, which in turn lets all our Markdown renderers do their work correctly (responsive images in place of plain <img> tags, etc.).
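As an illustration of one of these checks, here is a small Python sketch that lists external links lacking `nofollow` or `noopener`. It is not taken from JoliMarkdown: the class name `ExternalLinkAuditor` is hypothetical, the two values are assumed to be tokens of the `rel` attribute, and a link is assumed to be external when its host differs from the site host passed in.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkAuditor(HTMLParser):
    """Records external <a> links that lack nofollow or noopener."""

    def __init__(self, site_host: str) -> None:
        super().__init__()
        self.site_host = site_host
        self.missing: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        host = urlparse(href).netloc
        if not host or host == self.site_host:
            return  # relative or internal link, nothing to check
        rel = set((attributes.get("rel") or "").split())
        absent = sorted({"nofollow", "noopener"} - rel)
        if absent:
            self.missing.append((href, absent))


auditor = ExternalLinkAuditor("jolicode.com")
auditor.feed('<a href="https://example.org/">an external link</a>')
for href, absent in auditor.missing:
    print(href, "is missing:", ", ".join(absent))
```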

Limitations, opportunities, perspectives

The library “as is” works quite well, but it can be a bit unsettling to modify, all at once, articles that you have spent a lot of time writing.

There are tests in JoliMarkdown, but it is still quite a new (and sometimes complicated) piece of software. Before “fixing” all our Markdown content with this fixer, we first wanted to be able to preview the changes that would be made to each article, and to validate them before applying them.

To do this, we have developed a small Symfony console command, which allows us to preview the changes that would be made to a given article, using a diff (💌 to the Delta differ) between the original and modified Markdown content. This command also allows us to apply the changes to the article, if we are satisfied with the result.
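The preview step can be sketched as follows. This is a stand-in written in Python with the standard `difflib` module, not the Symfony command or the Delta differ mentioned above; the function name `preview_changes` and the `article.md` label are hypothetical.

```python
import difflib


def preview_changes(original: str, fixed: str, name: str = "article.md") -> str:
    """Return a unified diff between the original and fixed Markdown.

    An empty string means the fixer would not change the article.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"{name} (original)",
        tofile=f"{name} (fixed)",
    )
    return "".join(diff)


original = "# Title\n\n<p>An HTML paragraph.</p>\n"
fixed = "# Title\n\nAn HTML paragraph.\n"
print(preview_changes(original, fixed))
```

Reviewing such a diff article by article, and writing the fixed content back only once it has been validated, keeps a human in the loop for every change.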

Here are some examples of what it can achieve:

  • HTML tables: an HTML table is transformed into CommonMark, for greater readability;
  • HTML div with nested image tags: an HTML div with a nested image is transformed into CommonMark;
  • (broken) HTML code: some malformed HTML code is correctly fixed;
  • HTML unordered lists: an HTML unordered list is transformed into CommonMark.

However, there are some edge cases where it is simply not possible to safely transform HTML back into Markdown. The associated CSS could, for example, rely on the way HTML elements are nested, so transforming the HTML into Markdown would result in a different rendering.

There’s another thing we are not satisfied with: this library owes many features to other projects, and we would prefer to contribute them back to those projects. The league/commonmark library has an open issue about transforming HTML to Markdown, and we think that pieces of JoliMarkdown could serve as inspiration to solve it. JoliMarkdown itself uses some (extensively) modified code from Stefan Zweifel’s Commonmark Markdown Renderer. If you want to improve on JoliMarkdown, a good way to do it could be to contribute back to these projects!

Therefore, JoliMarkdown is an exploratory work, which we are sharing with you today, and which we hope may help others go further in this direction 😀

Enough talk, let me test it

If you want to test JoliMarkdown on your own Markdown content, you can install it using Composer:

composer require jolicode/jolimarkdown

You can then read the rest of the documentation in the project’s GitHub repository.

Try it online


You can also use JoliMarkdown on a demo website: https://jolimarkdown.jolicode.com. This website is a simple Symfony sandbox application that uses JoliMarkdown to render the content of a given Markdown text.
