Adebiyi Adedotun Lukman is a UI/Frontend Engineer based in Lagos, Nigeria who also happens to love UI/UX Design for the love of great software products. When …
CommonMark: A Formal Specification For Markdown
CommonMark is a rationalized version of Markdown syntax with a spec whose goal is to remove the ambiguities and inconsistency surrounding the original Markdown specification. It offers a standardized specification that defines the common syntax of the language along with a suite of comprehensive tests to validate Markdown implementations against this specification.
GitHub uses Markdown as the markup language for its user content.
In 2012, GitHub created its own flavor of Markdown, GitHub Flavored Markdown (GFM), to combat the lack of Markdown standardization and to extend the syntax to its needs. GFM was built on top of Sundown, a parser built by GitHub to solve some of the shortcomings of the Markdown parsers that existed at the time. Five years later, in 2017, GitHub announced in "A formal spec for GitHub Flavored Markdown" that it was deprecating Sundown in favor of cmark, the CommonMark parsing and rendering library.
In the Common Questions section of "Markdown and Visual Studio Code", it is documented that Markdown in VS Code targets the CommonMark specification using the markdown-it library, which itself follows the CommonMark specification.
CommonMark has been widely adopted and implemented (see the List of CommonMark Implementations) in different languages such as C (e.g. cmark), C# (e.g. CommonMark.NET), and JavaScript (e.g. markdown-it). This is good news, as developers and authors are gradually moving to a new frontier of being able to use Markdown with a consistent syntax and a standardized specification.
A Short Note On Markdown Parsers
Markdown parsers are at the heart of converting Markdown text into HTML, directly or indirectly.
Parsers like cmark and commonmark.js do not convert Markdown to HTML directly. Instead, they convert it to an Abstract Syntax Tree (AST) and then render the AST as HTML, which makes the process more granular and open to manipulation. Between parsing (to AST) and rendering (to HTML), for example, the Markdown text could be extended.
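The two-stage idea can be sketched with a toy pipeline. This is only an illustration, not how cmark or commonmark.js are implemented: the function names `parse`, `render`, and `demoteHeadings` and the node shape are hypothetical, only paragraphs and ATX headings are handled, and no HTML escaping is done.

```javascript
// Toy two-stage Markdown pipeline: text -> AST -> HTML.
// Hypothetical node shape: { type, level?, text }. Not the cmark/commonmark.js API.

// Stage 1: turn Markdown text into a flat list of block nodes.
function parse(markdown) {
  const blocks = markdown
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter(Boolean);
  return blocks.map((block) => {
    const heading = /^(#{1,6})\s+(.*)$/.exec(block);
    if (heading) {
      return { type: "heading", level: heading[1].length, text: heading[2] };
    }
    return { type: "paragraph", text: block };
  });
}

// Stage 2: turn the AST into HTML.
function render(ast) {
  return ast
    .map((node) =>
      node.type === "heading"
        ? `<h${node.level}>${node.text}</h${node.level}>`
        : `<p>${node.text}</p>`
    )
    .join("\n");
}

// Between the stages the tree is plain data, so it can be transformed.
// Here every heading is demoted by one level (capped at h6).
function demoteHeadings(ast) {
  return ast.map((node) =>
    node.type === "heading"
      ? { ...node, level: Math.min(node.level + 1, 6) }
      : node
  );
}

const ast = parse("# Title\n\nSome text");
const html = render(demoteHeadings(ast));
```

Because the transformation step works on the tree rather than on raw text or finished HTML, it does not need to know anything about Markdown syntax or about how the output is serialized.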
CommonMark’s Markdown Syntax Support
Projects or platforms that already implement the CommonMark specification as the baseline of their specific flavor are often supersets of the strict core that CommonMark specifies. For the most part, CommonMark has mitigated a lot of ambiguities by building a spec that is designed to be built upon. GFM is a prime example: while it supports all CommonMark syntax, it also extends it to suit its usage.
CommonMark’s syntax support can seem limited at first. For example, it has no support for table syntax. It is important to know that this is by design, as a comment in a related discussion thread reveals: the supported syntax is strict and is said to be the core syntax of the language itself, the same one specified by its creator, John Gruber, in "Markdown: Syntax".
At the time of writing, the sections below cover the supported syntax.
To follow along with the examples, it is advised that you use the commonmark.js dingus editor to try out the syntax and get the rendered Preview, generated HTML, and AST.
Paragraphs And Line Breaks
In Markdown, paragraphs are continuous lines of text separated by at least one blank line.
The following rules define a paragraph:
<!--Markdown-->
This is a line of text
<!--Rendered HTML-->
<p>This is a line of text</p>
<!--Markdown-->
This is a line of text
And another line of text
And another but the same paragraph
<!--Rendered HTML-->
<p>This is a line of text And another line of text And another but the same paragraph</p>
<!--Markdown-->
This is a paragraph

And another paragraph

And another
<!--Rendered HTML-->
<p>This is a paragraph</p> <p>And another paragraph</p> <p>And another</p>
<!--Markdown (the first line ends with two spaces)-->
Two spaces after a line of text  
Or a post-fixed backslash\
Both mean a line break
<!--Rendered HTML-->
<p>Two spaces after a line of text<br />Or a post-fixed backslash<br />Both mean a line break</p>
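The paragraph and line-break behavior can be approximated in a few lines. This is a minimal sketch under simplifying assumptions: `renderParagraphs` is a hypothetical helper, every block is treated as a paragraph, and escaped backslashes and inline syntax are ignored.

```javascript
// A line ending in two or more spaces, or in a backslash, marks a hard break.
const HARD_BREAK = /( {2,}|\\)$/;

// Hypothetical helper: split text into paragraphs and apply hard line breaks.
function renderParagraphs(markdown) {
  return markdown
    .split(/\n(?:[ \t]*\n)+/) // one or more blank lines separate paragraphs
    .map((block) => block.trim())
    .filter((block) => block !== "")
    .map((block) => {
      const lines = block.split("\n");
      const body = lines.map((line, index) => {
        const isLast = index === lines.length - 1;
        // A break marker on the last line of a paragraph has no effect.
        if (!isLast && HARD_BREAK.test(line)) {
          return line.replace(HARD_BREAK, "").trim() + "<br />";
        }
        return line.trim();
      });
      // Lines without a marker stay in the same paragraph (a soft break).
      return `<p>${body.join("\n")}</p>`;
    })
    .join("\n");
}
```

Calling `renderParagraphs("one  \ntwo\\\nthree\n\nfour\nfive")` yields two `<p>` elements: the first contains two `<br />` tags, and the second keeps `four` and `five` together because a single newline does not end a paragraph.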
Headings
Headings in Markdown represent one of the HTML heading elements. There are two ways to define headings: ATX headings and Setext headings.
The following rules define ATX headings:
Syntax | Rendered HTML |
---|---|
# Heading 1 | <h1>Heading 1</h1> |
## Heading 2 | <h2>Heading 2</h2> |
### Heading 3 | <h3>Heading 3</h3> |
#### Heading 4 | <h4>Heading 4</h4> |
##### Heading 5 | <h5>Heading 5</h5> |
###### Heading 6 | <h6>Heading 6</h6> |
## Heading 2 ## | <h2>Heading 2</h2> |
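The rows in this table can be reproduced with a small recognizer. The sketch below is an approximation rather than a spec-complete parser: `parseAtxHeading` is a hypothetical helper, and inline syntax and backslash-escaped `#` characters are not handled.

```javascript
// Hypothetical helper: recognize one ATX heading line.
function parseAtxHeading(line) {
  // Up to 3 spaces of indentation, 1 to 6 "#" characters,
  // then whitespace plus content, or the end of the line.
  const match = /^ {0,3}(#{1,6})(?:[ \t]+(.*?))?[ \t]*$/.exec(line);
  if (!match) return null;
  const level = match[1].length;
  // An optional closing run of "#" is dropped when it follows whitespace
  // (or makes up the whole content).
  const text = (match[2] || "").replace(/(^|[ \t]+)#+$/, "").trim();
  return { level, text, html: `<h${level}>${text}</h${level}>` };
}
```

With this sketch, `parseAtxHeading("## Heading 2 ##").html` is `<h2>Heading 2</h2>`, while `#hashtag` (no space after the marker) and a line starting with seven `#` characters both return `null` and would be treated as ordinary paragraph text.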
The following rules define Setext headings:
<!--Markdown-->
Heading 1
=
<!--Rendered HTML-->
<h1>Heading 1</h1>
<!--Markdown-->
Heading 2
-
<!--Rendered HTML-->
<h2>Heading 2</h2>
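Setext headings span two lines, so a recognizer needs both the text line and the underline. A minimal sketch, assuming the text line is already known to be paragraph text; `parseSetextHeading` is a hypothetical helper, and multi-line heading text is not handled.

```javascript
// Hypothetical helper: recognize a Setext heading from two consecutive lines.
function parseSetextHeading(textLine, underline) {
  if (textLine.trim() === "") return null; // the text line must not be blank
  // The underline is a run of "=" or a run of "-", with no spaces inside it.
  const match = /^ {0,3}(=+|-+)[ \t]*$/.exec(underline);
  if (!match) return null;
  const level = match[1][0] === "=" ? 1 : 2; // "=" gives h1, "-" gives h2
  const text = textLine.trim();
  return { level, text, html: `<h${level}>${text}</h${level}>` };
}
```

Only two levels are available this way: `parseSetextHeading("Heading 1", "=")` produces an `<h1>`, and `parseSetextHeading("Heading 2", "---")` produces an `<h2>`. The length of the underline does not matter.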
Emphasis And Strong Emphasis
Emphasis in Markdown can either be italics or bold (strong emphasis).
The following rules define emphasis:
Syntax | Rendered HTML |
---|---|
_Italic_ | <em>Italic</em> |
*Italic* | <em>Italic</em> |
__Bold__ | <strong>Bold</strong> |
**Bold** | <strong>Bold</strong> |
Horizontal Rule
A horizontal rule, <hr />, is created with three or more asterisks (*), hyphens (-), or underscores (_) on a line of their own. The symbols may be separated by any number of spaces, or by none at all.
Syntax | Rendered HTML |
---|---|
*** | <hr /> |
* * * | <hr /> |
--- | <hr /> |
- - - | <hr /> |
___ | <hr /> |
_ _ _ | <hr /> |
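The rule for horizontal rules fits in a single regular expression. A minimal sketch, with `isThematicBreak` as a hypothetical helper name; it checks one line in isolation and does not consider context, such as a `---` line that directly follows paragraph text and would instead act as a Setext underline.

```javascript
// Hypothetical helper: does this line form a horizontal rule?
function isThematicBreak(line) {
  // Up to 3 spaces of indentation, then three or more of the SAME marker
  // (*, -, or _), optionally separated by spaces or tabs, and nothing else.
  return /^ {0,3}([*_-])[ \t]*(?:\1[ \t]*){2,}$/.test(line);
}
```

The backreference `\1` is what enforces that the markers cannot be mixed: `* * *` matches, but `*-*` does not.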
Lists
Lists in Markdown are either a bullet (unordered) list or an ordered list.
The following rules define a list:
<!--Markdown-->
* one
* two
* three
<!--Rendered HTML-->
<ul> <li>one</li> <li>two</li> <li>three</li> </ul>
<!--Markdown-->
+ one
+ two
+ three
<!--Rendered HTML-->
<ul> <li>one</li> <li>two</li> <li>three</li> </ul>
<!--Markdown-->
- one
- two
- three
<!--Rendered HTML-->
<ul> <li>one</li> <li>two</li> <li>three</li> </ul>
<!--Markdown-->
- one
- two
+ three
<!--Rendered HTML-->
<ul> <li>one</li> <li>two</li> </ul> <ul> <li>three</li> </ul>
<!--Markdown-->
1. one
2. two
3. three
<!--Rendered HTML-->
<ol> <li>one</li> <li>two</li> <li>three</li> </ol>
<!--Markdown-->
3. three
4. four
5. five
<!--Rendered HTML-->
<ol start="3"> <li>three</li> <li>four</li> <li>five</li> </ol>
<!--Markdown-->
1. one
100. two
3. three
<!--Rendered HTML-->
<ol> <li>one</li> <li>two</li> <li>three</li> </ol>
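Two details of list handling are easy to miss: a change of bullet character starts a new list, and for ordered lists only the first number is significant. The sketch below illustrates both. `renderLists` is a hypothetical helper that handles single-line items only; nesting, multi-line items, and loose lists are left out, and any non-item line simply ends the current list, which is a simplification.

```javascript
// Hypothetical helper: group single-line list items into lists.
function renderLists(markdown) {
  const lists = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    const bullet = /^([*+-])[ \t]+(.*)$/.exec(line);
    const ordered = /^(\d{1,9})([.)])[ \t]+(.*)$/.exec(line);
    if (!bullet && !ordered) {
      current = null; // simplification: any other line ends the list
      continue;
    }
    const marker = bullet ? bullet[1] : ordered[2];
    const text = bullet ? bullet[2] : ordered[3];
    // A different bullet character or ordered delimiter starts a new list.
    if (!current || current.marker !== marker) {
      current = {
        marker,
        tag: bullet ? "ul" : "ol",
        // Only the number of the first item is kept.
        start: ordered ? Number(ordered[1]) : null,
        items: [],
      };
      lists.push(current);
    }
    current.items.push(text);
  }
  return lists
    .map((list) => {
      const needsStart = list.tag === "ol" && list.start !== 1;
      const attr = needsStart ? ` start="${list.start}"` : "";
      const items = list.items.map((item) => `<li>${item}</li>`).join(" ");
      return `<${list.tag}${attr}> ${items} </${list.tag}>`;
    })
    .join(" ");
}
```

For example, `renderLists("- one\n- two\n+ three")` returns two separate `<ul>` elements, and `renderLists("1. one\n100. two\n3. three")` returns a plain `<ol>` because the numbers after the first are ignored.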
Links
Links are supported in both the inline and reference formats.
The following rules define a link:
<!--Markdown-->
[Google](https://google.com "Google")
<!--Rendered HTML-->
<a href="https://google.com" title="Google">Google</a>
<!--Markdown-->
[Google](https://google.com)
<!--Rendered HTML-->
<a href="https://google.com">Google</a>
<!--Markdown-->
[Article](/2020/09/comparing-styling-methods-next-js)
<!--Rendered HTML-->
<a href="/2020/09/comparing-styling-methods-next-js">Article</a>
<!--Markdown-->
[Google][id]
<!--At least a line must be in-between-->
[id]: https://google.com "Google"
<!--Rendered HTML-->
<a href="https://google.com" title="Google">Google</a>
<!--Markdown-->
<https://google.com>
<!--Rendered HTML-->
<a href="https://google.com">https://google.com</a>
<!--Markdown-->
<mark@google.com>
<!--Rendered HTML-->
<a href="mailto:mark@google.com">mark@google.com</a>
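The last two forms, URLs and email addresses in angle brackets, are known as autolinks. In CommonMark the link text of an autolink is the full URI or address as written. A minimal sketch of that behavior follows; `renderAutolinks` is a hypothetical helper, the email pattern is deliberately loose, and URL and HTML escaping are omitted.

```javascript
// Hypothetical helper: render <url> and <email> autolinks in a line of text.
function renderAutolinks(text) {
  return text.replace(/<([^<>\s]+)>/g, (whole, target) => {
    // Absolute URI: a scheme of 2 to 32 characters followed by a colon.
    if (/^[A-Za-z][A-Za-z0-9+.-]{1,31}:/.test(target)) {
      return `<a href="${target}">${target}</a>`;
    }
    // Email address: "mailto:" is added to the href only.
    if (/^[^@\s]+@[^@\s]+$/.test(target)) {
      return `<a href="mailto:${target}">${target}</a>`;
    }
    return whole; // anything else (for example an HTML tag) is left untouched
  });
}
```

Note that something like `<div>` is not turned into a link, since it has neither a scheme nor an `@` sign.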
Images
Images in Markdown follow the inline and reference formats for links.
The following rules define images:
Blockquotes
The HTML Block Quotation element, <blockquote>, can be created by prefixing a new line with the greater-than symbol (>).
Blockquotes can be nested:
They can also contain other Markdown elements, like headers, code, list items, and so on.
Inline Code
The HTML Inline Code element, <code>, is also supported. To create one, delimit the text with back-ticks (`), or with double back-ticks if there needs to be a literal back-tick in the enclosed text.
<!--Markdown-->
`inline code snippet`
<!--Rendered HTML-->
<code>inline code snippet</code>
<!--Markdown-->
`<button type='button'>Click Me</button>`
<!--Rendered HTML-->
<code>&lt;button type='button'&gt;Click Me&lt;/button&gt;</code>
<!--Markdown-->
`` There's an inline back-tick (`). ``
<!--Rendered HTML-->
<code>There's an inline back-tick (`).</code>
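Going the other way, from arbitrary text to a valid code span, means picking a delimiter that cannot be confused with back-ticks inside the text. A minimal sketch of one way to do this; `toCodeSpan` is a hypothetical helper, and text that itself begins and ends with a space is not treated specially.

```javascript
// Hypothetical helper: wrap arbitrary text in a Markdown code span.
function toCodeSpan(text) {
  // Find the longest run of back-ticks inside the text.
  const runs = text.match(/`+/g) || [];
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0);
  // A delimiter one back-tick longer than any inner run cannot clash with it.
  const fence = "`".repeat(longest + 1);
  // Pad with a space when the text starts or ends with a back-tick,
  // so the delimiter does not merge with it. One space on each side
  // is stripped again when the span is rendered.
  const pad = text.startsWith("`") || text.endsWith("`") ? " " : "";
  return fence + pad + text + pad + fence;
}
```

Text without back-ticks gets the usual single-back-tick delimiters, while text containing a single back-tick is wrapped in double back-ticks, as in the last example above.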
Code Blocks
The HTML Preformatted Text element, <pre>, is also supported. A code block is created either with a code fence, meaning an opening and a closing line of at least three back-ticks (`) or tildes (~) where the closing fence is at least as long as the opening one, or by indenting each line with at least 4 spaces.
<!--Markdown-->
```
const dedupe = (array) => [...new Set(array)];
```
<!--Rendered HTML-->
<pre><code>const dedupe = (array) =&gt; [...new Set(array)];</code></pre>
<!--Markdown-->
    const dedupe = (array) => [...new Set(array)];
<!--Rendered HTML-->
<pre><code>const dedupe = (array) =&gt; [...new Set(array)];</code></pre>
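The fence-matching rule and the escaping of code content can be sketched together. In the generated markup, HTML-significant characters inside code are escaped, so `=>` is emitted as `=&gt;`, and CommonMark renderers also end the code content with a newline. `escapeHtml` and `renderFencedCode` are hypothetical helpers; info strings (such as a language name) and fence indentation handling are ignored here.

```javascript
// Escape the characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: render input that starts with a code fence.
function renderFencedCode(markdown) {
  const lines = markdown.split("\n");
  // Opening fence: three or more back-ticks or tildes.
  const open = /^ {0,3}(`{3,}|~{3,})/.exec(lines[0]);
  if (!open) return null;
  const fence = open[1];
  const body = [];
  for (const line of lines.slice(1)) {
    const close = /^ {0,3}(`+|~+)[ \t]*$/.exec(line);
    // The closing fence uses the same character and is at least as long.
    if (close && close[1][0] === fence[0] && close[1].length >= fence.length) {
      break;
    }
    body.push(line);
  }
  // An unclosed fence simply runs to the end of the input.
  const code = body.length > 0 ? body.join("\n") + "\n" : "";
  return `<pre><code>${escapeHtml(code)}</code></pre>`;
}
```

Because a shorter fence does not close a longer one, a block opened with four back-ticks can contain a line of three back-ticks as ordinary code, which is handy when documenting Markdown itself.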
Using Inline HTML
According to John Gruber’s original spec note on inline HTML, for any markup that is not covered by Markdown’s syntax, you simply use HTML itself. The only restrictions are that block-level HTML elements (e.g. <div>, <table>, <pre>, <p>, etc.) must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.
However, unless you are one of the people behind CommonMark itself, you will most likely be writing Markdown in a flavor that already extends CommonMark with a good deal of syntax it does not currently support.
Going Forward
CommonMark is a constant work in progress, with its spec last updated on April 6, 2019. A number of popular applications in the pool of Markdown tools support it. Given CommonMark’s effort towards standardization, I think it is fair to conclude that behind Markdown’s simplicity there is a lot of work going on, and that it is a good thing for the CommonMark effort that the formal specification of GitHub Flavored Markdown is based on it.
The move towards CommonMark standardization does not prevent the creation of flavors that extend its supported syntax. As CommonMark gears up for release 1.0, with issues that must be resolved first, there are some interesting resources about the continuing effort that you can peruse.