πŸ“œ Markdown as a Language

March 12, 2017  Some scattered thoughts on the past, present and future of markdown.

In early Janurary, I spent several hours writing down some my scattered thoughts and opinions on the future of the plain text writing format, Markdown. I have come back to this post several times since then, and never been quite ready to publish it, but now I have a seperate blog from the base site, it felt fitting.

Anyway, while reviewing these jottings, the consistent theme was frustration. I actually was a little suprised at this - I do write most of my notes and blog posts in markdown - and spend way too much of my time trying to persuade people around me to use it… Maybe I thought I liked it more than I do?

Then it hit me, I wasn’t frustrated with Markdown. I was frustrated with how limiting I found the code editors (Atom, Sublime, Notepad) and Markdown tools (Marked, Macdown, Typora). None of these tools manage to account for one important fact: Markdown is not just another typesetting format, Markdown is a dynamic and evolving language system. A spoiler here, I now use a mix of Emacs (Spacemacs), Atom, and Microsoft Code to achieve my writing. Despite all these, systems the thing I struggle to capture is that the Markdown syntax itself changes based on a user’s personal and interpersonal communicative needs.


A quick Google revealed this was far from a totally original original idea. A couple of years ago, a web standards consultant, Tobie Langel, tweeted the following request for Markdown to standardise itself:

John Gruber (the lead developer of Markdown itself) responded directly with a quite interesting tweet:

What is particularly interesting about this tweet is that John recognises there is no one “true” Markdown syntax. You would be hard-pressed to get a statement like this from any other lead developer of a programming language. Even the Internet Engineering Task Force defined Mardown as “a family of related plain-text formatting syntaxes and implementations that, while broadly compatible with humans, are intended to produce different kinds of outputs”. I mean, how much more vague can you get? So, Markdown is just a human readable, plain text writing system - with many variants? It sounds more like a cryptic crossword clue than a description of a format.


It doesn’t seem like it’s going to be easy to actually define how Markdown would differ from any other “human readable plain-text writing system”, such as writing your notes on a peice of paper. The closest thing to a defenition I managed to get is based on what tools are out there can parse Markdown (e.g. Pandoc, Github, or Multimarkdown). This really isn’t that useful - it’s a recursive definition. If I write my own parser for a Markdown format, does that make it “real”?

This thought made me realise this was the same issue in defining what a language is in natural language. If you make a language system only you can understand, is it a langugage? We have great tools for writing in natural language. So, if we want to make plain-text writing more universally accessible and improve the editors - we must be able to account for (and predict) these kinds of usages of the system.

Comparing Markdown with other Language Systems

Looking to other programming languages for an answer is also not useful. In almost all programming languages, difference in syntactic structure is either completely arbitrary and has no direct impact on the interpretation of a statement (do you place { before or after the declaration?) or it signifies an entirely different function (indentation styles in Python). In Markdown however, users can add new structures, or combine existing ones in order to change our interprettation of the text itself (e.g. placing text next to a custom made [[idea: description]]) template.

The difficulty in defining the syntatic structure of Markdown is closely linked to research I conducted on communicatio systems in my honours year. A key challenge in the Language Evolution research field is trying to explain how language syntax arises, and why it changes and evolves over time.

In order to focus on how we can use findings from Language Research to better improve Markdown writing tools. I’ll briefly explain two important concepts in language research

Interaction

Interaction plays an important role in language change. When we interact with other individuals through language, we must mutually agree on what a sign represents. This means that through interaction, sub-populations of humans will align on a shared way to communicate a signal. Take for instance the English language. There are many variants of it (American, Canadian, Australian, British, Indian - to name a few) which differ in many ways. Whilst we broadly agree on major the interrpretation of most signs; there are key differences in the languages as well.

Environment and Bottom-Up Driven Change

The bottom-up account of language evolution argues that structured communication rapidly emerges due to environmental conditions. Communication systems are shaped and evolve by interacting agents adapting their systems to the communicative needs presented by the environment (Steels, 2006).

Evolution of Markdown

To examine how Markdown is similar to natural language, consider each Markdown user to be an agent. Each agent has a primary flavour of the language they use (Github, Pandoc, Academic), or perhaps a combination. They may also add their own features into the language. When a group of agents form a community (often online), we see the Markdown syntax begin to standardise - normally for the purpose of being able to parse the syntax to some kind of output. Whilst communities may differ in exact standards, there are overall consistencies in the form - **Bold** will always be bold.

We can think of the environment of Markdown users as a combination of the tools they are using and the content/subject matter they are writing about. My environment consists the following features: I primarily use Pandoc for processing, I write in the Atom editor, and need to use several academic features such as citations. However, I also write a lot of notes in my own flavour of Markdown which I don’t intend to send to a processor (though I would love for them to be). If I have a question during a talk I write it as follows - [?name] and when I have an idea when I am writing I use -- inline idea -β€”. If you read these notes, the format of them is inherently meaningful, but is not standardised Markdown syntax.

Interesting Question: If I wrote a parser for my variant of Markdown, is it now a variant or just my stitched together notes?

Conclusions for Future Markdown Development