Description
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read markdown and (subsets of) Textile, reStructuredText, HTML, LaTeX, and DocBook XML; and it can write plain text, markdown, reStructuredText, XHTML, HTML 5, LaTeX (including beamer slide shows), ConTeXt, RTF, DocBook XML, OpenDocument XML, ODT, Word docx, GNU Texinfo, MediaWiki markup, EPUB, Textile, groff man pages, Emacs Org-Mode, AsciiDoc, and Slidy, Slideous, DZSlides, or S5 HTML slide shows. It can also produce PDF output on systems where LaTeX is installed.
Pandoc’s enhanced version of markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, delimited code blocks, superscript, subscript, strikeout, title blocks, automatic tables of contents, embedded LaTeX math, citations, and markdown inside HTML block elements. (These enhancements, described below under Pandoc’s markdown, can be disabled using the --strict option.)
In contrast to most existing tools for converting markdown to HTML, which use regex substitutions, Pandoc has a modular design: it consists of a set of readers, which parse text in a given format and produce a native representation of the document, and a set of writers, which convert this native representation into a target format. Thus, adding an input or output format requires only adding a reader or writer.
Using pandoc
If no input-file is specified, input is read from stdin. Otherwise, the input-files are concatenated (with a blank line between each) and used as input. Output goes to stdout by default (though output to stdout is disabled for the odt, docx, and epub output formats). For output to a file, use the -o option:
pandoc -o output.html input.txt
Instead of a file, an absolute URI may be given. In this case pandoc will fetch the content using HTTP:
pandoc -f html -t markdown http://www.fsf.org
If multiple input files are given, pandoc will concatenate them all (with blank lines between them) before parsing.
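The concatenation rule can be pictured with a small shell sketch. The function name concat_inputs is hypothetical and is not part of pandoc; it only emulates the behaviour described above, assuming each input file ends with a newline.

```shell
# Hypothetical helper (not part of pandoc): emulate how multiple input
# files are joined, with one blank line between consecutive files.
concat_inputs() {
    first=1
    for f in "$@"; do
        # Emit the separating blank line before every file except the first.
        if [ "$first" -eq 0 ]; then
            printf '\n'
        fi
        cat -- "$f"
        first=0
    done
}
```

Under these assumptions, piping the result of concat_inputs ch1.txt ch2.txt into pandoc should give roughly the same document as passing both files to pandoc directly.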
The format of the input and output can be specified explicitly using command-line options. The input format can be specified using the -r/--read or -f/--from options, the output format using the -w/--write or -t/--to options. Thus, to convert hello.txt from markdown to LaTeX, you could type:
pandoc -f markdown -t latex hello.txt
To convert hello.html from html to markdown:
pandoc -f html -t markdown hello.html
Supported output formats are listed below under the -t/--to option. Supported input formats are listed below under the -f/--from option. Note that the rst, textile, latex, and html readers are not complete; there are some constructs that they do not parse.
If the input or output format is not specified explicitly, pandoc will attempt to guess it from the extensions of the input and output filenames. Thus, for example,
pandoc -o hello.tex hello.txt
will convert hello.txt from markdown to LaTeX. If no output file is specified (so that output goes to stdout), or if the output file’s extension is unknown, the output format will default to HTML. If no input file is specified (so that input comes from stdin), or if the input files’ extensions are unknown, the input format will be assumed to be markdown unless explicitly specified.
Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
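When the local encoding is known, it may be clearer to name it explicitly on both sides of the pipeline. The wrapper below is a sketch under that assumption; pandoc_in_encoding is a hypothetical name, and the encoding label must be one your iconv recognises.

```shell
# Hypothetical wrapper: convert a file from a local encoding to UTF-8,
# run pandoc on it, and convert the result back to the same encoding.
# Usage: pandoc_in_encoding ENCODING INPUT-FILE [PANDOC-OPTIONS...]
pandoc_in_encoding() {
    enc=$1
    input=$2
    shift 2
    # Remaining arguments are passed to pandoc unchanged.
    iconv -f "$enc" -t utf-8 "$input" | pandoc "$@" | iconv -f utf-8 -t "$enc"
}
```

A call such as pandoc_in_encoding ISO-8859-1 input.txt -t latex would then read and write Latin-1. Characters in the output that the local encoding cannot represent will make the final iconv step fail.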
Creating a PDF
Earlier versions of pandoc came with a program, markdown2pdf, that used pandoc and pdflatex to produce a PDF. This is no longer needed, since pandoc can now produce pdf output itself. To produce a PDF, simply specify an output file with a .pdf extension. Pandoc will create a latex file and use pdflatex (or another engine, see --latex-engine) to convert it to PDF:
pandoc test.txt -o test.pdf
Production of a PDF requires that a LaTeX engine be installed (see --latex-engine, below), and assumes that the following LaTeX packages are available: amssymb, amsmath, ifxetex, ifluatex, listings (if the --listings option is used), fancyvrb, enumerate, ctable, url, graphicx, hyperref, ulem, babel (if the lang variable is set), fontspec (if xelatex or lualatex is used as the LaTeX engine), xltxtra and xunicode (if xelatex is used).
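Before attempting PDF output, it can help to check which of these packages the local TeX installation can find. The sketch below assumes a TeX distribution that ships the kpsewhich lookup tool; missing_latex_packages is a hypothetical helper, not something pandoc provides.

```shell
# Hypothetical preflight check: print each LaTeX package from the given
# list that kpsewhich cannot locate. Assumes kpsewhich is on the PATH.
missing_latex_packages() {
    for pkg in "$@"; do
        # kpsewhich exits non-zero when the style file is not found.
        if ! kpsewhich "$pkg.sty" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}
```

Calling missing_latex_packages amssymb amsmath ifxetex ifluatex fancyvrb enumerate ctable url graphicx hyperref ulem checks the unconditional packages; the conditional ones (listings, babel, fontspec, xltxtra, xunicode) can be added to the argument list when the corresponding option or engine is in use.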
hsmarkdown
A user who wants a drop-in replacement for Markdown.pl may create a symbolic link to the pandoc executable called hsmarkdown. When invoked under the name hsmarkdown, pandoc will behave as if the --strict flag had been selected, and no command-line options will be recognized. However, this approach does not work under Cygwin, due to problems with its simulation of symbolic links.
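Creating the link might look like the following sketch. The helper name link_hsmarkdown and both of its arguments are hypothetical; the location of the pandoc executable and the target directory depend on the installation.

```shell
# Hypothetical helper: create an hsmarkdown symlink that points at an
# existing pandoc executable.
# Usage: link_hsmarkdown /path/to/pandoc /path/to/bin-directory
link_hsmarkdown() {
    pandoc_path=$1
    bin_dir=$2
    # Refuse to create a dangling or non-executable link target.
    if [ ! -x "$pandoc_path" ]; then
        echo "not an executable: $pandoc_path" >&2
        return 1
    fi
    ln -s "$pandoc_path" "$bin_dir/hsmarkdown"
}
```

For example, link_hsmarkdown "$(command -v pandoc)" "$HOME/bin" places the link in a personal bin directory, assuming that directory exists and is on the PATH. Running hsmarkdown input.txt should then write HTML to stdout in the manner of Markdown.pl.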