This repository is for an R package which contains a series of templates for creating HTML and PDF documents in R markdown using the style themes employed by the NZ Department of Conservation (DOC). This document is a brief guide to getting started with writing and producing tables and figures using R markdown, followed by directions for installing the DocRR package and using it to produce high quality reproducible reports from R.
The tabs below take you through each of the key steps from some of the R markdown basics through to producing reproducible reports in web, pdf or word format.
The following is a brief introduction to R markdown, covering some of the basics of creating a .Rmd document, writing content and producing tables and figures. For more detailed documentation download and use the R markdown cheatsheet and R markdown reference guide. If you are already familiar with R markdown then skip ahead to the DocRR section to install and begin using the DocRR package.
Before you start working with R markdown, some extra software is needed. For PDF output to work you need a LaTeX distribution installed. MiKTEX is the most common distribution to use. Download and install it if you don’t already have it.
R markdown provides a way of analysing data and reporting results in one workflow. You can use a single R markdown file to both run your analysis and report its results.
For a simple example of an R markdown file download and open the .Rmd
file from here.
Notice that the file contains three types of content: a YAML header in between --- lines, code chunks with ``` before and after the R code, and in-line R code within the text.
YAML, code chunks and in-line R code are described in more detail below.
At the top of the .Rmd
file you downloaded you can see text in between ---
:
---
title: "Viridis Demo"
output: html_document
---
This is called the YAML header. YAML is a simple text-based format for specifying data. In R markdown documents YAML can be used for specifying the document metadata (title, author, date). This can be done as below:
---
title: "Viridis demo"
author: "Your name"
date: "3 Feb 2017"
output: html_document
---
The final document will then contain a nicely formatted title, along with the author name and date.
You can leave off the author and date if you want; you can leave off the title, too. In fact, none of these fields is required. The line output: html_document tells the rmarkdown package to convert the document to HTML. Similarly, you can use output: pdf_document or output: word_document, in which case your document will be converted to a PDF or Word .docx file, respectively.
Have a play with changing the YAML output from output: html_document
to output: pdf_document
or output: word_document
and click on the ‘Knit’ button to see how easy it is to generate documents in different formats.
Figure 1: The ‘Knit’ button in R Studio.
YAML becomes more important later when we want to convert between document formats and customise our documents.
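The same switch of output format can also be made from the R console rather than with the ‘Knit’ button. The sketch below is a minimal example, assuming the rmarkdown package and pandoc are installed; the file name demo.Rmd and its contents are made up for illustration.

```r
# Write a tiny example document to a temporary folder (hypothetical file name)
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Viridis demo\"",
  "output: html_document",
  "---",
  "",
  "Some text."
), rmd_file)

# output_format overrides the format named in the YAML header
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, output_format = "word_document", quiet = TRUE)
  print(out_file)  # path to the generated .docx file
}
```

Passing "pdf_document" instead should work the same way, provided a LaTeX distribution is installed.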
A key motivation for using R markdown is facilitating reproducible research and reporting: that our results are accompanied by the data and code needed to produce them.
Code chunks allow you to incorporate your analysis and output in R directly into your reporting.
Code chunks look like this:
{r makesomedata}
x <- rnorm(100)
y <- 2*x + rnorm(100)
When you process the R markdown document, each of the code chunks will be evaluated, and then the code and/or output will be inserted. If the code produces a figure, that figure will be inserted.
It’s important to give each code chunk a name, like rat_graph and RTC_table. Each code chunk needs a unique name. The advantage of giving each chunk a name is that it will be easier to understand where to look for errors, should they occur. Also, any figures or tables that are created will be given names based on the name of the code chunk that produced them.
An R markdown document will often have many code chunks. They are evaluated in order and the outputs of a code chunk can be used in other parts of your reporting.
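As a rough illustration of chunks being evaluated in order, the sketch below shows the body of two consecutive chunks as plain R. The second chunk name (fitmodel) and the fixed seed are assumptions added for the example.

```r
# First chunk (e.g. named makesomedata): simulate some data
set.seed(42)  # fixed seed so the numbers are the same on every run
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# A later chunk (e.g. named fitmodel) can reuse x and y created above
fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
print(slope)
```

Because the second chunk depends on objects made by the first, swapping their order in the document would cause an error when knitting.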
The other way of incorporating R code directly into your reporting is through ‘in-line’ code. This is not an alternative to using code chunks; the two methods serve different purposes. For instance, you would not produce a graph or figure using in-line code. That is done using code chunks as described above.
In-line code is used for incorporating numbers derived from data directly into your writing. Thus, your report should never explicitly include numbers that are derived from the data. Don’t write “There are 168 individuals.” Rather, insert a bit of code that, when evaluated, gives the number of individuals.
In R markdown, in-line code is indicated with `r
and `
. The bit of R code between them is evaluated and the result inserted.
Code results can be inserted directly into the text of a .Rmd
file by enclosing the code with:
`r code_here`
To write “there are 168 individuals” using in-line code you would do this:
There are `r nrow(my_data)` individuals.
Another example:
The estimated correlation between x and y was `r cor(x,y)`.
R markdown will display the results of in-line code, but not the code itself. In-line code is used to insert information in your document where needed to enable easy updating of the document when new data becomes available.
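To see what in-line code would insert, the expressions can be run in the console first. This is a minimal sketch: the simulated my_data data frame, the seed and the rounding to two decimal places are assumptions for illustration.

```r
# Simulated data standing in for a real dataset
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
my_data <- data.frame(x = x, y = y)

# The values that in-line code would drop into the text
n_individuals <- nrow(my_data)
r_xy <- round(cor(x, y), 2)

sentence <- sprintf(
  "There are %d individuals; the estimated correlation was %.2f.",
  n_individuals, r_xy
)
print(sentence)
```

Rounding inside the in-line expression, for example round(cor(x, y), 2), keeps the printed number readable in the final report.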
Markdown is a simple way to format text in a document which can be easily converted to HTML (web), PDF or Word formats. The following section covers the basics of writing text and producing tables and figures.
Writing plain text is as simple as writing in Microsoft Word; the only difference is that you write your formatting into the text itself.
For example:
To make text bold, put ** on either side of your text (**bold**).
To make text italic, put _ on either side of the text you want _italicized_.
See the R markdown cheatsheet for instructions for other aspects of writing such as creating bullet points etc.
Making and displaying tables in R markdown is really easy. To display data in a table we can use the kable function from the knitr package inside a code chunk. Applied to the first ten rows of the iris dataset, this outputs like this:
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
kable works really well for simple tables; other packages such as xtable are better for more complex tables.
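The kable table above shows the first ten rows of the built-in iris dataset. A minimal sketch of chunk contents that would give a table like it is shown below, assuming the knitr package is installed. The caption text is made up for the example.

```r
# First ten rows of the built-in iris dataset
first_rows <- head(iris, 10)

# kable() turns a data frame into a formatted table
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(first_rows, caption = "First ten rows of the iris dataset"))
}
```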
If you are displaying solely in HTML, the DT package has some good options for displaying more complex data, including some nice interactive features.
library(DT)
datatable(iris)
You’ll see that DT tables have more options for filtering and searching data. This can be helpful if you have a more complicated table you want to show. DT tables don’t work if you want a PDF report.
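One possible way to keep a single .Rmd file working for both HTML and PDF is to choose the table type when the document is knitted. The helper below is a hypothetical sketch, not part of DocRR; it assumes knitr is installed and uses knitr::is_html_output() to detect the output format.

```r
# Hypothetical helper: interactive table for HTML output, static table otherwise
show_table <- function(data) {
  has_knitr <- requireNamespace("knitr", quietly = TRUE)
  html_output <- has_knitr && knitr::is_html_output()
  if (html_output && requireNamespace("DT", quietly = TRUE)) {
    DT::datatable(data)   # interactive, HTML only
  } else if (has_knitr) {
    knitr::kable(data)    # static, works for PDF and Word too
  } else {
    data                  # fall back to the plain data frame
  }
}

result <- show_table(head(iris))
```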
You can also generate a table in straight markdown.
This looks like this:
| Sepal.Length [^1] | Sepal.Width | Petal.Length | Petal.Width | Species |
|--------------|-------------|--------------|-------------|---------|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
Table: (\#tab:markdowntable) A table about some sort of flower
[^1]: A footnote about Sepal length
But outputs like this:
Sepal.Length¹ | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | setosa |
4.9 | 3.0 | 1.4 | 0.2 | setosa |
4.7 | 3.2 | 1.3 | 0.2 | setosa |
4.6 | 3.1 | 1.5 | 0.2 | setosa |
5.0 | 3.6 | 1.4 | 0.2 | setosa |
5.4 | 3.9 | 1.7 | 0.4 | setosa |
4.6 | 3.4 | 1.4 | 0.3 | setosa |
5.0 | 3.4 | 1.5 | 0.2 | setosa |
4.4 | 2.9 | 1.4 | 0.2 | setosa |
4.9 | 3.1 | 1.5 | 0.1 | setosa |
Markdown tables can be tricky to produce manually. It is easier to use a webtool like http://www.tablesgenerator.com/ to copy a table from Word/Excel, paste it into the webtool and convert it to markdown format.
The advantage of using markdown format is that it gives more flexibility for situations where you may need to do more editing inside a table, such as adding footnotes, which you would do by inserting [^1] where needed followed by [^1]: A footnote below the table as shown above.
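If the data is already in R, the markdown text for a table can also be generated rather than typed. The function below is a small hypothetical helper written for this guide, using base R only; it produces a starting point that can then be edited by hand, for example to add footnote markers.

```r
# Hypothetical helper: turn a data frame into markdown table lines
to_markdown_table <- function(df) {
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  rule <- paste0("|", paste(rep("---", ncol(df)), collapse = "|"), "|")
  cells <- unname(lapply(df, as.character))  # one character vector per column
  rows <- paste0("| ", do.call(paste, c(cells, sep = " | ")), " |")
  c(header, rule, rows)
}

md <- to_markdown_table(head(iris, 3))
writeLines(md)  # copy the printed lines into the .Rmd file
```

Note that as.character() drops trailing zeros (3.0 becomes 3), so values may need tidying if a fixed number of decimals matters.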
To display a graph or figure in Microsoft Word or similar you generally have to copy and paste it manually into your document. This lends itself to making mistakes and becomes tedious when your data/presentation changes.
Using R markdown and R, it is really easy to place your figure in your work exactly where you want it to go. If your data changes then it is as simple as ensuring your new data is called to update your figure. Figure 2 tells us simple things about the iris dataset.
plot(iris)
Figure 2: Something to do with flowers
R markdown will automatically produce a bibliography from in-line citations.
Figure 3: The ‘Cite’ dialog box in Google Scholar.
You can now refer to this within your document.
Citations go inside square brackets and are separated by semicolons. Each citation must have a key, composed of @ + the citation identifier from the database.
To add a citation at the end of a sentence:
Blah blah [@doe99]
To cite more than one source, separate the citations with a semicolon:
Blah blah [@smith04; @doe99].
To do an in-line citation do the following:
@smith04 says blah.
To add additional information to your citation such as page number use square brackets following the citation:
@smith04 [p. 33] says blah.
When you cite something using the @ symbol and the citation identifier, e.g. @smith04, it will automatically appear in the document bibliography. The bibliography is placed at the end of the document, so if you finish each document with a ## References header, the bibliography will appear below it.
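To check which sources will end up in the bibliography, the citation keys can be pulled out of the text. The sketch below uses a simple regular expression in base R; the pattern is an assumption that only approximates how citation keys are recognised, and the example sentences are the ones used above.

```r
# Example sentences containing citations
text <- c("Blah blah [@smith04; @doe99].", "@smith04 [p. 33] says blah.")

# Find everything that follows an @ and looks like a key
matches <- regmatches(text, gregexpr("(?<=@)[A-Za-z0-9_]+", text, perl = TRUE))

# Each key is listed once, however many times it is cited
keys <- unique(unlist(matches))
print(keys)
```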
It is possible to create custom made templates for PDF and HTML documents in R markdown. Using a custom made template means the author can produce a fully reproducible document using the correct styling for their organisation.
The DocRR
package contains templates for reproducible reporting for the New Zealand Department of Conservation (DOC). The following sections will get you started with the package and walk you through producing PDF and HTML documents from R markdown.
DocRR stands for ‘DOC Reproducible Reporting’. The package has been made to enable DOC staff to produce high quality documents in PDF and HTML which can be easily updated and reproduced. Using DocRR templates means there is no need to bother with manual formatting of Microsoft Word documents or copying and pasting of graphs and figures.
To use the DocRR package you will need a recent installation of R. RStudio makes the DocRR package a lot easier to use.
First off open R/RStudio and install the devtools
package if you haven’t already.
Paste the code into your R console and run it:
install.packages("devtools")
Now install the DocRR
package using:
devtools::install_github('ogansell/DocRR')
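The two install steps can be wrapped so that they only run when a package is missing. This is a sketch only: needs_install is a hypothetical helper, and the interactive() guard is an added assumption so that nothing is downloaded when the code is run from a script.

```r
# Hypothetical helper: TRUE when a package is not yet installed
needs_install <- function(pkg) !requireNamespace(pkg, quietly = TRUE)

# Only install from an interactive session, and only what is missing
if (interactive()) {
  if (needs_install("devtools")) install.packages("devtools")
  if (needs_install("DocRR")) devtools::install_github("ogansell/DocRR")
}
```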
RStudio supports these templates natively.
You can now load pre-made templates for outputting analysis and reporting in HTML or PDF format. To do this go to File>New File>RMarkdown (Figure 4).
Figure 4: Opening a new rmarkdown file within RStudio.
Go to RMarkdown>From Template and select one of the templates under DocRR.
Select the template you want to load, give your file a name and save it (Figure 5).
Figure 5: The dialog box for saving a template.
Note that when you save your file a folder structure for it is automatically made. This is because the templates depend on some extra files in order to compile (Figure 6).
Figure 6: The folder structure automatically created when a template is loaded.
You’ve now got a template for producing a PDF or HTML report to begin working on (Figure 7). Press the knit button to have a look at what it produces.
Figure 7: A loaded rmarkdown template
When making PDF or HTML documents a common intermediate step is to knit your document to .docx format. This makes it easy to send to reviewers to get their input. The webshot package is useful here as it includes a screenshot of any HTML widgets used in the knitted .docx file. It requires installation of the PhantomJS software. Because this causes installation of the DocRR package to fail on older installations of R, it is recommended to install PhantomJS manually after installing DocRR. Do this by running the code below in your R console:
webshot::install_phantomjs()
If you’re not using RStudio it’s still easy to use these templates in base R.
Wrapper functions have been made to make it easy to load a template.
There is one function for each template. The document names are examples only; you can name the .Rmd file whatever you like.
Functions below:
DocRR::article("myarticle.Rmd")
DocRR::article_book("myarticle.Rmd")
DocRR::report("myreport.Rmd")
DocRR::report_book("myreport.Rmd")
DocRR::plain_html("myplainwebpage.Rmd")
DocRR::tab_html("mytabbedwebpage.Rmd")
DocRR::indicator_html("myindicatorfactsheet.Rmd")
To compile your document use:
rmarkdown::render("my_article/my_article.Rmd")
Remember that you need to point it to the right directory, which means you need to include the folder which was automatically created to host your .Rmd
file.
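A small sketch of building that path is shown below. It assumes the folder created for the template has the same name as the .Rmd file inside it, as in the render call above; template_path is a hypothetical helper, not a DocRR function.

```r
# Hypothetical helper: path to an .Rmd file inside a folder of the same name
template_path <- function(name) file.path(name, paste0(name, ".Rmd"))

rmd_path <- template_path("my_article")
print(rmd_path)

# Only try to compile if the file is really there
if (file.exists(rmd_path) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(rmd_path)
} else {
  message("Nothing rendered: ", rmd_path, " not found (or rmarkdown is not installed).")
}
```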
You can now start making documents using these templates. It’s important to remember that each template has been designed for a specific output (i.e. PDF articles or reports, plain webpages, web pages with tabbed layout or standardised pages for reporting on biodiversity indicators). Therefore the docarticle or docreport templates won’t work for making webpages, and HTML templates won’t work for generating PDF documents. That said, all of the templates will output to .docx (Word format) provided some of the YAML metadata is altered (we’ll cover this later). This is useful for giving reviewers who aren’t familiar with R or R markdown a version of your document for editing and making comments.
You’ll see there are two versions of the article and report PDF templates (e.g. docarticle and docarticle_bookdown). The bookdown package adds the extra utility of cross-referencing tables and figures in text. This is especially handy for scientific/technical publications with lots of tables and graphs, but is also useful for images etc.
Using the bookdown templates seems to cause some problems when knitting documents on some computers, so article and report templates have been included which do not use the bookdown package. Guidance on how to cross-reference using bookdown has been provided in each of the templates. To learn more about bookdown go to https://bookdown.org/yihui/bookdown/.
The templates in this package have all of the style elements needed to adjust the colour of banners or fonts according to the DOC style guidelines. There are 7 different colours to use for banners at the top of the first page and for text headings.
Functions have been defined in R so that you can easily choose whatever colour is required for banners and fonts in PDF or HTML documents. The colours are shown in the table below. Beside each colour you can see the banner functions for PDF articles, PDF reports and HTML documents, along with the function for specifying the colour of headings in HTML documents. Changing the colour for your document is covered under the following sections on producing PDF or HTML documents.
Colour | PDF article banner | PDF report banner | Web banner | Web font |
---|---|---|---|---|
Alpine | DocRR::Alpine | DocRR::Alpine_report | DocRR::Alpine_web | DocRR::font_Alpine |
Lake | DocRR::Lake | DocRR::Lake_report | DocRR::Lake_web | DocRR::font_Lake |
Sunset | DocRR::Sunset | DocRR::Sunset_report | DocRR::Sunset_web | DocRR::font_Sunset |
Fauna | DocRR::Fauna | DocRR::Fauna_report | DocRR::Fauna_web | DocRR::font_Fauna |
Kiwi | DocRR::Kiwi | DocRR::Kiwi_report | DocRR::Kiwi_web | DocRR::font_Kiwi |
Honey | DocRR::Honey | DocRR::Honey_report | DocRR::Honey_web | DocRR::font_Honey |
Greyscale | DocRR::Greyscale | DocRR::Greyscale_report | DocRR::Greyscale_web | DocRR::font_Greyscale |
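The function names in the table follow a regular pattern, so they can be rebuilt from the colour names alone. The sketch below does this in base R as a quick lookup aid; it only constructs the names as text and does not call DocRR.

```r
# The seven colour names from the table
colours <- c("Alpine", "Lake", "Sunset", "Fauna", "Kiwi", "Honey", "Greyscale")

# Function names built from the naming pattern used in the table
colour_functions <- data.frame(
  colour = colours,
  pdf_article = colours,
  pdf_report = paste0(colours, "_report"),
  web_banner = paste0(colours, "_web"),
  web_font = paste0("font_", colours),
  stringsAsFactors = FALSE
)

# Look up the functions for one colour
lake <- colour_functions[colour_functions$colour == "Lake", ]
print(lake)
```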
Producing a PDF document from R markdown requires LaTeX to be installed on your computer. As above, we recommend MiKTEX. Knitting to PDF can be a bit tricky at first. Behind the scenes a programme called pandoc is converting your .Rmd document into a LaTeX document. LaTeX requires packages of its own to do various tasks involved in producing the document. To get started you will need to ensure that MiKTEX can access its package repository through the DOC proxy server.
If you’ve got MiKTEX installed search for and open the MiKTEX Package Manager
in your program files (Figure 8).
Figure 8: The MiKTEX Package Manager dialog box.
Click on Repository
then Change Package Repository
(Figure 9).
Figure 9: The MiKTEX Change Package Repository dialog box.
Now click on Connection Settings (Figure 10). Tick ‘Use a proxy server’, then enter webproxy in the Address: box and 8080 in the Port: box. If you’re working off the DOC system then you’ll need to untick ‘Use a proxy server’ for everything to work.
Figure 10: The MiKTEX Connection Settings dialog box.
Now you’re ready to start working on a document template. Follow the instructions under the DocRR package section to open one of the PDF templates, docarticle or docreport, or one of the bookdown versions.
Figure 11 shows one of the templates to produce a PDF document. All of the text between the ---
lines is known as YAML. YAML was mentioned previously and is essentially the document metadata.
Figure 11: YAML heading in ‘docarticle’ PDF template
In the YAML section you can see some notation such as \LARGE
and \large
. This is LaTeX code to specify the size of the title and subtitle. You can modify the titles and subtitles inserted there provided the \LARGE
is at the front with a space between it and the actual title. Figure 12 shows the LaTeX commands for different font sizes. These can be used for changing font sizes for the title and subtitle in the YAML section.
Figure 12: LaTeX commands for changing font size.
Now we can specify the colour of the banner and font headers for the document. Lake has been set as the default colour, as we can see in the YAML header:
#This section contains raw LATEX code and shouldn't be altered
header-includes:
#--------------------Define color for use here-----------------#
- \newcommand{\mycolor}{\color{Lake}} #Define color for use here
- \ThisULCornerWallPaper{1}{{`r DocRR::Lake()`}}
#-----------------------END-------------------------------------#
We can see two LaTeX commands defined.
This command controls the colour of the headings:
\newcommand{\mycolor}{\color{Lake}}
This command controls the colour of the banner:
\ThisULCornerWallPaper{1}{{`r DocRR::Lake()`}}
To change your colours simply add a different colour at:
\newcommand{\mycolor}{\color{----}} and \ThisULCornerWallPaper{1}{{`r DocRR::----()`}}
from one of the colour options in Table 2. Make sure not to change any of the other text otherwise the functions will not run successfully.
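To avoid typing mistakes, the two lines can be generated for a chosen colour and pasted into the YAML header. The helper below is a hypothetical sketch in base R for the article template; it only builds the text and does not call DocRR.

```r
# Hypothetical helper: the two header-includes lines for a given colour
colour_lines <- function(colour) {
  c(
    sprintf("- \\newcommand{\\mycolor}{\\color{%s}}", colour),
    sprintf("- \\ThisULCornerWallPaper{1}{{`r DocRR::%s()`}}", colour)
  )
}

# Print the lines for Alpine, ready to paste over the Lake lines
writeLines(colour_lines("Alpine"))
```

Only the colour name changes between the generated lines and the defaults, which matches the advice not to alter any of the other text.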
If you have a really long title or subtitle it can push the location of the author and date down the page. The best approach to manage this is to change the font size of your title and subtitle as needed to control how it influences the overall layout.
To create your PDF press the ‘Knit’ button in RStudio or use the render function in R. Behind the scenes your .Rmd document is converted to a formatted PDF document.
The DocRR
package contains three HTML document templates. All of them contain a little bit of HTML and CSS, however you don’t need to know much about this in order to use these templates. Figure 13 shows one of the html templates from the DocRR
package.
Figure 13: Example DocRR HTML template.
You can see that there is some text in between < and > symbols. This is HTML code, and the pieces enclosed in < > are known as tags. A tag such as <h1> opens an element and the matching </h1> closes it. Where you see <h1>Title of page</h1> is where the title of the page should go, because h1 is the highest level in a series of headers. So if you want to change the title for this document, simply put whatever text you want in between the tags: <h1>A different title</h1>. Provided the <h1> and </h1> are left unchanged everything should work fine. HTML code has been used in this instance as it works better with the design of the templates and the CSS developed to go along with it.
CSS controls the style elements for the HTML document. If you know CSS you could change the look and feel of the font or table layouts for the document. However the DocRR
templates have tried to minimise the amount of HTML or CSS the user needs to learn. In the section below the header you can see the code:
<!--Add some custom css to control font and table header colour or other style components where needed-->
<style>
h2 {color: `r DocRR::font_Lake()`}
h3 {color: `r DocRR::font_Lake()`}
h4 {color: `r DocRR::font_Lake()`}
th {background-color : `r DocRR::font_Lake()`}
</style>
This sets the colour of the font for each level of section header and the background for any table headers. In HTML these are the h2, h3 and h4 elements, which correspond to ##, ### and #### in markdown text, and the th element, which is a table header cell. The colour is set using CSS, which goes in between the { } after each element name, e.g. h2. Normally you would need to know the HEX code for each colour, which looks like #FFFF00. To make things easier, wrapper functions have been defined for choosing colours for the headings of HTML documents. So if you want to change your h2 headers from Lake to Alpine, simply refer to the functions in Table 2 and change the name of the colour in the function
`r DocRR::font_Lake()`
to
`r DocRR::font_Alpine()`
To change the colour of the banner use the functions defined for changing the colour of web banners (Table 2).
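For readers curious about what a HEX code is, the sketch below uses base R to convert between red, green and blue values and a HEX string. The font_example function is purely hypothetical; it illustrates the idea of a wrapper that returns a colour code and is not how DocRR itself is implemented.

```r
# Build a HEX code from red, green and blue values (0 to 255)
yellow_hex <- grDevices::rgb(255, 255, 0, maxColorValue = 255)
print(yellow_hex)

# Convert a HEX code back into its red, green and blue parts
channels <- grDevices::col2rgb(yellow_hex)[, 1]
print(channels)

# Hypothetical wrapper in the spirit of the font functions: returns a HEX code
font_example <- function() "#FFFF00"

# The CSS rule that in-line code would produce from such a wrapper
css_rule <- sprintf("h2 {color: %s}", font_example())
print(css_rule)
```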
There are no Word templates in this package, mainly because working in Word doesn’t lend itself to a reproducible workflow. However, Word (.docx) output is useful for sharing your work with others for peer review and comment, as many people are not familiar with working in rmarkdown.
Word versions of PDF or HTML templates can be easily produced using rmarkdown. They do not have any style elements applied and should be used for getting feedback and review from others. Once you have all the feedback you need, any changes should be incorporated back into your .Rmd document.
In order to produce .docx documents from any of the DocRR templates you need to remove the # from some lines of the YAML header and add # to others. The # symbol marks a comment: everything following it on that line is ignored.
Below is the YAML header for a PDF document:
#------------If wanting word or html versions uncomment either one below--------#
# word_document: default
# html_document: default
#-------Make sure to comment out everything below if using word or html above-----#
pdf_document:
  keep_tex: false #Only have true if you want to keep the latex file produced
  latex_engine: xelatex #Has to be xelatex to use Archer fonts
  template: doc_article.tex #Don't change this!!!
  toc: false #include table of contents
  toc_depth: 1 #specify the depth of the table of contents
  number_sections: true #specify whether or not to add numbered sections
This ensures the .Rmd
file creates a PDF document. If you want to produce a .docx
document you need to remove the #
symbol from word_document
and add #
symbols to every line from pdf_document
and down. So that section of the YAML would now look like this:
#------------If wanting word or html versions uncomment either one below--------#
word_document: default
# html_document: default
#-------Make sure to comment out everything below if using word or html above-----#
#pdf_document:
#keep_tex: false #Only have true if you want to keep the latex file produced
#latex_engine: xelatex #Has to be xelatex to use Archer fonts
#template: doc_article.tex #Don't change this!!!
#toc: false #include table of contents
#toc_depth: 1 #specify the depth of the table of contents
#number_sections: true #specify whether or not to add numbered sections
This will now knit a .docx
document. If you want to go back to PDF output you simply reverse the two steps just described.
¹ A footnote about Sepal length