R Markdown: The Definitive Guide (bookdown.org)
124 points by Koshkin on Feb 3, 2021 | 58 comments


As an occasional R user, I think R Markdown is one of the things that R does really well. For data scientists who want to output reports (a mix of text and calculations), I haven't come across anything as mature or easy to use in the Python ecosystem. I'm a big user of Jupyter notebooks, but version control issues put me off, and I've never got to grips with customising the formatting of jupyter nbconvert.

It's worth highlighting that R Markdown actually supports a number of languages including Python - for instance see here: https://bookdown.org/yihui/rmarkdown/language-engines.html

For example, I am currently considering using R markdown to author the documentation for one of my Python packages. Currently my documentation is written in markdown, but I keep having to copy and paste calculations and tables into the markdown. When things change, it would be nice to just be able to re-run the R markdown, converting Rmd -> md and thereby interpolating the latest version of all calculations into the final markdown doc.
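To make the "re-run and interpolate" idea concrete, here is a minimal Python sketch. It is not how R Markdown or knitr work internally; it only shows a markdown source with placeholders being re-rendered from freshly computed values. The `timings_ms` data, the `render` helper and the placeholder names are all made up for illustration.

```python
from statistics import mean
from string import Template

# Hypothetical measurements; in a real doc these would come from the package itself.
timings_ms = [12.0, 15.0, 9.0]

# Markdown source with placeholders, standing in for an .Rmd file with inline code.
source = Template(
    "## Benchmarks\n"
    "\n"
    "The mean runtime over $n runs was $avg ms.\n"
)


def render(template, values):
    """Recompute the numbers and interpolate them into the markdown text."""
    return template.substitute(n=len(values), avg=f"{mean(values):.1f}")


rendered = render(source, timings_ms)
print(rendered)
```

When the underlying numbers change, re-running the script regenerates the markdown, so nothing has to be copied and pasted by hand.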


For the Python in markdown case, you might be interested in one of my projects that allows executable Python code (including optional Jupyter kernel support) in Pandoc markdown: https://github.com/gpoore/codebraid. Pandoc does all the document parsing (there is no regex preprocessor for extracting code), so converting markdown to markdown often works particularly well.


Hydrogen for Atom actually can have executable Python in markdown, and more: write Python in .py files and execute strings or define blocks with "# %%". It's not the same of course, but for my use cases it's more convenient (given that it works in a proper code editor with all the needed plugins).
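For anyone who hasn't seen the "# %%" convention, here is a rough sketch of the idea in plain Python. This is not Hydrogen's implementation; `split_cells` is a hypothetical helper that just cuts a script into cells at marker lines and runs them in a shared namespace, the way a kernel session would.

```python
def split_cells(source):
    """Split Python source into cells at lines that start with '# %%'."""
    cells = []
    current = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A marker closes the cell collected so far (if any).
            if current:
                cells.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        cells.append("\n".join(current))
    return cells


script = """# %%
x = 2
# %%
y = x * 21
"""

namespace = {}
for cell in split_cells(script):
    exec(cell, namespace)  # each cell sees the variables defined by earlier cells

print(namespace["y"])
```

The file stays an ordinary .py file, which is why it diffs cleanly and works with normal editor tooling.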


I'd strongly agree with this. I use R Markdown with Python in the RStudio IDE, and prefer it to Jupyter. The publishing pipeline is really flexible, e.g. with one-click deployment to RStudio Server [0] (which is a paid product).

No affiliation to RStudio, just a happy customer.

[0]: https://rstudio.com/products/rstudio/download-server/


Currently working on a DA masters with a background in markdown and was so happy to discover this.

Then I was baffled to discover that there wasn’t an equivalent for Python. Jupyter is great, but I really love the simplicity of my text files and I’m surprised that there isn’t a major implementation given how popular both markdown and Python are. I guess notebooks are just too appealing.


Honestly jank and bad ergonomics pervade the Python data science ecosystem. The pandas API is kind of bad (multi-indexes anyone?), matplotlib is way too low level, tensorflow is a shitshow. I think the R community got lucky with a greater focus on user experience, even if it comes at the cost of explicitness sometimes (lots of DSLs with a lot of 'magic').


Honestly, in my experience, R's DSLs are not a bad thing. When you are doing analytics day in and out you basically are speaking the DSL anyway, so what if you have to learn this slightly different language if it expresses what you want to accomplish clearly?


it was pretty wild to hear Hadley Wickham invoke Cognitive Behavioral Therapy in his most recent rstudio::conf keynote - dude is super in tune with how humans use open source data analysis tools


R Markdown/Notebooks, IMO, has evolved to be the key value proposition of RStudio over the years. From a productivity standpoint when both working with R and publishing pretty reports/PDFs it has been incredible. (that said, if VS Code gets more robust R Markdown support, I may consider switching from RStudio)

I wrote a blog post a few years ago comparing R Markdown/Notebooks to Jupyter Notebooks: https://minimaxir.com/2017/06/r-notebooks/


Funny thing is this has actually motivated me to move away from using rmarkdown. To the point that I went "backwards" and wrote several reports and an academic paper in Sweave after using rmarkdown for years. My primary motivation was that I needed to know the documents could be built by any R session, not just from within RStudio. I didn't want to be in a situation where my "reproducible" document relied on a specific IDE to build correctly.


Hi, I’m with RStudio.

We generally treat “working in RStudio but not elsewhere” as serious bugs, especially in our open source R packages. Not only because we don’t want to artificially limit usage of other interactive environments, but because a lot of what people do with R is run their code in non-interactive settings like CI/CD pipelines or ETL jobs.

(Actually, an R Markdown compilation is a particularly good example of something people often do from Travis or GitHub Actions)


I actually use Emacs to author R Markdown and the exporting works pretty flawlessly.

Before anyone asks, I use Org Mode. I just don't feel like pushing that format on my co-workers. If it's for me only, I use Org. If it's for a collaboration or export into a report, it goes into an R Markdown file.


Curious: how does Org-Mode compare with RMarkdown's use case? I'm only tangentially familiar with both. I've had the hardest time getting into Emacs (Doom-Emacs), and while Org-Mode seems great, the benefits seem to come entirely from Emacs, not the syntax.

I used markdown + pandoc to write my PhD thesis, but was forced to resort to latex for any formatting that was even slightly complicated (ie, multifigure plots, tables, etc).


I have created portable (for Windows) versions of R and RStudio that work fine for the most part. However, I was unable to make Bookdown work on them.

An official portable version of RStudio, one that lives in a folder on the computer, can be moved to a new PC in a pinch with all settings and plugins intact, and works with all official packages, would be very sweet. (Reason: upgrading from one PC to a newer one is a pain if one has to install application software from scratch. Copying over the "portableApps" folder is so much easier.)

Portable R available via PortableApps here: https://portableapps.com/node/32898

Portable RStudio via instructions here: https://rpubs.com/jsmccid/rportable


I'm all for finding alternatives to RStudio's dominance in the R space. No doubt they do some excellent things, but I also have a vague uneasiness at the cultishness surrounding some of their products. It sometimes feels like if you're not using the 'tidyverse', you're viewed as doing it wrong.

Anyway, this feeling set me off on an exploration of alternative tools and packages. Instead of rmarkdown I'm now exploring the pander[1] package, which seems to do most of what I'm looking for, perhaps only a little limited in output formats.

Edit: The Pandoc.brew examples might be the most interesting as a direct alternative to rmarkdown for document creation.[2]

[1] http://rapporter.github.io/pander/

[2] http://rapporter.github.io/pander/#examples


For me, part of the cultishness is justified because I have to teach about 100 non-programmers to use R and Rmd every year. Tidyverse solves the big problem that R used to have: there were hundreds of ways to do things, none of the functions had consistent naming or calling signatures, and it was hard to Google sensible answers. Tidyverse is fast, consistent and has good docs. Pipes discourage lots of mutable state, which is a major cause of errors in non-programmers' code. If you are going to be sharing your code with other researchers and are not using tidyverse, then these days I think you basically are doing it wrong.


Hmmm... I've always found tidyverse syntax inelegant. Data.table syntax does more with fewer different words. But it can get a bit convoluted. Anyway: there are different ways of solving problems and writing code. Should we not encourage this? Otherwise it's just cargo cult again and again.


To a point. I’m actually a data.table user too, but it’s a pain at the moment (and a good example of why tidyverse quality matters) because they broke the integration with reshape and made me rewrite a bunch of guides a while ago. I don’t like to encourage cargo culting, but to some extent it’s needed as a beginner. Heck, we’re all still cargo culting to some degree unless we also understand the machine code.


I teach a graduate course on R every year. One of my points on the Tidyverse is that it does an excellent job of providing clean and sound extensions to core language functionality. I emphasize how thoughtfully designed it is. It is a refreshing change from other languages, which feel more like a collection of random parts.


To be honest, I can feel the "cultishness" you mentioned, but I'm curious if you also feel that for R Markdown products (which are mostly irrelevant to Tidyverse). If you do, I'd love to try my best to fix that, because that's something that I personally don't like. I want to make it clear that if you don't use R Markdown, you are definitely not doing anything wrong, e.g., LaTeX and HTML are totally legitimate and supported:

https://bookdown.org/yihui/rmarkdown-cookbook/latex-hardcore...

https://bookdown.org/yihui/rmarkdown-cookbook/html-hardcore....


As someone who learnt R before tidyverse was a thing, I would have quit R a long time ago without the tidyverse.

Base R is that much of a pain.


I also learnt R before tidyverse was a thing (2007), and eventually abandoned it for Python & Pandas a year before tidyverse came out. I sorely missed ggplot, but I couldn't justify the ugliness of R to make up for it. Now when I look at R code, it's almost always tidyverse, and makes little sense to me.

I'll probably move to Julia at some point.


Agreed. Tidyverse is a totally different language, even. And it's one that is consistent, easy to use and most importantly, it feels designed

Huge props to Hadley Wickham and whoever helps bring programming niceties to R's otherwise mostly non-programmer userbase


I'm one of the main developers of R Markdown. As @jcheng said, it is definitely not our intention to lock you in RStudio for any of our R packages. You probably can't imagine how hard I have been trying to avoid relying on RStudio specifically for certain features. There are decisions that I can definitely make in favor of the RStudio IDE, but usually I don't do that, and hope things also work well with other editors. Pretty much everything you do in the RStudio IDE for R Markdown documents or projects can be done from the command line. A document should not stop being reproducible just because you stop using RStudio.


Which parts don't work outside of R Studio for you? Is it something recent? I usually create my single-page and bookdown documents as one step in a separate R script, and haven't found any problems beyond the rmarkdown package's poor documentation (I really wish the R Studio developers would write in-package documentation with as much care as with their online documentation).


If I recall correctly, I had issues primarily with more complicated documents, and mostly due to some RStudio "project" and file path configuration issues - I would have to look to see what the specific issues were. Simple, one-file .Rmd docs are typically fine either way.


It's pretty easy to compile the documents; it's all just R functions (rmarkdown is a wrapper around knitr and pandoc). I've been writing Rmarkdown docs for many years in Emacs without hiccups (or without hiccups from rmarkdown::render, anyway).


I've been using Rmarkdown and knitr for a long while, and have watched its evolution over the years (roughly 8 years now). As someone who does not use RStudio, it's become a bit of a pain for me to use. The authors seem to expect it is being used from RStudio, and using it in a different environment has become a bit fragile.

It's also a bit telling that this "definitive guide" does not include any troubleshooting/debugging sections - the expectation seems to be that it "just works" so long as you use it in RStudio, but otherwise you are on your own.

Not sure that I am aware of many other R packages with this mentality, but I am personally not a fan.


I write my blog mostly in Rmarkdown from Emacs using ESS/polymode. Mostly works well except for some markdown-mode induced bugginess.

I'm using it occasionally at work as well, but I still prefer R scripts with nice comments for work.


I haven't used it for a few years, but I have previously set up some purely command line tooling for this (autogenerated verification reports) which worked well. Is this getting harder to do?


Some functionality has been tied to RStudio's concept of a "project" (https://support.rstudio.com/hc/en-us/articles/200526207-Usin...). I have had several documents authored by others which would not build on my system without some intervention due to this.


You can avoid project related issues by using the here package.

https://github.com/jennybc/here_here
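As I understand it, the idea is to build paths relative to a project root that is found by looking for a marker file, instead of relying on the IDE's working directory. A rough Python analogue, purely as a sketch: `find_project_root` and the `.git` marker are assumptions for illustration, not the here package's actual rules.

```python
import tempfile
from pathlib import Path


def find_project_root(start, marker=".git"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found at or above {start}")


# Demo in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "reports" / "2021"
    nested.mkdir(parents=True)

    found = find_project_root(nested)
    data_path = found / "data" / "input.csv"  # hypothetical file, never opened
    print(found == root)
```

Because every path is anchored at the discovered root, the document builds the same way from an IDE, a terminal or a CI job.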


Interesting. Thanks.


Using RMarkdown effectively pretty much tripled my salary. It's an amazing tool that is wholly underappreciated in the "data" space.


Could you expand on what you mean by using it effectively? I've just started using RStudio and any tips/ways to use it more efficiently are welcome! Thanks.


Find a business problem in your job that requires information to get into the hands of many people but requires tweaks between them, i.e. person X needs to know A but person Y needs to know B.

Read up on the section on parameterized reports and realize you can do the work of dozens.

https://bookdown.org/yihui/rmarkdown/parameterized-reports.h...

rmarkdown is insanely effective at getting information out of your hands and into stakeholders' hands repeatedly and in a visually appealing way. When you combine that with your domain knowledge, your value to your company will increase.
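The pattern itself is simple: one template, many parameter sets. A minimal Python sketch of the idea follows. It is not the rmarkdown API; the `REPORT` template, the `render_all` helper and the figures are invented for illustration.

```python
from string import Template

# One markdown template shared by every audience. Names and figures are made up.
REPORT = Template(
    "# Monthly report for $region\n"
    "\n"
    "Revenue: $revenue\n"
)

# Hypothetical per-stakeholder parameters.
params = [
    {"region": "North", "revenue": 1200},
    {"region": "South", "revenue": 950},
]


def render_all(template, param_sets):
    """Render the same template once per parameter set, keyed by region."""
    return {p["region"]: template.substitute(p) for p in param_sets}


reports = render_all(REPORT, params)
for region, text in reports.items():
    print(f"--- {region} ---\n{text}")
```

Adding another stakeholder means adding one more entry to `params`, not writing another report.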


> Read up the section on parameterized reports and realize you can do the work of dozens.

This was the killer (visual) feature in LyX when I used it to author LaTeX documents. Great to hear rmarkdown has similar support.


It boggles my mind that in 2021 IPython/Jupyter notebooks _still_ don't have inline code evaluation or sensible parameterization without using extensions.


I could imagine that making the same report in another language takes 3x the time.


Something I find that people don't appreciate about better tools is that it's not just about going faster, it's about going farther.

There's a whole bunch of stuff you can do by hand. But if you do it by hand, you'll compromise your sense of satisficing. It amazes me how often the 'work harder' crowd doesn't reconcile the two.

If you can quickly and reliably figure out we're spending $100k a month, you get a raise. If you can figure out why it's $100k instead of $80k, you might be up for a promotion.


I've always been a big fan of R Markdown. For me and my work, there's always been a fine line between R Markdown and R Shiny: certain limitations in R Markdown (or things I felt were limitations) could be overcome by building the report in R Shiny.


This has been discussed before but I am watching the development of https://github.com/fonsp/Pluto.jl so that it can provide some solid alternative.


If you're interested in Pluto.jl, I recently saw an announcement about Neptune.jl[1], which appears to remove some of the reactive nature of Pluto. It's a fork of Pluto and executes code sequentially one cell at a time instead. An interesting read anyway and maybe worth trying.

[1] https://github.com/compleathorseplayer/Neptune.jl


Huh. I wonder what the impetus for that was and what the purported advantages over Jupyter are, since reactivity seems like the main value add of Pluto.jl - and a significant one at that.


Plug: if you're a Jupyter user, please chime in on the survey at https://www.surveymonkey.com/r/LCB7GBF.

Background: https://blog.jupyter.org/survey-jupyterlab-and-beyond-88c7fb...


This is the first HN thread about R where I don’t see a commenter shitposting about how “terrible” R is. Maybe that’s a testament to all the great work done by the great Yihui Xie and the RStudio folks.

Currently using RStudio to write my masters papers in markdown and convert them to PDF. Love it!


Thanks for the kind words (about me and RStudio)!

Well, you might be too optimistic about people not commenting on R. Usually I tend to avoid reading HN, but I feel pretty much every single time when someone brings up anything related to R, some people will start mentioning how terrible R is, sooner or later :) Now people also start fighting around Tidyverse, although I have no idea why that's relevant to R Markdown...


I recently discovered R Markdown. Started doing the ISLR examples with it.

https://github.com/melling/ISLR/blob/main/chapter08/08_Lab02...

https://github.com/melling/ISLR/blob/main/chapter08/08_Lab02...

I need to figure out how to better fit images so I don’t have pages with large gaps

Also, you can now embed executable Python in the files.


Tangentially: Reading [0] from Stephen Wolfram yesterday got me digging into Wolfram notebooks. Does anyone have experience or reflections on using R Notebooks as the main way to keep track of notes?

I don't see the Wolfram products talked about very much here

[0]: https://writings.stephenwolfram.com/2019/02/seeking-the-prod...


Very nice to see this here. R Markdown is a great tool on its own and working with it in RStudio is a superpower, plus the clean format is really nice in version control.


Jupyter Book is more flexible, simpler and tested to death since it is built upon Sphinx and docstrings. Since Jupyter uses pandoc for converting notebooks, pandoc is the power tool for creating publications nowadays.


Jupyter has downsides too, e.g. version control. An R Markdown document "in the raw" is just a text file.

For info, Rmd also uses pandoc under the covers. Agree it's the power tool, quite a wonderful thing.


I haven't used it myself, but apparently jupytext (https://github.com/mwouts/jupytext) allows you to save Jupyter notebooks as R Markdown files.


Jupyter Book (and the larger jupyter ecosystem now) are starting to standardize on MyST notebooks as a version-control friendly all text notebook format. It's similar to r markdown and uses fenced code blocks to represent code cells. You can diff them, edit in any text editor, etc. See: https://myst-nb.readthedocs.io/en/latest/


I'm familiar with both Rmarkdown and Jupyter Book. Rmarkdown also uses pandoc. Both are very flexible.


RMarkdown uses pandoc too, and it's on version 2.6, while Jupyter Book is pre-1.0.


github really needs to add R Markdown Notebook rendering support (that's the ones that end in extension .nb.html)


We have tried for almost four years without success: https://github.com/rstudio/rmarkdown/issues/1020 We did hear back from GitHub a few times, but there has been no sign of progress at all. We would truly appreciate it if anyone could connect us to a GitHub contact to push this forward.



