1 Background
1.1 What is R?
R is many things to many people, but for now, let’s focus on two aspects:
- R is an open-source program intended for data analysis and visualization.
- R is a programming language for automating analysis and implementing new methods.
This is often summarized as “R is a language and environment for statistical computing and graphics.”
As a programming language, R is
- interpreted,
- functional,
- object-oriented.
Basically, interpreted means that R commands and scripts are run within the R system, instead of being compiled to run as stand-alone executables, like programs written in C or C++. Functional means that anything interesting in R is done by calling a function, which is a slightly more general concept than SAS procedures or Stata commands. Finally, we take object-oriented to mean that data and results in R can be stored as objects for further processing and inspection, making it easy e.g. to work with several data files at the same time, or to inspect the results of several analyses at the same time. We will use these three properties as hooks when we start working with the R command line.1
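As a minimal sketch of what these three properties look like in practice, the following lines can be typed one by one at the R command line. The object names (`x`, `m`, `squares`, `fit`) are chosen here for illustration only, and `cars` is a small example data set that ships with R.

```r
# Interpreted: each line is evaluated as soon as it is entered,
# with no separate compilation step
x <- c(2, 4, 6, 8)

# Functional: work is done by calling functions, and functions
# can themselves be passed as arguments to other functions
m <- mean(x)
squares <- sapply(x, function(v) v^2)

# Object-oriented: results are stored as objects that can be
# inspected and processed later
fit <- lm(dist ~ speed, data = cars)
class(fit)    # the class of the stored result
coef(fit)     # extract the estimated coefficients from the object

# The same function behaves differently depending on the class
# of the object it is given
summary(x)
summary(fit)
```

Because `x` and `fit` live side by side in the same session, several data sets and several analysis results can be kept and compared at the same time.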
R-the-software is the product of a collective of statisticians and programmers2 and a large community of open-source contributors. R has been around for more than two decades, with the official launch of version 1.0 in spring 2000; it has become a hugely successful and popular platform for the intended purposes, and the ongoing method development has led to the availability of tens of thousands of add-on packages implementing an incredible range of different methods (admittedly also at a wide range of different quality levels).
In this document, the focus is on the use of R and a selected set of add-on packages for the purpose of data analysis and visualization in an epidemiological or biomedical research setting.
1.2 What is RStudio?
For our purposes, RStudio is an integrated development environment (IDE) for R, i.e. a program that sits on top of the R program and provides an enhanced graphical user interface (GUI) for running analyses and writing code.
As an IDE, RStudio integrates elements like plots, help information, access to the file system etc. into a consistent GUI that has the same functionality and appearance across different underlying platforms (like Windows, Linux etc.). It contains a powerful integrated code editor and strong support for code development, e.g. version control. Compared to the barebones R software on its own, this offers a much friendlier environment for beginners. However, RStudio does not provide a GUI for actual statistical analysis, which is still performed at the R command line using the R language. This is a good thing.
RStudio is distributed by a commercial actor (Posit Software, PBC3) that provides both freely available open-source software and commercially licensed variants. This company is also the professional home of some of the most productive R developers of the current generation, who have collectively contributed hundreds of powerful and popular add-on packages for R.
Generally speaking, the focus of RStudio is data science rather than plain statistics and data analysis: while there is of course a huge overlap, there is a corresponding emphasis on code development, interactivity, dashboards etc. which is somewhat less relevant in research (IMO).
In this document, we use RStudio as the main interface to R for most examples. Importantly, all functionality is also available in base R, using either the barebones R interface or some other IDE (Emacs etc.), though not always as conveniently.
1.3 Software installation
R is open source under the GNU General Public License and available from https://cran.r-project.org/ for a range of different operating systems.
An open source version of the RStudio Desktop software is distributed under the Affero General Public License at https://posit.co/downloads/.
For the purpose of this introduction, a standard installation is sufficient: download the respective installer for your system and run it with the proposed default settings.
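One simple way to check that the installation worked is to start R (or RStudio) and run a few commands at the R command line. This is only a sketch of such a check; the variable name `installed` is chosen here for illustration, and the version reported will depend on what you installed.

```r
# Show which version of R is running
R.version.string

# List the names of all packages available in this installation
installed <- rownames(installed.packages())

# The stats package is part of every standard R installation,
# so this should return TRUE
"stats" %in% installed
```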
If you have a programming or computer science background, you may be used to somewhat different and more rigorous definitions as e.g. seen in https://en.wikipedia.org/wiki/Functional_programming and https://en.wikipedia.org/wiki/Object-oriented_programming. And, yes, R is both functional and object-oriented in that sense, too, e.g. accepting functions as arguments and return values of functions, and supporting class-specific methods.↩︎
The company that has developed RStudio and owns the rights to the name is Posit Software, PBC, a Delaware public benefit corporation (https://posit.co/). Until fall 2022, however, the company was called RStudio, PBC, leading to confusion about RStudio-the-software and RStudio-the-company, so however you feel about the new name, it removes that confusion at least.↩︎