class: title-slide, center, bottom

# 00 - First tour of R & RStudio

## Data Science with R · Summer 2021

### Uli Niemann · Knowledge Management & Discovery Lab

#### [https://brain.cs.uni-magdeburg.de/kmd/DataSciR/](https://brain.cs.uni-magdeburg.de/kmd/DataSciR/)

.courtesy[📷 Photo courtesy of Ulrich Arendt]

---

## What is `R`?

.pull-left70[

- **Statistical programming language** with a focus on **reproducible data analysis**
- **Free, open source** and **available for every major OS**
- More than **10,000 packages** available since January 2017
- Comprehensive **statistics** and **machine learning** packages
- Elaborate packages to create aesthetically appealing **graphics** and **charts**

## What is RStudio?

- Open source **Integrated Development Environment (IDE)** for `R`
- First release in 2011
- Two major versions: **RStudio Desktop** and **RStudio Server**
- Download and installation of `R` and RStudio: <https://www.rstudio.com/products/rstudio/download/>

]

.pull-right30[

<img src="figures//00-R_logo.svg.png" width="100%" />

<img src="figures//00-rstudio_logo.png" width="100%" />

]

---

## RStudio IDE

<img src="figures//00-rstudio_ide_components.png" width="90%" />

---

## R packages 📦

- Base `R` contains functions that are needed by the majority of users.
- Additional functions can be used on demand by loading **packages**.

.content-box-blue[
A **package** is a **collection of functions, datasets and documentation** that extends the capabilities of base R.
]

---

## Packages on CRAN<sup>1</sup>

<img src="figures//00-r_cran_2.png" width="80%" />

.font80[

<sup>1</sup> [CRAN](https://cran.r-project.org/), the _Comprehensive R Archive Network_, consists of multiple worldwide mirror servers that are used to distribute `R` and `R` packages.
Figure source: <https://gist.github.com/daroczig/3cf06d6db4be2bbe3368>

]

---

class: center, middle, inverse

## Tour: R and RStudio

---

## Calling functions

.left-column[

Basic function call scheme:

```r
some_function(arg_1 = val_1, arg_2 = val_2, ...)
```

<img src="figures//00-mean_help.png" width="95%" />

]

.right-column[

Example: the `mean()` function:

- `x` is the only mandatory argument
- arguments `trim` and `na.rm` have default values

```r
x <- 1:10
mean(x) # trim = 0 and na.rm = FALSE
```

{{content}}

]

--

```
## [1] 5.5
```

{{content}}

--

```r
x <- c(1:10, NA)
mean(x)
```

{{content}}

<!-- # Mean of a vector with >=1 NA's yields NA -->

--

```
## [1] NA
```

{{content}}

--

```r
mean(x, na.rm = TRUE) # NA's will be ignored
```

```
## [1] 5.5
```

{{content}}

--

```r
mean(x, TRUE) # unnamed args are matched by position, so TRUE is passed as trim
```

{{content}}

--

```
## Error in mean.default(x, TRUE): 'trim' must be numeric of length one
```

{{content}}

---

## Downloading, installing and loading packages

```r
# A package has to be installed only once:
install.packages("dplyr") # -> install package "dplyr" from CRAN

# Load the package once per session:
library(dplyr)
```

--

Alternatively, you can install and load packages in RStudio using the Packages tab.

<img src="figures//00-rstudio_package_installer.png" width="75%" style="display: block; margin: auto;" />

.content-box-gray[
To install a package that is hosted on GitHub, use `remotes::install_github("<REPOSITORY>")`.
]

---

## Data Frames

- `data.frame` is `R`'s data structure for a **table**.
- A data frame is a rectangular collection of data, arranged by **variables** (**columns**) and **observations** (**rows**).
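
A minimal sketch of the two points above, assuming a small made-up data frame (the name `patients` and all of its values are purely illustrative and not part of the course material):

```r
# Hypothetical example data: each column is a variable, each row an observation
patients <- data.frame(
  id  = 1:3,
  age = c(34, 51, 27),
  sex = c("f", "m", "f")
)

nrow(patients)   # number of observations (rows)
ncol(patients)   # number of variables (columns)
names(patients)  # names of the variables
str(patients)    # compact overview of the column types
```
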

Columns of data frames are accessed with `$`:

```r
dataframe$var_name
```

---

## The Tidyverse

.pull-left[

<img src="figures//00-tidyverse.png" width="100%" />

]

.pull-right[

Quote from the [Tidyverse website](https://www.tidyverse.org/):

.content-box-gray[
.font110[
"**R packages for data science.** The tidyverse is an **opinionated collection of R packages designed for data science**. All packages share an underlying **design philosophy, grammar, and data structures**."
]
]

→ collection of open-source `R` packages mainly for data wrangling and visualization

→ shared conventions and common APIs across all Tidyverse packages

]

---

## R Markdown

.pull-left60[

**R Markdown** is a file format that combines **code**, the associated **results** and **narrative text** in a simple text file, to create **reproducible reports** which can be **flexibly distributed in multiple ways**.

An R Markdown document is saved as an `.Rmd` file. It contains both the (`R`) code and a prose description of a data analysis task. The R Markdown document can be rendered as HTML, PDF, Word and various other output formats.

**Pros** of R Markdown documents:

- **reproducibility** (less **copy & paste**)
- **simple Markdown syntax** for text
- a **single source document** that can be rendered for different target audiences and purposes
- **simple, future-proof file format** that can be managed by a version control system like Git or SVN

]

.pull-right40[

<img src="figures//00-rmarkdown-logo.png" width="40%" style="display: block; margin: auto;" />

<https://rmarkdown.rstudio.com/>

<img src="figures//00-three_outputs.png" width="100%" style="display: block; margin: auto;" />

]

???

- file extension
- Jupyter notebooks in Python (combine text, code and results)
- flexibility in terms of type of output document
- flexibility in terms of target audience -> parametrized reports
  - you: lab notebook containing code, conclusions and reasoning behind your analysis
  - decision makers: hide code, focus on charts and conclusions
  - colleagues: focus both on conclusions and code to ensure reproducibility and to facilitate collaboration
- Rmd is a plain text file, so it can be managed by a version control system
- simple syntax

---

name: output
class: center, middle

<img src="figures//00-rmarkdown_sketch.png" width="100%" style="display: block; margin: auto;" />

???

- process: Rmd -> md -> rendered document
- Rmd: md + R code
- knitr: runs the code and combines the code results (charts, models, console output) with the md markup of the Rmd file
- pandoc (general file converter) converts this augmented md file to the desired output format: HTML, PDF, Word etc.
- simplified process
- depending on the output, more packages are needed
  - e.g. blogdown: Hugo required (open-source static site generator)

---

class: middle

<img src="figures//00-rmd_first_example_1.png" width="80%" style="display: block; margin: auto;" />

???

- YAML header: metadata
- code chunks: can be R, but other languages are supported as well, such as Python, JavaScript, SQL, Ruby
- Markdown text
- inline code: allows for dynamic content (e.g. how many records) using simple commands
  - no need for manual copy and paste of the code output into the narrative text

---

class: center, middle, inverse

## Tour: R Markdown

---

## Structure of an `.Rmd` file

.pull-left[

<img src="figures//00-rmd_first_example.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

- **YAML header** (metadata)
  - enclosed by lines with three dashes
  - collection of key-value pairs separated by colons
- narrative text as **Markdown markup**
- **`R` code**
  - as code chunks surrounded by <code>`</code><code>`</code><code>`{r}</code> and <code>`</code><code>`</code><code>`</code>
  - as inline code surrounded by <code>`r </code> and <code>`</code>

]

---

## R Markdown help

.pull-left[

**Markdown Quick Reference**: Help → Markdown Quick Reference

<img src="figures//00-rstudio-markdown-quick-ref-1.png" width="100%" />

]

.pull-right[

**R Markdown Cheat Sheet**: Help → Cheatsheets → R Markdown Cheat Sheet

<img src="figures//00-rmarkdown-cheatsheet-cutout.png" width="100%" />

]

---

## Environments

.content-box-yellow[
⚠️ Each R Markdown document has its own environment.

→ Consequence: Objects from the global environment are unavailable in the .Rmd document when knitting.
]

--

#### Example:

1\. Run the following code in the console:

```r
x <- 5
x * 3
```

😊 _All good!_

--

2\. Add the following code within a code chunk to your .Rmd file and knit the document:

```r
x * 3
```

😭 _Knitting fails!_ `x` was only defined in the console, not in the document.

---

## How will we use R Markdown?

- Every exercise comes with an R Markdown document which contains some code scaffolding.
- You submit your project report as an R Markdown document.
- The course slides are made with R Markdown.

???

- amount of scaffolding will decrease over the course of the semester

---

class: last-slide, center, bottom

# Thank you! Questions?

.courtesy[📷 Photo courtesy of Stefan Berger]