Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
Data Analysis
Use R for data analysis and visualization, train models and estimate errors, and use GitHub for comprehensive documentation and task management.
Posts
unit00
Learning Environment
This course is intended as a blended learning module, although the provided introductions, explanations and examples may also be useful for pure self-study.
Deliverables
Assignments We distinguish between unmarked and marked deliverables (“Studien- und Prüfungsleistung”). Both are required for passing the course but only the...
Frequently Asked Questions
This is a continuously updated collection of frequently asked questions. Course requirements What is the expected workload for this course? This course giv...
unit01
First Things First
Go through a brute-force introduction to R, R Markdown, the RStudio IDE, version management with Git and GitHub’s classroom functionality to get ready for ...
R and RStudio
Start to learn the essentials for working with R and RStudio.
Example: Vector Basics
Vectors are the basis for many data types in R. Creating a vector A vector is created using the c function. Here are some examples: my_vector_1 <- c(1,2...
Example: Data Frame Basics
Data frames are one of the most heavily used data structures in R. Creation of a data frame A data frame is created from scratch by supplying vectors to the...
Example: R Markdown with HTML output
This page shows what a compiled R Markdown file looks like (in fact, all code examples in this course were compiled with R Markdown). This is a header This ...
Git and GitHub
Learn the essentials for working with Git and GitHub.
Assignments and GitHub
A note on individual learning log assignments with GitHub Within this course, you will individually submit your personal solutions for the course assignments...
Marked Assignment: Hello R, Hello GitHub
This worksheet introduces you to R, R scripts and R Markdown. Your submission will be pushed to your class repository on GitHub. After completion you should ...
unit02
First Things Second
Take a closer look at data types and object types before focusing on the most important features of programming languages, namely operators and control structures.
Data Types
Learn how data is measured and organized from an R perspective.
Object Types
Learn how data types are structured within different object types in R.
Indexing
Learn how to find, address, and change elements in R objects.
Operations
Learn how operators and control structures work in R.
Unmarked Assignment: Loop and Conquer
This worksheet provides some control structure and loop examples to help you get familiar with these, probably the most important, properties of any programmin...
unit03
Look at Your Data
Become familiar with reading and writing data, computing summary statistics and visual data exploration as the basics of data analysis.
Tabulated Data I/O
Reading tabulated data into a data frame, or writing it out from one, is quite a common task in data analysis. You could use the read.table function for this. df <-...
Visualization
Do not wait until the very final analysis stage to produce some publication-quality graphics; instead, produce fast (not necessarily nice) visualizations all the w...
Example: CSV I/O
Reading data from text files Reading text files is done using the read.table function from R’s utils package. The function will return a data frame whic...
Example: Aggregation Statistics
Summarizing a data set The most straightforward function which returns some aggregated statistical information about a data set is summary. a <- c("A",...
Example: Visual Data Exploration
Visual data exploration should be one of the first steps in data analysis. In fact, it should start right after reading a data set. The following examples ar...
Marked Assignment: Read and Plot
This worksheet will guide you in getting a first overview of the wood harvest in Hessen between 1997 and 2014 using visual data exploration. After completi...
unit04
Introduction
Check the integrity of datasets and clean them up to ensure that the data basis for your analysis is consistent.
Cleaning 101
Cleaning datasets is a standard procedure in data analysis, and the most annoying one. It can be quite time-consuming but it is the most important ste...
Example: Missing Values
Handling missing values is straightforward. Let’s start with a vector with one NA value at position 3. Please note that NA is not inside quotation marks sin...
Example: Date/Time
Coercing data types to date and/or time information is generally performed using as.Date, as.POSIXct or as.POSIXlt. Let’s start with as.Date: as.Da...
Example: Sorting
For a quick introduction to sorting and combining data in R, check out our own material in the accompanying Base R course.
Example: Cleaning Columns
Cleaning data frames involves quite different aspects like splitting cell entries, converting data types or the conversion of “wide” to “long” format. In ge...
Unmarked Assignment: Cleaning Crops
This assignment is the first in a series that uses regional statistical data. While the wood harvest data from Hessen was (i) quite small and (ii) quite tidy...
unit05
Describe your linear data
Compute simple statistical linear regression models that relate a dependent to an independent variable.
Basic idea of statistical modeling
Use observation samples to describe the relationship between a dependent variable and one or more independent variables. ...
Example: Simple Bivariate Linear Regression
Linear regression modelling is one of the more common tasks in data analysis and the following example will cover the very basic topic of bivariate linear re...
Marked Assignment: Recreation vs. Settlement
This worksheet tackles the question of how the percentage share of settlement area is related to the share of recreation area in each community. After complet...
unit06
Predict your linear data
Compute simple linear models to predict dependent data and assess the performance with independent test samples.
Cross validation
Test statistics can describe the quality or accuracy of regression models if the assumptions of the models are met. However, the assessment would still be b...
Unmarked Assignment: Recreation vs. Settlement revisited
This worksheet revisits the settlement vs. recreation model and examines to what degree the results describing the performance of the model differ between t...
unit07
Select your variables
Evaluate the importance of your independent variables and select an optimal subset for your prediction model.
Feature selection in multiple variable models
So far, the models have only considered one explanatory (i.e. independent) variable. If a dependent variable is to be explained or predicted by more than o...
Marked Assignment: Wheat vs. everything else
This worksheet uses the crop dataset cleaned previously to extend the prediction of winter wheat to multiple variables using a forward feature selection appr...
unit08
Tune your model
Evaluate model tuning strategies and find optimal settings for your prediction model.
Generalized additive models
So far, the models we have seen only considered linear relationships. The corresponding model type to simple linear models would be an additive model and fo...
Unmarked Assignment: Model Tuning
This worksheet uses cross-validation strategies for tuning an additive model. After completing this worksheet you should have improved your skills for handl...
unit09
Predict Your Temporal Data
Look into some specific characteristics of time series data and predict future observations based on past dynamics.
Time Series
Although we have already encountered some temporal datasets, we have not yet taken a closer formal look at time series analysis. Time series datasets often exhibit...
Predicting time series
Time-series analyses can generally be divided into forecasting future dynamics and describing and potentially explaining past patterns. Since the latter oft...
Unmarked Assignment: Precipitation Forecast
This worksheet introduces you to ARIMA modeling using a precipitation time series recorded at a station near Marburg. After completing this worksheet you sh...
unit10
Analyse Your Temporal Data
Analyse your time series data and decompose it into seasonal characteristics and long-term trends.
Time Series Decomposition
After looking into time-series forecasting, we will now switch to some basics of describing time series. To illustrate this, we will again use the (mean mon...
Time series clustering
Just as one last example of time series analysis for this module, and mainly to demonstrate that this module only touched on a very small set of analysis conce...
Unmarked Assignment: NAO and Cölbe
This worksheet focuses on the analysis of meteorological time series data recorded at a station near Marburg University Forest and some global teleconnection...
unit11
Marburg Open Hackathon
Follow the link to start the Marburg Open Hackathon (MOHA).
unit12
Graphics
Visualize your data, get some hints for publication quality graphics, and learn about some packages specifically made for visualizations.
Example: Colours
Before we expand our plotting capabilities, we want to spend a bit more time thinking about colours and colour spaces. A careful study of colour spaces (e....
Example: Colours and maps
This is a short example of how to use the hcl colour palette for colouring features of a shapefile. Set up # Load the required packages library("terra") # ...
Example: The R Graph Gallery
Finally, check out the R Graph Gallery to get an impression of the many other data visualization possibilities in R.