Types of Data

“You can have data without information, but you cannot have information without data” — Daniel Keys Moran

Almost all programming languages explicitly include the notion of data type, though different languages may use different terminology. This means that, when programming, variables generally have a specific type, and you should choose the appropriate type according to what you want to use the variable for.

Common data types include:

  • integer
  • double or floating-point number
  • character string
  • logical or boolean

In contrast to programming languages such as C and Java, in R you do not have to declare the data type of a newly created object. Instead, when you assign a value to a variable, R determines the data type of that variable from the assigned value. This means that the data type of an object is determined dynamically at runtime. If necessary, you can convert an object to a different type with one of the “as.” functions, such as as.integer(), as.numeric() or as.character().
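Python, which also appears later in this section, types variables dynamically in the same way. The following minimal sketch shows a variable's type following whatever value is assigned to it, and explicit conversion with int(), float() and str(), which play roughly the role of R's “as.” functions. The variable names are purely illustrative.

```python
# The type of a name follows the value currently assigned to it.
x = 42
print(type(x))  # <class 'int'>

x = "42"  # rebinding the same name to a string changes its type
print(type(x))  # <class 'str'>

# Explicit conversion, roughly analogous to R's as.integer(), as.numeric(), as.character()
as_int = int("42")
as_float = float("42")
as_text = str(42)
print(as_int, as_float, as_text)  # 42 42.0 42
```

Conversions can fail: int("3.5"), for example, raises a ValueError, so converted data is worth checking.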

Data types are used within type systems, which offer various ways of defining, implementing and using them. Different type systems ensure varying degrees of type safety.
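As one illustration of type safety, Python is dynamically typed but still checks types at runtime: adding a string to a number raises a TypeError instead of guessing what was meant (JavaScript, by contrast, silently evaluates "3" + 4 to "34"). A minimal sketch; add_mixed is a hypothetical helper written only for this example.

```python
def add_mixed(a, b):
    """Hypothetical helper: try a + b and return None if Python rejects the type combination."""
    try:
        return a + b
    except TypeError:
        return None

print(add_mixed(3, 4))      # 7: int + int
print(add_mixed(3, 4.5))    # 7.5: int and float are compatible numeric types
print(add_mixed("3", "4"))  # 34: str + str means concatenation
print(add_mixed("3", 4))    # None: str + int is rejected with a TypeError
```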

Integer

int is short for “integer”, which means whole numbers. It is used to specify that a variable contains only whole numbers. For example, 3 is an integer but 3.25 is not. For the common 32-bit signed integer, the range goes from -2,147,483,648 to 2,147,483,647.
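The range above is that of a 32-bit signed integer, i.e. -2^31 to 2^31 - 1. Python's own int is arbitrary-precision and never overflows, so the sketch below simply checks whether a value would fit into 32 bits. INT32_MIN, INT32_MAX and fits_int32 are hypothetical names chosen for illustration.

```python
# Bounds of a 32-bit signed (two's complement) integer.
INT32_MIN = -2**31      # -2,147,483,648
INT32_MAX = 2**31 - 1   #  2,147,483,647

def fits_int32(n: int) -> bool:
    """Return True if n can be stored in a 32-bit signed integer."""
    return INT32_MIN <= n <= INT32_MAX

print(fits_int32(1984297))        # True
print(fits_int32(3_000_000_000))  # False: too large for 32 bits

# Python's int itself has no fixed upper limit:
print(2**100)  # 1267650600228229401496703205376
```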

Double or Floating-point number

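float is short for “floating-point number”, which represents a fractional number, e.g. 3.25907. A single-precision (32-bit) float can store magnitudes from roughly 1.4 × 10^-45 up to 3.4 × 10^38, with about 7 significant decimal digits. A double (64-bit, double precision) covers roughly 4.9 × 10^-324 to 1.8 × 10^308 with about 15 to 16 significant digits; this is the type behind R's numeric values and Python's float.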
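The limits of Python's float (a 64-bit double on practically all platforms) can be inspected with sys.float_info. To see what single precision does to a value, the sketch below rounds it through a 32-bit representation with the struct module; to_float32 is a hypothetical helper name.

```python
import struct
import sys

print(sys.float_info.max)  # largest finite double, about 1.8e308
print(sys.float_info.min)  # smallest *normalized* positive double, about 2.2e-308
print(sys.float_info.dig)  # decimal digits that are reliably preserved (15)

def to_float32(x: float) -> float:
    """Round x to single precision by packing it into 4 bytes and unpacking it again."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_float32(3.25907))  # close to, but not exactly, 3.25907
print(to_float32(0.1))      # 0.10000000149011612
```

The 32-bit round trip shows why single precision is usually avoided for data analysis: values pick up error after about the seventh significant digit.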

Note: Watch your decimal separator! The decimal separator is a symbol used to separate the integer part from the fractional part of a number written in decimal form. Different countries officially designate different symbols for use as the separator. The choice of symbol also affects the choice of symbol for the thousands separator used in digit grouping.

There are three common ways to group the number ten thousand with digit group separators:

Notation    Thousands separator
10 000,00   Space, the internationally recommended thousands separator
10.000,00   Period (spoken as point), the thousands separator in many non-English-speaking countries
10,000.00   Comma, the thousands separator used in most English-speaking countries

The first one is not suitable for data entry because of the space character.

With the second and third formats, it is very important to pay attention to the decimal separator with which the original data was saved. In much of continental Europe, the point is used for thousands and the comma for decimals; in English-speaking countries, the comma is used for thousands and the point for decimals. Accordingly, care must be taken when importing data.

More information on this can be found in Chapter 6: In- and Output of Data.
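The importance of knowing the separator convention can be seen in a minimal sketch that converts a formatted number string to a float. parse_number is a hypothetical helper; it removes the thousands separator first and then turns the decimal separator into the point that Python expects.

```python
def parse_number(text: str, decimal: str = ".", thousands: str = ",") -> float:
    """Hypothetical helper: convert a formatted number string to float under a given convention."""
    cleaned = text.strip().replace(thousands, "")  # drop digit-group separators first
    return float(cleaned.replace(decimal, "."))    # then normalize the decimal mark

print(parse_number("10,000.00"))                              # English style -> 10000.0
print(parse_number("10.000,00", decimal=",", thousands="."))  # continental style -> 10000.0
print(parse_number("10 000,00", decimal=",", thousands=" "))  # space-grouped -> 10000.0

# Assuming the wrong convention silently produces a wrong value instead of an error:
print(parse_number("10.000,00"))  # "," removed, "." read as decimal -> 10.0
```

For whole files, libraries usually let you state the convention up front; pandas.read_csv, for example, accepts decimal and thousands parameters.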

Character

Characters basically behave like words. They can be strung together, they can be compared, but they cannot be used for calculations.

Examples of characters include letters, numerical digits, common punctuation marks (such as “.” or “-”), and whitespace. The concept also includes control characters, which do not correspond to visible symbols but rather to instructions to format or process the text. Examples of control characters include carriage return and tab, as well as instructions to printers or other devices that display or otherwise process text.
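Python has no separate character type: a character is simply a string of length 1. The minimal sketch below shows the three behaviours described above (joining, comparing, and not calculating) plus control characters written as escape sequences.

```python
letter = "H"
print(type(letter), len(letter))  # <class 'str'> 1

# Characters can be strung together (concatenated) ...
print("H" + "i")          # Hi
# ... and compared, by their Unicode code points ...
print("A" < "B")          # True
print(ord("A"), chr(66))  # 65 B
# ... but digit characters are not numbers: "+" joins them instead of adding.
print("3" + "4")          # 34

# Control characters such as tab (\t) and newline (\n) are written as escape sequences.
line = "name\tvalue\n"
print(len(line))          # 11: nine visible characters plus two control characters
```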

String

Characters are typically combined into strings. A string is a sequence of characters, e.g. “Hello World.” It is used to represent text rather than numbers, and it can contain letters, spaces, digits and punctuation. For example, the word “hamburger” and the phrase “I ate 3 hamburgers” are both strings.
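Because a string is a sequence of characters, it can be measured, indexed and searched. A minimal Python sketch using the example phrase from the text; note that the digit inside the string has to be converted before it can be used in a calculation.

```python
phrase = "I ate 3 hamburgers"

# A string is a sequence of characters, so it can be measured, indexed and sliced.
print(len(phrase))             # 18
print(phrase[0])               # I
print(phrase[6:])              # 3 hamburgers
print("hamburger" in phrase)   # True: substring test

# The "3" inside the string is just a character; convert it to compute with it.
count = int(phrase.split()[2])
print(count + 1)               # 4
```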

Logical or boolean

Logical or boolean values can take only one of two values: true or false. R writes them as TRUE and FALSE, Python as True and False.
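Booleans most often arise as the result of comparisons. The minimal Python sketch below combines them with and/not and uses the fact that Python's bool is a subclass of int (True counts as 1, False as 0) to count matching entries. The variable names and values are illustrative only.

```python
is_bear = True
print(type(is_bear))               # <class 'bool'>

# Comparisons produce booleans, which can be combined with and / or / not.
weight_kg = 450
print(weight_kg > 300 and is_bear) # True
print(not is_bear)                 # False

# Because True == 1 and False == 0, summing a list of flags counts the True values.
flags = [True, False, True, True]
print(sum(flags))                  # 3
```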

When things are lost … None-values

When there are missing or undefined values in your data, they are indicated by special so-called None-values.

None is a special constant in Python that represents the absence of a value (a null object). In a dataset, it is used to mark a value that should be present but is unknown or unavailable.
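A minimal sketch of handling None in a list of measurements (the numbers are made up for illustration): None is tested with "is", because it is a single shared object, and it cannot take part in arithmetic, so missing entries have to be counted or filtered out explicitly.

```python
measurements = [12.5, None, 9.8, None, 11.1]

# Arithmetic on None fails, so summing the raw list raises a TypeError.
try:
    sum(measurements)
    sum_failed = False
except TypeError:
    sum_failed = True
print(sum_failed)  # True

# Test for None with "is" / "is not".
missing = sum(1 for m in measurements if m is None)
present = [m for m in measurements if m is not None]
mean = sum(present) / len(present)

print(missing)  # 2
print(present)  # [12.5, 9.8, 11.1]
print(mean)     # mean of the three values that are present
```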

NaN (Not-a-Number) is a special floating-point value that represents the result of an undefined or unrepresentable mathematical operation, such as subtracting infinity from infinity. It applies only to numerical values and is often encountered in numeric calculations where the result is not a valid number. In Python, NaN is available as math.nan and numpy.nan, or as float("nan").
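NaN has one behaviour that regularly causes bugs: it is not equal to anything, not even to itself, so it must be detected with a function such as math.isnan rather than with ==. A minimal sketch using only the standard library:

```python
import math

nan = math.nan
print(nan == nan)          # False: NaN never compares equal
print(math.isnan(nan))     # True: the reliable test

# NaN results from undefined operations and propagates through further arithmetic.
undefined = math.inf - math.inf
print(math.isnan(undefined), math.isnan(undefined + 1))  # True True

# Filtering NaN out of a list of numbers:
values = [1.0, nan, 3.0]
clean = [v for v in values if not math.isnan(v)]
print(clean)               # [1.0, 3.0]
```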

inf and -inf represent positive and negative infinity, respectively. They stand for values too large to be represented as a finite number, for example when a calculation overflows the floating-point range. In R and NumPy, dividing a nonzero number by zero also yields inf or -inf; plain Python, however, raises a ZeroDivisionError for division by zero. In Python, infinity is available as math.inf and numpy.inf, or as float("inf").
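The following minimal standard-library sketch shows where infinity does and does not appear in plain Python: multiplying past the largest double overflows to inf, while division by zero raises an exception instead of returning inf (NumPy arrays and R behave differently here).

```python
import math

print(math.inf > 1e308, -math.inf < -1e308)  # True True

# Overflowing the double range in basic arithmetic yields inf rather than an error.
too_big = 1e308 * 10
print(too_big, math.isinf(too_big))  # inf True

# Plain Python does not return inf for division by zero; it raises an exception.
try:
    1.0 / 0.0
    division_raises = False
except ZeroDivisionError:
    division_raises = True
print(division_raises)  # True
```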

Data type   Examples
integer     1 ; 15 ; 1984297
float       1.15 ; 1007.28 ; 0.0001
character   © ; H ; π ; A ; B ; C
string      “Hello World” ; “Ursus maritimus” ; “black”
boolean     true ; false
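To connect the table to a concrete language, the minimal sketch below asks Python for the type of one example from each row. Because Python has no separate character type, the character and string rows both map to str, and booleans are written True and False. The dictionary name examples is illustrative only.

```python
examples = {
    "integer": 1984297,
    "float": 1007.28,
    "character": "π",
    "string": "Ursus maritimus",
    "boolean": True,
}

# Print each example next to the name of the Python type it belongs to.
for label, value in examples.items():
    print(f"{label:<10} {value!r:<18} {type(value).__name__}")
```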

Let’s move on…
