Converting Types of Data
“Errors using inadequate data are much less than those using no data at all.” – Charles Babbage
In Python, a variable takes on the type of the value assigned to it, and a value can later be converted into another data type. To inspect the data type, use the type() function. To check for a specific data type, use the isinstance() function. Conversion functions such as int(), float(), and str() change the data type.
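As a minimal sketch of the functions just mentioned, the snippet below inspects a value and converts it step by step. All variable names are illustrative and not part of the original text.

```python
# Inspect and convert types; all variable names here are illustrative.
count_text = "42"                       # a string that looks like a number
print(type(count_text))                 # <class 'str'>
print(isinstance(count_text, str))      # True

count = int(count_text)                 # str -> int
ratio = float(count)                    # int -> float
label = str(ratio)                      # float -> str
print(count, ratio, label)              # 42 42.0 42.0

# int() truncates toward zero when given a float.
truncated = int(-3.9)
print(truncated)                        # -3

# Conversion raises ValueError if the text is not a valid integer literal.
failure = None
try:
    int("4.2")
except ValueError as err:
    failure = type(err).__name__
print(failure)                          # ValueError
```

Note that int("4.2") fails even though float("4.2") succeeds; converting such text to an integer takes two steps, int(float("4.2")).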
Python has three built-in numeric classes: int (for integers), float (for floating-point numbers), and complex. The two most common are int and float. Python automatically converts between numeric classes when an operation mixes them, so it generally does not matter whether the number 3 is currently stored as an integer or a float. Arithmetic on integers alone stays in int, but true division (/) and any operation involving a float return a float.
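The automatic conversion between numeric classes can be observed directly. The following is a small sketch with illustrative variable names:

```python
# Mixed arithmetic promotes int to float; names are illustrative.
whole = 3
real = 3.0

print(whole == real)          # True: equal in value despite different types
print(type(whole + 1))        # <class 'int'>: int with int stays int
print(type(whole + real))     # <class 'float'>: the int is promoted to float
print(type(whole / 1))        # <class 'float'>: true division returns a float
print(type(whole // 2))       # <class 'int'>: floor division of ints stays int
print(type(whole + 2j))       # <class 'complex'>: the third numeric class
```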
Sometimes you may want to store a value specifically as an integer, for example when it serves as an ID or an index, since sequence indices must be integers. Note that in CPython a plain int object is not smaller in memory than a float; space savings only come from fixed-width integer types in array libraries such as NumPy. If the values are going to be used in math that converts them to float anyway, it may be best to store them as floats from the beginning.
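One practical reason to keep index values as integers is that Python rejects floats as list indices. A brief sketch, with illustrative names:

```python
# Indexing requires integers; names are illustrative.
samples = ["a", "b", "c"]
position = 2.0                      # a float, e.g. the result of a division

index_error = False
try:
    samples[position]
except TypeError:
    index_error = True              # list indices must be integers or slices
print(index_error)                  # True

print(samples[int(position)])       # c

# float.is_integer() tells whether converting to int would lose information.
print(position.is_integer())        # True
print((2.5).is_integer())           # False
```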
The table below gives an overview of the different data types in Python.
Data Type | Check type | Convert or set |
---|---|---|
integer | isinstance(x, int) | int(x) |
float | isinstance(x, float) | float(x) |
string | isinstance(x, str) | str(x) |
boolean | isinstance(x, bool) | bool(x) |
None | x is None | x = None |
NaN | math.isnan(x) or numpy.isnan(x) | x = numpy.nan |
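A few of the checks in the table behave in ways that are easy to overlook. The sketch below uses only the standard library; float("nan") stands in for numpy.nan, and the variable names are illustrative.

```python
import math

missing = None
not_a_number = float("nan")           # standard-library stand-in for numpy.nan

print(missing is None)                # True
print(math.isnan(not_a_number))       # True
print(not_a_number == not_a_number)   # False: NaN never compares equal to itself

# bool is a subclass of int, so the int check also accepts True and False.
print(isinstance(True, int))          # True
print(isinstance(True, bool))         # True

# bool() of any non-empty string is True, even for the text "False".
print(bool("False"))                  # True
print(bool(""))                       # False
```

Because NaN does not equal itself, a comparison such as x == float("nan") cannot detect it; use math.isnan() or numpy.isnan().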
*Info
The function numpy.isnan() indicates which elements are NaN (missing or undefined). To set elements to NaN, you can use numpy.nan.
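A short sketch of both operations on an array, assuming NumPy is installed; the array contents are made up for illustration:

```python
import numpy as np

readings = np.array([1.5, np.nan, 3.0])   # illustrative data
mask = np.isnan(readings)                 # element-wise NaN test
print(mask.tolist())                      # [False, True, False]

readings[0] = np.nan                      # set an element to NaN
nan_count = int(np.isnan(readings).sum())
print(nan_count)                          # 2
```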
Factors (categorical data) are a built-in feature of R. Python has no built-in equivalent; instead, the pandas library provides the pandas.Categorical type to manage categorical data and maintain the integrity and order of the categories.
Example
value = 23.5
print(isinstance(value, float))
# Output: True
print(isinstance(value, str))
# Output: False
colors = ["blue", "red", "red", "yellow"]
print(colors)
# Output: ['blue', 'red', 'red', 'yellow']
print(isinstance(colors, list))
# Output: True
# Convert to a set to get the unique colors (comparable to the levels of an R factor)
colors_set = set(colors)
print(colors_set)
# Output (element order may vary): {'blue', 'yellow', 'red'}
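A set keeps only the unique values and drops their order and the original sequence. For comparison, here is a minimal sketch of the same data stored as a pandas.Categorical, assuming pandas is installed. The category order passed in is an arbitrary choice made for illustration.

```python
import pandas as pd

colors = ["blue", "red", "red", "yellow"]

# The category order below is an illustrative assumption.
color_cat = pd.Categorical(
    colors, categories=["blue", "red", "yellow"], ordered=True
)

print(color_cat.categories.tolist())   # ['blue', 'red', 'yellow']
print(color_cat.codes.tolist())        # [0, 1, 1, 2]
print(len(color_cat))                  # 4: every original element is kept
```

Each element is stored as an integer code pointing into the list of categories, which is close to how an R factor represents its levels.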