Artificial intelligence - the very basics

Let’s assume we’ve built the perfect energy-efficient sensor, which runs reliably in its designated location and continuously provides us with high-quality images or audio data, each capturing our target perfectly. All this would be pointless without a way to automatically classify these recordings and potentially even identify species, as reviewing them manually would be extremely time-consuming and exhausting. We need Artificial Intelligence (AI).

AI refers to the development of computer systems that can perform tasks that typically require human intelligence. These tasks include learning from experience, understanding natural language, recognizing patterns, and making decisions.

Machine Learning (ML) is a subset of AI focused on developing algorithms and models that enable computers to learn from data and make predictions or decisions. ML models identify patterns in data and improve their performance through training. They are essential if we want automated image or sound identification.

This sounds complicated, but the basic principle remains the same as in classical data modeling: given a variable \(X\), we try to estimate a target variable \(Y\). In classical data modeling, we would assume that the relationship between \(Y\) and \(X\) can be approximated with a mathematical formula, such as a logarithmic regression. The coefficients and variance would be estimated from the data. Machine learning, on the other hand, does not assume a specific form. Instead, it aims to find a function that best describes the relationship between \(X\) and \(Y\) without fixing the shape of that relationship in advance.
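To make the contrast concrete, here is a minimal Python sketch. The data are made up, and a straight line (rather than a logarithmic curve) stands in for the "assumed formula" purely to keep the arithmetic short. A k-nearest-neighbour average is used as one simple example of a method that assumes no formula at all. The function names `fit_linear` and `predict_knn` are illustrative, not from any library.

```python
# Toy data: y grows roughly linearly with x (made-up numbers for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_linear(xs, ys):
    """Classical approach: assume y = a + b*x and estimate a, b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_knn(xs, ys, x_new, k=2):
    """Form-free approach: average the y values of the k nearest training points."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

a, b = fit_linear(xs, ys)
linear_prediction = a + b * 3.5
knn_prediction = predict_knn(xs, ys, 3.5, k=2)

print(f"assumed formula: y = {a:.2f} + {b:.2f} * x, prediction at 3.5: {linear_prediction:.2f}")
print(f"no assumed formula (k-NN), prediction at 3.5: {knn_prediction:.2f}")
```

Both approaches estimate \(Y\) from \(X\). The first one can only ever produce a straight line, while the second one follows whatever shape the data have.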

Machine Learning

In machine learning, there are three primary types of learning paradigms: unsupervised learning, supervised learning, and reinforcement learning.

Unsupervised Learning: Uses data without explicit labels. It aims to find similarities within the data, which can be used to understand the underlying structure and distribution of the data.
Supervised Learning: Involves training a model on a labeled dataset, meaning that each training example is paired with an output label. The main tasks of supervised learning are classification (categorizing inputs) and regression (predicting continuous outputs).
Reinforcement Learning (RL): A type of machine learning where an agent (like a robot or a computer program) learns to make decisions by interacting with its environment. The agent tries different actions and learns from the results of those actions to get the best possible outcome. In contrast to supervised learning, which is about predicting the correct output based on provided examples, the goal of reinforcement learning is to learn a policy for making a sequence of decisions that maximizes cumulative rewards over time.
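The difference between the first two paradigms can be sketched in a few lines of Python. Everything here is hypothetical: the species names, the single "body length" feature, and the numbers. The supervised part uses a nearest-centroid rule and the unsupervised part a two-group k-means; both were chosen only because they are short, not because they are what one would use on real images. Reinforcement learning is left out because it needs an environment to interact with.

```python
# Hypothetical 1-D feature (e.g. a body length in mm) for two made-up species.
lengths = [10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
labels = ["moth", "moth", "moth", "beetle", "beetle", "beetle"]

def train_centroids(values, labels):
    """Supervised: use the labels to compute one mean (centroid) per class."""
    sums, counts = {}, {}
    for v, lab in zip(values, labels):
        sums[lab] = sums.get(lab, 0.0) + v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

def classify(value, centroids):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - value))

def two_means(values, iterations=10):
    """Unsupervised: split values into two groups without using any labels."""
    c0, c1 = min(values), max(values)  # simple deterministic initialisation
    for _ in range(iterations):
        group0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        group1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return sorted([c0, c1])

centroids = train_centroids(lengths, labels)
predicted = classify(13.5, centroids)
cluster_centres = two_means(lengths)

print("supervised prediction for 13.5 mm:", predicted)
print("unsupervised cluster centres:", cluster_centres)
```

The clustering finds the same two groups, but it cannot name them. Attaching "moth" or "beetle" to a group requires labels, which is why supervised learning is the natural fit for species identification.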

Because we’re primarily interested in identifying critters, we’ll focus on Supervised Learning.

Think About It

What obstacles can you think of when it comes to labeling data for supervised learning?

Some common obstacles:

- **Time-Consuming:** Manually labeling large datasets can be very time-consuming.
- **Costly:** Hiring experts to label data, especially for specialized tasks, can be expensive, if they can be found at all.
- **Human Error:** Labels can be inconsistent due to human error or subjective judgment.
- **Ambiguity:** Some data points may be difficult to label clearly, leading to ambiguous or incorrect labels.
- **Imbalance:** In some cases, there might be an imbalance in the labeled data (e.g., more labels for common critters, nearly none for rare ones), which can affect model performance.
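Imbalance is the one obstacle in this list that is easy to check with code. The sketch below counts labels in a made-up dataset and derives inverse-frequency class weights, a common heuristic for making rare classes count more during training. The species names and counts are invented for illustration.

```python
from collections import Counter

# Hypothetical labels from a camera-trap dataset (names and counts are made up).
labels = ["fox"] * 80 + ["badger"] * 15 + ["lynx"] * 5

counts = Counter(labels)
total = len(labels)
n_classes = len(counts)

# Inverse-frequency weights: rare classes get proportionally larger weights.
class_weights = {label: total / (n_classes * n) for label, n in counts.items()}

for label, n in counts.most_common():
    print(f"{label}: {n} images ({n / total:.0%}), weight {class_weights[label]:.2f}")
```

Weights like these only soften the problem. A class with five examples remains hard to learn, however it is weighted.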

Neural networks

Neural networks are one of the best-known machine learning methods for finding this function. They are computational models inspired by the human brain, consisting of interconnected nodes (neurons) organized into layers. Each neuron receives input signals, processes them by applying a weighted sum followed by an activation function, and passes the output to the next layer. The network typically includes an input layer, one or more hidden layers, and an output layer. Following our example, \(X\) becomes the input layer and \(Y\) becomes the output layer.
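What a single neuron does can be written in a few lines of Python. This is a minimal sketch: the sigmoid is just one possible activation function, and the inputs, weights, and bias are made-up numbers.

```python
import math

def sigmoid(z):
    """Activation function that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Made-up input signals and weights, chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

output = neuron(inputs, weights, bias)
print(f"neuron output: {output:.3f}")
```

A layer is simply many such neurons looking at the same inputs, each with its own weights.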

Hot Mess “Machine learning” by xkcd.com. This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License.

How do Neural Networks Work?

During training, the network adjusts the weights of its connections. Backpropagation computes the gradient of the loss function, which measures the error between predicted and actual outputs, with respect to each weight. An optimizer such as gradient descent then iteratively updates the weights in the direction that reduces this error. This process enables the neural network to learn complex patterns and make accurate predictions or decisions based on input data.
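The training loop can be sketched for the smallest possible network: one sigmoid neuron with two inputs. With a single neuron, backpropagation reduces to one application of the chain rule, so this shows the gradient-and-update cycle rather than the full algorithm for deep networks. The task (learning the logical OR), the learning rate, and the number of epochs are arbitrary choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labelled data: the logical OR of two binary inputs.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 1.0]

def mean_loss(weights, bias):
    """Cross-entropy between predictions and labels, averaged over the data."""
    total = 0.0
    for (x1, x2), y in zip(X, Y):
        p = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(X)

def train(epochs=2000, learning_rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x1, x2), y in zip(X, Y):
            p = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
            error = p - y  # derivative of the loss w.r.t. the weighted sum
            grad_w[0] += error * x1 / len(X)
            grad_w[1] += error * x2 / len(X)
            grad_b += error / len(X)
        # Step against the gradient to reduce the loss.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

initial_loss = mean_loss([0.0, 0.0], 0.0)
weights, bias = train()
final_loss = mean_loss(weights, bias)
predictions = [round(sigmoid(weights[0] * x1 + weights[1] * x2 + bias)) for x1, x2 in X]

print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
print("rounded predictions:", predictions)
```

In a multi-layer network the same idea applies, except that the error signal is passed backwards through each layer in turn to obtain the gradient for every weight.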

In the input layer, raw data (e.g., pixel values of an image) is fed into the network. This data passes through hidden layers, where it is processed and transformed by neurons. Finally, the transformed signal reaches the output layer, where the final prediction (e.g., the identity of a species) is produced.
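This flow from input to output can be traced in a short Python sketch. The network below has four input values, one hidden layer with three neurons, and two output classes. All weights are made up (a real network would learn them), the class names are hypothetical, and ReLU and softmax are common but assumed choices of activation.

```python
import math

def relu(values):
    """Activation that keeps positive values and sets negative ones to zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Made-up weights; in a real network these would be learned during training.
hidden_weights = [[0.5, -0.2, 0.1, 0.0],
                  [-0.3, 0.8, 0.0, 0.4],
                  [0.2, 0.2, -0.5, 0.1]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[1.0, -1.0, 0.5],
                  [-0.5, 1.5, 0.2]]
output_biases = [0.0, 0.0]
species = ["fox", "badger"]  # hypothetical class names

pixels = [0.9, 0.1, 0.4, 0.7]  # input layer: four pixel intensities

hidden = relu(dense(pixels, hidden_weights, hidden_biases))
probabilities = softmax(dense(hidden, output_weights, output_biases))
prediction = species[probabilities.index(max(probabilities))]

print("hidden layer:", [round(h, 2) for h in hidden])
print("class probabilities:", [round(p, 2) for p in probabilities])
print("prediction:", prediction)
```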

This principle is wonderfully explained here:

If a network has many hidden layers, it is called “deep”, and training such networks is called deep learning. These deep neural networks can automatically learn hierarchical feature representations. Deep learning models can have hundreds or even thousands of layers, making them capable of learning extremely complex patterns and representations.

A class of deep neural networks commonly used for analyzing visual data is the so-called Convolutional Neural Network (CNN). CNNs have convolutional layers, which apply a set of filters (or kernels) to the input image, sliding them across the image to produce feature maps. Each filter detects a specific feature such as edges, textures, or patterns. They also have pooling layers, which reduce the spatial dimensions (height and width) of the feature maps, retaining the most important information while reducing the computational load and helping to control overfitting. After several convolutional and pooling layers, the output is flattened and fed into dense layers. These layers are similar to those in traditional neural networks and consist of neurons connected to all activations in the previous layer.
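The convolution, pooling, and flattening steps can be reproduced by hand on a tiny image. The following Python sketch uses a made-up 5x5 image and a hand-written 2x2 filter that responds to vertical edges. In a real CNN the filter values are learned rather than written by hand, and a library would do this far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to build a feature map.

    As in most deep learning libraries, the kernel is not flipped, so this is
    strictly speaking a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]

# A 5x5 toy image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# A hand-made filter that responds to vertical dark-to-bright edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)            # 4x4 map, high where the edge is
pooled = max_pool(feature_map, size=2)             # 2x2 map, edge still visible
flattened = [v for row in pooled for v in row]     # input for the dense layers

for row in feature_map:
    print(row)
print("pooled:", pooled)
print("flattened:", flattened)
```

The feature map is non-zero only in the column where dark meets bright, and pooling keeps that signal while shrinking the map from 4x4 to 2x2.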

Let’s have a look here:

Let’s have a look in R!

- **Convolutional layers:** These are the core of a CNN. They apply a series of small filters (also called kernels) across the image. These filters are designed to detect local patterns, such as edges, shapes, or textures, without the model having to “see” the whole image at once. The network learns the best filters during training.
- **Pooling layers:** These usually follow convolutional layers. Their role is to reduce the spatial size of the feature maps, which makes the network more efficient and robust to small translations or distortions.
- **Fully connected (dense) layers:** These appear at the end of the network. They interpret the features detected in earlier layers and make the final classification decision.
- **Dropout layers and batch normalization:** These are often added to improve performance and generalization. Dropout randomly deactivates some neurons during training, which helps prevent overfitting. Batch normalization normalizes the output of a layer so that the network trains faster and is more stable.
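Dropout and batch normalization are the least intuitive of these building blocks, so here is a rough Python sketch of what each one computes. It is written in Python rather than R only to keep it dependency-free, and it is a simplification: the dropout shown is the common "inverted" variant, and the batch normalization omits the learnable scale and shift parameters that real implementations include. All numbers are made up.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero some activations during training and rescale the rest."""
    if not training:
        return list(activations)  # at prediction time nothing is dropped
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def batch_norm(batch, eps=1e-5):
    """Normalise one feature across a batch to roughly zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / (var + eps) ** 0.5 for v in batch]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [0.5, 1.5, 2.0, 3.0]

dropped = dropout(activations, rate=0.5, training=True, rng=rng)
unchanged = dropout(activations, rate=0.5, training=False, rng=rng)
normalised = batch_norm([2.0, 4.0, 6.0, 8.0])

print("training mode:", dropped)
print("prediction mode:", unchanged)
print("batch-normalised:", [round(v, 3) for v in normalised])
```

Each surviving activation is divided by the keep probability so that the expected total signal stays the same whether or not dropout is active.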