
This article was automatically translated from the original Turkish version.


Convolutional Neural Networks


Convolutional Neural Networks (CNNs) are a type of deep learning model widely used in fields such as image processing, object recognition, and computer vision. CNNs are preferred over traditional artificial neural networks because they have fewer parameters and can effectively learn local features.


The foundation of convolutional neural networks lies in artificial neural networks, models inspired by the functioning of the human brain. The primary goal of these networks is to learn from and interpret input data, allowing the system to make autonomous decisions based on that interpretation. Artificial neural networks fundamentally consist of three layers:


1. Input Layer

2. Hidden Layer

3. Output Layer


In the input layer, features are fed into the system. The number of nodes in this layer equals the number of attributes that best represent the input. The hidden layer multiplies the values received from the input layer by specific coefficients (weights) and applies transformation operations to them. For example, specific threshold conditions applied at these nodes can yield as many values as there are outputs. In the output layer, which follows the hidden layer, predictions or interpretations are made based on the extracted results.


Historically, the artificial neural network model was first proposed by Warren S. McCulloch and Walter Pitts in 1943. However, due to the limited computational power of that era, the potential of this model was not fully understood. By the 2000s, significant advances in computing power had led to the increased success and popularity of artificial neural networks. Nevertheless, the human brain that inspired the model contains approximately 100 billion neurons; it is currently impossible to model this structure exactly with existing systems. With advancing technology, however, these models continue to evolve.


Key Components of CNN

A convolutional neural network architecture


Convolutional Layer

The convolutional layer is the most important component of a CNN. This layer extracts local features from the input image using filters (kernels). Each filter scans a region of the image, multiplies its values element-wise with the corresponding pixels, and sums the results to generate a feature map.


1. Filter (Kernel): Matrices, typically of size 3x3 or 5x5, used to detect features such as edges, corners, or texture-like patterns.

2. Stride: Determines the step size by which the filter moves across the image. For example, if stride=1, the filter shifts by one pixel at a time.

3. Padding: Zero values are added to the image borders to preserve its dimensions.


Example of convolution operation (generated by artificial intelligence)
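The convolution operation described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a full CNN implementation; the example image and the edge-detection kernel values are made up for demonstration:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Optionally zero-pad the image borders to preserve dimensions
    if padding > 0:
        image = np.pad(image, padding)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the window with the kernel and sum
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)
    return out

image = np.array([[1, 2, 0, 1],
                  [3, 1, 1, 0],
                  [0, 2, 2, 3],
                  [1, 0, 1, 2]], dtype=float)
# A simple vertical edge-detection kernel
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
fmap = conv2d(image, kernel, stride=1)
print(fmap.shape)  # (2, 2): a 3x3 kernel over a 4x4 image with stride 1
```

With padding=1, the output regains the original 4x4 size, which is exactly the purpose of padding described above.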


Pooling Layer

In the architecture of Convolutional Neural Networks (CNNs) a pooling layer is often placed between successive convolutional layers. The purpose of the pooling layer is to reduce the size of the image and thereby decrease the number of parameters when images are very large. This operation reduces the spatial dimensions of each feature map while preserving important information. There are two main types of pooling: max pooling and average pooling.



Example of a pooling layer (Credit: Mayank Jain)



Max pooling is used to reduce the spatial dimensions of the data after a convolutional layer, and is typically applied between two convolutional layers. By keeping only the dominant value in each window while reducing height and width, it provides a degree of rotational and positional invariance, which allows the model to continue training effectively.
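Both pooling variants can be sketched with NumPy; the sketch below implements max pooling, and a one-line change (noted in the comment) turns it into average pooling. The example feature-map values are made up:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    out_h = (fmap.shape[0] - size) // stride + 1
    out_w = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()  # average pooling would use window.mean()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 0, 8]], dtype=float)
pooled = max_pool(fmap)
print(pooled)  # [[6. 4.]
               #  [7. 9.]] -- each 2x2 window reduced to its maximum
```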


Max Pooling (Credit: Afshine Amidi and Shervine Amidi CNN Handbook)


Average Pooling (Credit: Afshine Amidi and Shervine Amidi CNN Handbook)


Fully Connected Layer

The final step in a Convolutional Neural Network occurs in this layer. In a fully connected layer, every node in one layer is connected to every node in the next layer, as in a traditional artificial neural network. This architecture is computationally expensive and prone to overfitting. After a flattening operation is applied to the extracted features, learning is performed using these fully connected layers.


Flattening Layer (Credit: Rubiscode.net)



Working Principle of the Fully Connected Layer (Credit: teknoloji.org)
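Flattening and the fully connected layer can be sketched as follows. The shapes (8 feature maps of 5x5, 10 output classes) are hypothetical choices for illustration, and the weights are random, untrained values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled feature maps: 8 maps of size 5x5
features = rng.standard_normal((8, 5, 5))

# Flattening: unroll all maps into a single vector
x = features.reshape(-1)                  # shape (200,)

# Fully connected layer: every input value connects to every output node,
# so the weight matrix has (outputs x inputs) = (10 x 200) entries
W = rng.standard_normal((10, 200)) * 0.01
b = np.zeros(10)
logits = W @ x + b                        # shape (10,), one score per class
print(logits.shape)  # (10,)
```

The 2,000 weights for this small layer illustrate why fully connected layers dominate the parameter count and why CNNs defer them to the very end of the network.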


CNN Learning Process

The learning process of a CNN enables the model to learn meaningful features from input data and optimize its weights to make more accurate predictions. This process consists of the following steps: forward propagation, loss calculation, backpropagation, and optimization and weight update.


Forward Propagation

Forward propagation involves processing the input data through the layers of the CNN to generate an output prediction. This process includes the following stages:


1. Input Layer: Typically an image is provided as input. For example, a 28x28-pixel handwritten digit or a 224x224x3 color image can be used as input.


2. Convolutional Layers: The input image is processed by filters (kernels) in the convolutional layer. Each filter scans a region of the image, multiplies its values element-wise with the corresponding pixels, and sums the results to generate a feature map. For instance, a 3x3 filter is slid across the image to extract local features.


3. Activation Function: After the convolution operation an activation function (typically ReLU) is applied to the feature map. This step enables the model to learn nonlinear patterns.


4. Pooling Layer: The pooling layer reduces the size of the feature map. Methods such as max pooling select the maximum value within a defined window. This reduces computational cost and makes the model more robust to scale variations.


5. Fully Connected Layer: Features extracted from the convolutional and pooling layers are fed into the fully connected layer. This layer combines the features to perform classification or regression tasks.


6. Output Layer: Finally in the output layer (e.g. using softmax) the model computes its predictions. For example it determines which class a given image belongs to.
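The six stages above can be chained into one minimal forward pass. This is a sketch with random, untrained weights on a random stand-in for a 28x28 grayscale image, so the resulting "prediction" is meaningless; it only shows how the shapes flow through the pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv2d(img, k):
    kh, kw = k.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * k)
                      for j in range(w)] for i in range(h)])

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

image = rng.standard_normal((28, 28))      # 1. stand-in for a 28x28 digit
kernel = rng.standard_normal((3, 3))       # one random, untrained filter

fmap = conv2d(image, kernel)               # 2. convolution -> 26x26 map
fmap = np.maximum(fmap, 0)                 # 3. ReLU activation
pooled = fmap.reshape(13, 2, 13, 2).max(axis=(1, 3))  # 4. 2x2 max pooling -> 13x13
x = pooled.reshape(-1)                     # 5. flatten -> vector of 169 values
W = rng.standard_normal((10, 169)) * 0.01  #    fully connected weights, 10 classes
probs = softmax(W @ x)                     # 6. softmax output: class probabilities
print(probs.shape)  # (10,)
```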


Loss Calculation

The outputs generated by forward propagation are compared with the ground-truth values using a loss function to measure the model's error. Common loss functions include:


1. Cross-Entropy Loss: Used for classification tasks.

2. Mean Squared Error (MSE): Used for regression tasks.
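Both loss functions are short enough to write out directly. The sketch below uses made-up predictions and targets; `cross_entropy` here takes an already-softmaxed probability vector and the index of the true class:

```python
import numpy as np

def cross_entropy(probs, target_class):
    # Negative log-likelihood of the true class:
    # small when the model assigns it high probability, large otherwise
    return -np.log(probs[target_class])

def mse(pred, target):
    # Mean of squared differences, for regression tasks
    return np.mean((pred - target) ** 2)

probs = np.array([0.1, 0.7, 0.2])             # softmax output over 3 classes
print(cross_entropy(probs, 1))                # -ln(0.7), about 0.357
print(mse(np.array([1.0, 2.0]),
          np.array([1.5, 2.5])))              # 0.25
```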


The loss function measures the model's performance and this value is used to optimize the model.


Backpropagation

Backpropagation is an optimization process used to minimize the model's errors. It calculates the gradients of the loss function to update the model's parameters (filters and weights). The steps of backpropagation are as follows:


1. Gradient Calculation: The derivative (gradient) of the loss function with respect to each model parameter is computed. These gradients determine how much and in which direction each parameter should be updated.


2. Chain Rule: Backpropagation is performed using the chain rule. The gradient at each layer is computed from the gradient of the layer that follows it, so the error signal propagates backward through the network, layer by layer.


3. Gradient Update: After computing the gradients for each layer the filters and weights are updated using these gradients.
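The chain rule at the heart of these steps can be shown on the smallest possible network: a single weight. This toy example (values chosen arbitrarily) computes the gradient of a squared-error loss with respect to the weight by multiplying the two derivative factors:

```python
# One linear node: y = w * x, with squared-error loss L = (y - t)^2
# Chain rule: dL/dw = (dL/dy) * (dy/dw) = 2*(y - t) * x
x, t = 3.0, 6.0      # input and target (arbitrary toy values)
w = 1.0              # current weight

y = w * x            # forward pass: y = 3.0
grad_w = 2 * (y - t) * x   # backward pass: 2*(3-6)*3 = -18.0
print(grad_w)  # -18.0
```

The negative gradient tells the update step that increasing w would decrease the loss, which is exactly the information the weight update below consumes.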


Parameter Update

The gradients computed during backpropagation are used to update the model's parameters using optimization algorithms such as Gradient Descent. This step enables the model to minimize its errors.


1. Gradient Descent: Gradient descent updates the parameters in the opposite direction of the gradient to minimize the loss function. The update step is expressed by the following formula:

θ ← θ − η∇L(θ)

Where:

  • θ: Model parameters (filters and weights),
  • η: Learning rate,
  • ∇L(θ): Gradient of the loss function.
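A minimal sketch of this update rule on a one-parameter toy problem, minimizing L(θ) = θ², where the gradient is known in closed form as 2θ (starting point and learning rate chosen arbitrarily):

```python
theta = 5.0   # initial parameter value (arbitrary)
eta = 0.1     # learning rate

for _ in range(50):
    grad = 2 * theta          # gradient of L(theta) = theta^2
    theta = theta - eta * grad  # step opposite the gradient

print(theta)  # very close to 0, the minimum of L
```

Each step multiplies θ by (1 − 2η) = 0.8, so the parameter decays geometrically toward the minimum.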


2. Optimization Algorithms: In addition to gradient descent more advanced optimization algorithms can be used. For example:

a. Momentum: Adds a momentum term to gradient descent to enable faster learning.

b. Adam (Adaptive Moment Estimation): Adapts both momentum and the learning rate dynamically.
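An Adam update for a single scalar parameter can be sketched as below, again on the toy loss L(θ) = θ². The hyperparameter values are the commonly used defaults, and the code follows the standard formulation with bias-corrected moment estimates:

```python
import math

theta = 5.0                 # initial parameter (arbitrary)
m, v = 0.0, 0.0             # first- and second-moment estimates
eta, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 201):     # t starts at 1 for bias correction
    g = 2 * theta                     # gradient of L(theta) = theta^2
    m = b1 * m + (1 - b1) * g         # momentum-like running mean of gradients
    v = b2 * v + (1 - b2) * g * g     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)         # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    theta -= eta * m_hat / (math.sqrt(v_hat) + eps)

print(abs(theta) < 1.0)  # parameter has moved close to the minimum
```

Dividing by the square root of the second moment adapts the effective step size per parameter, which is what "adapts both momentum and the learning rate dynamically" refers to.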



Advantages of CNN

1. Parameter Efficiency: CNNs use fewer parameters compared to fully connected layers. This allows faster training and reduced memory consumption.

2. Learning Local Features: CNNs can effectively learn features from local regions of an image.

3. Scale and Transformation Invariance: Thanks to pooling layers CNNs are more robust to changes in scale and transformation.


Applications

CNNs are widely used in the following areas:


1. Image and video understanding

2. Recommendation/forecast systems

3. Image Classification

4. Image Segmentation

5. Medical Image Processing

6. Natural Language Processing

7. Brain-Computer Interface

8. Time Series


Author Information

Author: Beyza Nur Türkü, December 25, 2025, 7:57 AM


