ML · AesVoy

VGGNet Overview

2 September 2025·1820 words·9 mins

VGGNet is a famous deep learning model used in computer vision—essentially, teaching computers to understand images. It was created by researchers at the Visual Geometry Group (VGG) at the University of Oxford. Since its debut in 2014, VGGNet has become one of the key models that helped advance how machines see and recognize objects in photos. At its core, VGGNet is designed to look at images and decide what is in them.

Gradient-Based Learning Applied to Document Recognition

30 August 2025·860 words·5 mins

LeNet-5 is an early and very influential type of convolutional neural network (CNN) developed by Yann LeCun and his colleagues in 1998, designed mainly to recognize handwritten digits like those in the MNIST dataset. What makes LeNet-5 special is how it combines several clever ideas that allow it to efficiently and accurately understand images despite their complexity—ideas that were crucial stepping stones for today’s deep learning revolution.

Pioneers of Machine Learning and Artificial Intelligence

12 February 2025·591 words·3 mins

The journey of pioneers in Machine Learning (ML) and Artificial Intelligence (AI) is a remarkable tale of innovation, collaboration, and the relentless pursuit of knowledge.

Gran Turismo's Sophy AI

20 September 2024·859 words·5 mins

Gran Turismo Sophy is an advanced AI racing agent developed through a collaboration between Sony AI, Polyphony Digital, and Sony Interactive Entertainment. This groundbreaking technology utilizes deep reinforcement learning to master the complexities of competitive racing in the Gran Turismo Sport simulator. Initially starting as an AI that struggled to navigate tracks, Sophy has evolved into a formidable competitor capable of challenging top human drivers by mastering racing tactics, etiquette, and vehicle control.

DeepFake Detection Methods

9 August 2024·1162 words·6 mins

In this blog post, we explore the topic of image generators and their detection techniques. I’ll discuss various methods for detecting image generators and their manipulations. These include analyzing the visual content of an image, examining its metadata, and using machine learning algorithms to identify patterns in the data.

From CNNs to Vision Transformers: The Future of Image Recognition

8 August 2024·6015 words·29 mins

Vision Transformers (ViTs) are redefining image recognition by using Transformer models to capture global context, unlike traditional Convolutional Neural Networks (CNNs) that focus on local features. ViTs excel with large datasets and show impressive scalability and performance.

imageNet-Computer Vision Backbone

6 August 2024·1065 words·5 mins

ImageNet is more than just a dataset. The sheer scale of ImageNet, combined with its detailed labeling, made it essentially the backbone of Computer Vision.

Transformers & Attention

5 August 2024·866 words·5 mins

This blog post explains how self-attention and softmax function in Transformer models, crucial for modern NLP. It breaks down how self-attention helps models understand relationships between tokens and how softmax ensures efficient computation and numerical stability.

Diffusion VS Auto-Regressive Models

4 August 2024·1085 words·6 mins

Generative AI has come a long way, producing stunning images from simple text prompts. But how do Diffusion and Auto-Regressive models work, and why are diffusion models preferred.

AlexNet Revolution

26 July 2024·1304 words·7 mins

In 2012, the field of artificial intelligence witnessed a seismic shift. The catalyst for this transformation was a deep learning model known as AlexNet.