By Indranil Singha

Machine learning is reshaping how we process and analyze data, with applications in every domain, from healthcare to entertainment. So welcome aboard to everyone who is here to read about the struggles, achievements, and lessons from the projects I have been working on over time. In this blog, I’m going to take you through one of my most exciting projects: a machine learning model for analyzing and predicting different aspects of video data. The project was assigned by the DSG club, IITR.

Project Overview: What’s the Goal? 🔭

The primary goal of this project was to build a machine learning pipeline that predicts five distinct attributes, represented as columns in a CSV file, by analyzing features extracted from video frames. The dataset was huge, comprising 10,000 unique videos. Each video was broken down into 20 individual frames, each with a resolution of 64x64 pixels. This presented a considerable computational challenge, since the data extracted from each video came to around 2.5 lakh (roughly 250,000) elements, but it also offered a wealth of data to analyze.
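To see where a figure of about 2.5 lakh elements can come from, here is a quick back-of-the-envelope calculation. It assumes each frame is stored as RGB (3 channels), which is my assumption and not something stated above; the variable names are just for illustration.

```python
# Rough size of the raw data, per frame, per video, and for the whole dataset.
# Assumption: frames are RGB (3 channels); only 64x64 and 20 frames are given.
NUM_VIDEOS = 10_000
FRAMES_PER_VIDEO = 20
HEIGHT, WIDTH = 64, 64
CHANNELS = 3  # assumed

values_per_frame = HEIGHT * WIDTH * CHANNELS
values_per_video = FRAMES_PER_VIDEO * values_per_frame
values_total = NUM_VIDEOS * values_per_video

print(f"per frame:     {values_per_frame:,}")
print(f"per video:     {values_per_video:,}")
print(f"whole dataset: {values_total:,}")
```

Under that assumption, one video works out to 245,760 values, which lines up with the roughly 2.5 lakh figure.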

Basically, I am on a mission to identify the characteristics of the Titans who invaded Paradis Island. To fight those cruel creatures, one must know their strengths and weaknesses. So, are we ready to start the project?

So let me walk you through the different columns of the data and what each one says about the Titans…

Element: The type of Titan present in the video.
Motion: The type of movement observed, which requires the model to classify motion patterns.
Power: An encoded attribute representing a prominent feature of the Titan’s appearance. It is a more complex attribute to predict, since our data has been partially corrupted.
Speed: The speed of the Titan, another key factor to analyze, which requires high precision in detecting variations.
Summary: A combination of multiple characteristics, or a 2D representation of the whole video, encoded in some ancient cryptic way.

project data

Training csv file

The first step in the process was to extract meaningful features from these raw video frames. Since the raw frame data was too large for my machine to process directly, I used a deep learning model, ResNet50, to handle feature extraction.

<aside> 💡

ResNet50, a convolutional neural network widely known for its robustness and efficiency, allowed me to preprocess and analyze the frames effectively. By removing the model's final classification layer, I transformed it into a feature extractor, enabling it to capture essential characteristics of the video frames without classifying them directly.

</aside>

To further enhance the extracted features, I applied preprocessing techniques like Gaussian blur and grayscale transformation. The Gaussian blur smoothed each image, reducing noise so that real edges stand out more clearly. The grayscale transformation simplified the data by keeping only intensity values, which cuts the number of values per frame and makes patterns easier for the model to detect. These steps ensured that the extracted features were both meaningful and computationally efficient.
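Here is a rough sketch of what these two preprocessing steps do, written with plain NumPy so the mechanics are visible. The kernel size, sigma, the order of the two steps, and the function names are all my assumptions for illustration; a library routine such as OpenCV's would normally be used instead of a hand-rolled blur.

```python
import numpy as np


def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB frame to (H, W) using the common luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return frame.astype(np.float64) @ weights


def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """1D Gaussian kernel normalised to sum to 1 (size should be odd)."""
    offsets = np.arange(size) - (size - 1) / 2
    kernel = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur on a 2D image, padding by repeating edge pixels."""
    kernel = gaussian_kernel(size, sigma)
    padded = np.pad(image, size // 2, mode="edge")
    # Blur along rows first, then along columns; output keeps the input shape.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


# Random stand-in for one 64x64 RGB frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

gray = to_grayscale(frame)
blurred = gaussian_blur(gray)

print(gray.shape, blurred.shape)
print(f"std before blur: {gray.std():.2f}, after blur: {blurred.std():.2f}")
```

The drop in standard deviation after blurring is the noise reduction in action, and going from three channels to one cuts the number of values per frame to a third.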

glimpse of train dataset
