By Indranil Singha
Machine learning is reshaping how we process and analyze data, with applications in every domain, from healthcare to entertainment. So welcome aboard to everyone who is here to read about the struggles, achievements, and lessons from the projects I have worked on over time. In this blog, I’m going to take you through one of my most exciting projects: a machine learning model for analyzing and predicting different aspects of video data. The project was assigned by the DSG club, IITR.
The primary goal of this project was to build a machine learning pipeline capable of predicting five distinct attributes, represented as columns in a CSV file, by analyzing features extracted from video frames. The dataset was huge, comprising 10,000 unique videos. Each video was broken down into 20 individual frames, each with a resolution of 64x64 pixels. This presented a considerable computational challenge because the extracted features came to around 2.5 lakh (250,000) elements, but it also offered a wealth of data to analyze.
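To make the scale concrete, here is a quick back-of-the-envelope calculation. The video count, frame count, and resolution come from the numbers above; the three colour channels per frame are my assumption (the figure of roughly 2.5 lakh matches the raw pixel values of one video only if frames are RGB), so treat this as a sketch rather than the exact accounting.

```python
# Numbers taken from the post, except CHANNELS, which is an assumption.
NUM_VIDEOS = 10_000
FRAMES_PER_VIDEO = 20
HEIGHT, WIDTH = 64, 64
CHANNELS = 3  # assumed RGB frames; not stated in the post

values_per_frame = HEIGHT * WIDTH * CHANNELS
values_per_video = FRAMES_PER_VIDEO * values_per_frame
values_in_dataset = NUM_VIDEOS * values_per_video

print(f"values per frame:   {values_per_frame:,}")    # 12,288
print(f"values per video:   {values_per_video:,}")    # 245,760 (about 2.5 lakh)
print(f"values in dataset:  {values_in_dataset:,}")   # 2,457,600,000
# Stored as 4-byte float32, the whole dataset would need roughly this much memory:
print(f"as float32:         {values_in_dataset * 4 / 1e9:.1f} GB")  # 9.8 GB
```

Under these assumptions, holding every raw pixel in memory as floats would take close to 10 GB, which is why some form of compression into smaller feature vectors is attractive.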
Basically, I am on a mission to identify the characteristics of the Titans who invaded Paradis Island. To fight those cruel creatures, one must know their strengths and weaknesses. So, are we ready to start the project?
So let me walk you through the different columns of data and what they say about the Titans…

Training CSV file
The first step in the process was to extract meaningful features from these raw video frames. Since the raw frame data was too large for my machine to work with directly, I employed a deep learning model, ResNet50, to handle feature extraction.
<aside> 💡
ResNet50, a convolutional neural network widely known for its robustness and efficiency, allowed me to preprocess and analyze the frames effectively. By removing the model's final classification layer, I transformed it into a feature extractor, enabling it to capture essential characteristics of the video frames without classifying them directly.
</aside>
To further enhance the extracted features, I applied preprocessing techniques like Gaussian blur and grayscale transformations. The Gaussian blur helped smooth the image, reducing noise and improving edge detection. Grayscale transformation simplified the data by focusing on intensity values, making the model more efficient at detecting patterns. These steps ensured that the extracted features were both meaningful and computationally efficient.
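The two preprocessing steps can be sketched with NumPy alone. This is an illustration under my own assumptions, not the exact code from the project: the kernel size of 5, the sigma of 1.0, the luminance weights, and the function names are all choices I made for the example. In practice a library such as OpenCV (`cv2.cvtColor` and `cv2.GaussianBlur`) does the same job in one call each.

```python
import numpy as np


def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """(H, W, 3) RGB frame -> (H, W) intensity image (BT.601 luminance weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return frame.astype(np.float64) @ weights


def gaussian_kernel_1d(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Normalized 1-D Gaussian kernel; `size` should be odd."""
    offsets = np.arange(size) - (size - 1) / 2
    kernel = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image; output has the same shape."""
    kernel = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    # Replicate border pixels so the output keeps the input resolution.
    padded = np.pad(image, pad, mode="edge")
    # A 2-D Gaussian factorizes into a row pass followed by a column pass.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # fake RGB frame
    gray = to_grayscale(frame)
    smooth = gaussian_blur(gray)
    print(gray.shape, smooth.shape)
    print(f"std before blur: {gray.std():.1f}, after blur: {smooth.std():.1f}")
```

One thing to keep in mind: a grayscale frame has a single channel, while ResNet50 expects three, so the channel would have to be repeated three times (or the first layer adapted) before feeding such frames to the network.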

Glimpse of the training dataset