Human action recognition in video clips has recently drawn considerable attention from computer vision researchers, with applications ranging from surveillance to sports analysis. In this study, deep features extracted from keyframes are used to recognize human actions in movie scenes. First, k-means clustering is applied to the frames of each action video to obtain representative frames (keyframes), effectively reducing redundancy and computational cost. Next, a convolutional neural network (CNN), AlexNet, is fine-tuned on the 12 movie action classes. The selected keyframes are then fed into the CNN classifier for the final prediction. Experimental evaluation on the Hollywood2 dataset demonstrates higher accuracy than state-of-the-art feature-extraction-based action recognition methods.
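The keyframe-selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the pure-NumPy k-means, the flattened-pixel feature representation, and the choice of the frame nearest each centroid as the keyframe are all assumptions, since the abstract does not specify these details.

```python
import numpy as np

def select_keyframes(frames, k, n_iter=20, seed=0):
    """Pick k representative frames via k-means on flattened pixels.

    frames: array of shape (n_frames, H, W) or (n_frames, H, W, C).
    Returns sorted indices of the frames closest to each centroid.
    (Illustrative sketch; the paper's exact features and settings
    are not given in the abstract.)
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct randomly chosen frames.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each frame to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Keyframe = the actual frame nearest each final centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(d.argmin(axis=0)))
```

The selected indices would then be used to pull the corresponding frames and pass them to the fine-tuned CNN; using real frames (rather than centroid averages) keeps the keyframes visually meaningful inputs for the classifier.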