Feedback Mechanism and Public Speaking using Audio and Video Analysis

Nagaraj A; Srushti K S; Sushmarani S; Sanghvi V

doi:10.17148/IARJSET.2025.121207

Volume 12, Issue 12, December 2025

Abstract: This project presents a real-time feedback system that helps users improve their public speaking through integrated audio-visual analysis of webcam input. The system interprets key non-verbal cues (posture alignment, gesture consistency, facial orientation, and eye-contact patterns) while simultaneously assessing speech metrics, including filler-word frequency, speaking speed, articulation clarity, and vocal modulation. Immediate, data-driven feedback and structured progress summaries let users steadily refine their communication style and presentation effectiveness. The platform uses Streamlit for its interactive interface, supported by a backend that integrates Convolutional Neural Networks (CNNs) for body-language assessment, Hugging Face NLP models for speech interpretation, and Librosa for audio feature extraction. Trained on a diverse collection of annotated public speaking recordings, the system delivers context-aware insights while upholding strict standards of data privacy and ethical compliance. Extensive evaluations confirm its accuracy, responsiveness, and adaptability. With continuous enhancements guided by real user feedback, this AI-powered solution makes professional-grade public speaking training more accessible, scalable, and personalized for learners across all backgrounds.
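
As a rough illustration of two of the speech metrics named in the abstract (filler-word frequency and speaking speed), the following Python sketch computes both from a transcript and a recording duration. It is not the authors' implementation: the function name `speech_metrics`, the `FILLERS` word list, and the simple regex tokenizer are assumptions made for this example, and it presumes a transcript has already been produced by some speech-to-text step.

```python
import re
from collections import Counter

# Hypothetical filler list; the abstract does not say which words are tracked.
FILLERS = frozenset({"um", "uh", "like", "basically", "actually"})


def speech_metrics(transcript: str, duration_seconds: float,
                   fillers: frozenset = FILLERS) -> dict:
    """Return word count, speaking rate, and filler-word statistics.

    Assumes `transcript` is plain text and `duration_seconds` is the
    length of the corresponding audio clip.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    # Simple tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", transcript.lower())
    total = len(words)

    # Count only single-word fillers; phrases such as "you know" are ignored.
    filler_counts = Counter(w for w in words if w in fillers)
    filler_total = sum(filler_counts.values())

    return {
        "word_count": total,
        "words_per_minute": total * 60 / duration_seconds,
        "filler_count": filler_total,
        "filler_ratio": filler_total / total if total else 0.0,
        "filler_breakdown": dict(filler_counts),
    }


if __name__ == "__main__":
    sample = "Um, so I think, uh, this is like a good idea"
    print(speech_metrics(sample, duration_seconds=6.0))
```

Matching single words keeps the sketch short, but it also counts legitimate uses of words such as "like"; a production system would likely need context-aware detection, which is presumably where the NLP models mentioned in the abstract come in.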

Keywords: Public speaking, real-time feedback, body language, speech analysis, CNN, Hugging Face, Librosa, NLP, audio-visual processing, feature extraction, user interface, Streamlit, Tkinter, machine learning, deep learning, emotion detection, posture, gestures, eye contact, filler words.

How to Cite:

[1] Mr. Nagaraj A, Srushti K S, Sushmarani S, Sanghvi V, “Feedback Mechanism and Public Speaking using Audio and Video Analysis,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), vol. 12, no. 12, Dec. 2025, DOI: 10.17148/IARJSET.2025.121207.

This work is licensed under a Creative Commons Attribution 4.0 International License.