Identification of Harmful Video Content Using Movie Trailers and Machine Learning
Every piece of content has an appropriate age and maturity level. Imagine a young child coming across a video that shows a violent death or a fatal accident: such material could be deeply distressing and may harm the child's psychological well-being. Content of this kind therefore needs to be kept out of children's reach. A research paper from the Swedish Media Council proposes a new approach to automatically identifying harmful content. The approach considers audio and video content separately and uses human-annotated data as a guiding index for material that may disturb viewers. The paper also argues that machine learning systems need to take the entire context of a scene into account. It illustrates the many ways in which innocuous content, such as humorous or satirical material, could be misinterpreted as harmful by a less sophisticated video analysis approach that does not combine multiple modalities.
Observations from the research
The researchers note that useful progress in this area has been impeded by the copyright protection of motion pictures, which makes it difficult to create generalized open-source datasets. They also observe that similar experiments to date have suffered from a sparsity of labels for full-length movies. As a result, prior work has oversimplified the contributing data or keyed on only one aspect of it, such as dominant colors or dialogue analysis.
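A related difficulty with full-length movies is that harmful moments are rare within them. The following minimal Python sketch illustrates this under stated assumptions. A video is split into fixed-length segments, and each segment is flagged harmful or not by a human annotator. The whole video is then labeled harmful only if the flagged fraction exceeds a threshold. The functions `harmful_fraction` and `video_label`, the segment lengths, and the 10% threshold are all hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: a few harmful moments are diluted when a single
# label is derived from an entire film instead of from a short trailer.

def harmful_fraction(segment_flags: list[bool]) -> float:
    """Fraction of segments an annotator flagged as harmful."""
    if not segment_flags:
        return 0.0
    return sum(segment_flags) / len(segment_flags)


def video_label(segment_flags: list[bool], threshold: float = 0.1) -> str:
    """Label a whole video 'harmful' if the flagged fraction exceeds a
    hypothetical threshold (an assumption, not the paper's rule)."""
    return "harmful" if harmful_fraction(segment_flags) > threshold else "suitable"


# A 120-minute film in one-minute segments, 3 of which are disturbing.
film = [False] * 117 + [True] * 3
# A 2-minute trailer in 10-second segments; the same scenes fill 3 of them.
trailer = [False] * 9 + [True] * 3

print(harmful_fraction(film), video_label(film))        # 0.025 suitable
print(harmful_fraction(trailer), video_label(trailer))  # 0.25 harmful
```

In this example, three disturbing minutes make up only 2.5% of a two-hour film, so the film falls below the hypothetical threshold. The same material occupies a quarter of the trailer's segments.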
The ‘harmful’ content
Under the Swedish film classification system, ‘harmful’ content is defined by its potential to produce anxiety, fear, and other negative effects in children. The researchers note that this rating system involves as much intuition and instinct as science. As a result, the parameters that define ‘harmful content’ are difficult to quantify and encode in an automated system. The paper further observes that earlier machine learning and algorithmic systems addressing this challenge used the detection of specific facets as a criterion. These facets included the visual detection of blood and flames, the sound of bursting, and shot-length frequency, among other narrow definitions of harmful content. The multi-domain approach taken in the current work seems likely to offer a better methodology for automatically rating harmful content.
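As an illustration of the kind of single-facet cue the paper contrasts with its own approach, the following minimal Python sketch computes one such facet: the cutting rate (shots per minute) of a clip. It derives this rate from shot-boundary timestamps and flags the clip when the rate crosses a fixed threshold. The sketch assumes the boundaries come from a separate shot-detection step. The function names and the threshold of 20 shots per minute are hypothetical.

```python
# Hypothetical single-facet cue: cutting rate derived from shot boundaries.
# Shot-boundary timestamps are assumed to come from a separate detector.

def shots_per_minute(boundaries_s: list[float], duration_s: float) -> float:
    """Return the number of shots per minute in a clip.

    boundaries_s: times (seconds) at which a cut occurs, excluding t = 0.
    duration_s: total clip length in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    n_shots = len(boundaries_s) + 1  # k cuts divide the clip into k + 1 shots
    return n_shots / (duration_s / 60.0)


def mean_shot_length(boundaries_s: list[float], duration_s: float) -> float:
    """Average shot length in seconds (the inverse view of the cutting rate)."""
    return duration_s / (len(boundaries_s) + 1)


def flag_by_cutting_rate(boundaries_s: list[float], duration_s: float,
                         threshold_spm: float = 20.0) -> bool:
    """Flag a clip as potentially harmful if it is cut faster than a
    hypothetical threshold. This is a narrow, context-free rule."""
    return shots_per_minute(boundaries_s, duration_s) > threshold_spm


# A 30-second clip with a cut every 2 seconds (14 cuts, 15 shots).
fast_cuts = [float(t) for t in range(2, 30, 2)]
# A 30-second clip with a single cut halfway through (2 shots).
slow_cuts = [15.0]

print(shots_per_minute(fast_cuts, 30.0))      # 30.0
print(mean_shot_length(fast_cuts, 30.0))      # 2.0
print(flag_by_cutting_rate(fast_cuts, 30.0))  # True
print(flag_by_cutting_rate(slow_cuts, 30.0))  # False
```

A rule like this would flag a rapidly edited comedy montage just as readily as a violent action sequence. That is the kind of misclassification a context-aware, multimodal approach aims to avoid.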
The technology behind the research
The Swedish researchers trained an 8×8, 50-layer neural network model on the Kinetics-400 human-action benchmark dataset and designed an architecture that fuses video and audio predictions. Using trailers rather than full-length films solves three problems in creating a dataset of this nature. First, it sidesteps copyright issues. Second, the heightened turbulence and higher shot frequency of trailers allow for a greater frequency of annotation. Third, it ensures that the low incidence of violent or disturbing content across an entire movie does not unbalance the dataset and accidentally class the film as suitable for children.
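The architecture fuses video and audio predictions, but the description above does not say how. The following minimal Python sketch therefore assumes the simplest form of prediction-level (late) fusion: a weighted average of the per-class probabilities produced separately for each modality. The harmfulness levels, the equal default weighting, and the function names are hypothetical. They are not the paper's label set or method.

```python
# Hypothetical late fusion: combine per-class probabilities predicted
# separately from the video and audio streams by a weighted average.

def late_fusion(video_probs: list[float], audio_probs: list[float],
                video_weight: float = 0.5) -> list[float]:
    """Weighted average of two probability vectors over the same classes."""
    if len(video_probs) != len(audio_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must lie in [0, 1]")
    audio_weight = 1.0 - video_weight
    # A convex combination of two distributions is still a distribution.
    return [video_weight * v + audio_weight * a
            for v, a in zip(video_probs, audio_probs)]


def argmax(values: list[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical harmfulness levels; not the paper's label set.
LEVELS = ["low", "moderate", "high"]

# Example: the visuals alone look intense, but the soundtrack is calm.
video = [0.2, 0.3, 0.5]
audio = [0.7, 0.2, 0.1]

fused = late_fusion(video, audio)
print([round(p, 2) for p in fused])  # [0.45, 0.25, 0.3]
print(LEVELS[argmax(video)])         # high
print(LEVELS[argmax(fused)])         # low
```

Fusing at the prediction level keeps the two single-modality models independent, so each can be trained on its own data. The trade-off is that the fused model cannot learn fine-grained interactions between what is seen and what is heard.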