Understand video like humans do
Thus, understanding the context of appearance becomes key in Brand Safety, which explains why automated video understanding is the ultimate tool for media stakeholders. A fake terrorist in a comedy could easily dress exactly like a real one, but would this video content be considered glorification of terrorism? Whether nudity is displayed in a serious documentary or an erotic movie can completely change our perception of the nudity and its severity. As for weapons, footage of a school shooting cannot be put on the same level as footage of military troops equipped with guns marching in a parade. It’s all about context of appearance.
This is where the game becomes tricky. As of today, some companies do a fairly good job of using AI to analyze semantics, images, or sound and get a fair assessment of the context. Video is the next level, as it requires combining many different kinds of learning, from the easiest cases (a plastic gun used by a kid is not a real gun) to the toughest (one could just as well drink whisky from a water glass and water from a wine glass, so how can one tell the difference?).
Let us take a single textbook example of Brand Safety to understand the issues that video and context raise: promotion and advocacy of alcohol use. The display of alcohol bottles and products is rather easy to identify. Most of the time, shapes, logos, and colors help the algorithm detect what is alcohol and what is not. But when people drink from other containers, behavioral understanding is necessary: drinking shots at a party, playing a beer game on a ping-pong table, or simply pouring clear alcohol into a plain glass are all examples where the context must be understood to conclude that alcohol is most likely being displayed. Identifying that someone is drunk without identifying any alcohol at all creates new difficulties, since it has to rely on body movement, facial expressions, voice, and overall context.
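To make the idea of combining weak contextual cues concrete, here is a minimal sketch in Python. It is not the method used by any particular product: the cue names and confidence values are hypothetical, and the noisy-OR rule (which assumes the cues are independent) is only one simple way to merge them.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One hypothetical signal extracted from a video segment."""
    name: str
    confidence: float  # assumed probability, between 0 and 1, that this cue indicates alcohol


def alcohol_likelihood(cues):
    """Combine cue confidences with a noisy-OR rule.

    Assumes the cues are independent: the result is the probability
    that at least one of them truly indicates alcohol.
    """
    p_none = 1.0
    for cue in cues:
        p_none *= 1.0 - cue.confidence
    return 1.0 - p_none


# A branded bottle is a strong cue on its own.
bottle_scene = [Cue("branded bottle visible", 0.9)]

# No bottle in sight: only weak behavioral and contextual cues.
party_scene = [
    Cue("clear liquid poured into plain glass", 0.2),
    Cue("party setting", 0.5),
    Cue("people drinking from shot glasses", 0.6),
]

bottle_score = alcohol_likelihood(bottle_scene)
party_score = alcohol_likelihood(party_scene)
```

In this toy setup, none of the party cues would be conclusive alone, yet together they push the score close to that of the visible bottle. A real system would have to learn such weights from data rather than hard-code them.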
In the end, it all comes down to levels of understanding, just as it does for humans. Ultimately, algorithms can help brands make informed assumptions about the context of a video, and thus help them assess whether the video truly needs to be rejected.