During the global pandemic, most of us are living more of our lives online. This includes children, many of whom are taking remote courses for school. Imagine your 7-year-old loads a web page for a class assignment, but instead of an alphabet tutorial, the site plays an inappropriate video. Did you know that real people, known as “content moderators,” work behind the scenes to prevent this from happening? Due to the nature of their job, these individuals are at far greater risk of developing Post-traumatic Stress Disorder (PTSD) and other difficult-to-treat psychological disorders. How can we add a layer of protection that keeps content moderators from being exposed to so many disturbing, graphic images? Enter Fast Insights: Video-to-Text Summarizer, which converts video to human-readable text.
According to an article titled “The Trauma Floor: The Secret Lives of Facebook Moderators in America,” published by theverge.com, many content moderators suffer from PTSD. Most of the parent companies do offer them psychological counseling resources, but the damage cannot be completely reversed. According to a recent Financial Times article, third-party content moderators for Facebook are required to sign a form explicitly acknowledging that their job could cause PTSD. Furthermore, content moderators are asked to sign nondisclosure agreements, which restrict them from sharing their trauma even with their loved ones. Since content moderators cannot fully recover from PTSD after the fact, we propose preventive measures that warn them about graphic content in advance, reducing the impact and damage as much as possible.
Our solution, Fast Insights, is a video-to-text summarizer engine that compiles a lengthy video into a short, manageable text summary by leveraging deep learning, natural language processing, computer vision, and distributed infrastructure. The engine shortens the time spent analyzing videos by converting the video content into human-readable text and generating a ready-to-use analysis for a variety of use cases, such as flagging spam, adult, or other kinds of violative content. It also provides the timestamps between which such content appears in the video. Content moderators can quickly view just the highlighted portions of the video and act accordingly. This way, they will be better prepared for what to expect, which should considerably reduce their anxiety and trauma.
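To make the idea concrete, here is a minimal sketch of what the engine's moderation output could look like: flagged segments with timestamps, a category, and a one-line text summary, rendered as a report a moderator can scan before opening the video. All names here (`FlaggedSegment`, `moderation_report`, the category labels) are hypothetical illustrations, not the actual Fast Insights API.

```python
from dataclasses import dataclass

@dataclass
class FlaggedSegment:
    """One span of a video flagged by the summarizer (hypothetical structure)."""
    start_sec: float   # where the flagged content begins
    end_sec: float     # where it ends
    category: str      # e.g. "violence", "adult", "spam"
    confidence: float  # model confidence in [0, 1]
    summary: str       # human-readable description of the segment

def moderation_report(segments):
    """Render flagged segments as a text report, ordered by start time."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_sec):
        lines.append(
            f"[{seg.start_sec:7.1f}s - {seg.end_sec:7.1f}s] "
            f"{seg.category} ({seg.confidence:.0%}): {seg.summary}"
        )
    return "\n".join(lines)

report = moderation_report([
    FlaggedSegment(95.0, 110.5, "violence", 0.92,
                   "physical altercation between two people"),
    FlaggedSegment(12.0, 18.0, "spam", 0.81,
                   "overlaid text advertising a scam site"),
])
print(report)
```

A report like this lets a moderator jump straight to the flagged timestamps instead of watching the full video, which is the core of the harm-reduction argument above.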
While Fast Insights will certainly help content moderators, there are many other applications we can think of. Distance learning can be made easier by converting video lessons to notes, so students can focus on understanding the concepts instead of taking notes. Surveillance teams can read a text summary of real-time camera footage to monitor for suspicious activity, rather than reviewing the entire video. Censor boards can rate movies based on text summaries to detect questionable content before it is published to websites or forums. Forest rangers can monitor poaching and illegal wildlife trade by reading text summaries of CCTV footage. And the list goes on.
When our team began brainstorming topics for the Nittany AI Challenge, we knew our idea needed to be impactful, with applications not limited to a single domain. Video-to-text summarization is not yet common; most available tools are prototypes still in their infancy. We are committed to developing a complete product for a whole new market of video content management, and we hope to make a positive impact on our society.
This project would not have been possible without the support and encouragement of our professors at The Pennsylvania State University. We would like to extend our special gratitude to Dr. Youakim Badr, our mentor, for his guidance and invaluable feedback. We are very excited to work together and drive the project to completion!