
Methods innovation

Human communication and culture are multimodal: language, images, video, music and sound combine in digital media in discernible patterns.

However, computational tools are typically designed for a single modality (language processing, image processing, video processing, or audio analysis) rather than for multimodal models of communication. As a result, computational techniques succeed mainly in domains where the range of choices is constrained (e.g. simple exchanges of information, object detection, football match analysis, music genre classification).

Despite recent advances in machine learning and big data technologies, computational methods still lack sophisticated ways of aggregating text, image, sound and video data and extracting the information encoded in these sources at scale in order to identify key patterns.
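
To illustrate the kind of aggregation at stake, the minimal sketch below (in Python) pools the outputs of separate per-modality analyses over a shared timeline and counts which labels co-occur across modalities. It is an illustrative sketch, not DMSI's pipeline: the annotation data, labels and timeline are invented for the example.

```python
# Illustrative sketch only: the annotations and labels here are hypothetical.
# It shows one simple way to aggregate outputs of separate per-modality
# analyses over a shared timeline and surface cross-modal co-occurrences.
from collections import Counter
from itertools import combinations

# Hypothetical per-modality annotations: (start_sec, end_sec, label).
annotations = {
    "language": [(0, 5, "greeting"), (5, 12, "evaluation")],
    "image": [(0, 8, "close-up"), (8, 12, "long-shot")],
    "audio": [(0, 12, "major-key-music")],
}

def labels_at(t: float) -> list[tuple[str, str]]:
    """All (modality, label) pairs active at time t."""
    return [
        (modality, label)
        for modality, spans in annotations.items()
        for start, end, label in spans
        if start <= t < end
    ]

# Sample the timeline once per second and count cross-modal co-occurrences.
co_occurrences: Counter = Counter()
for t in range(12):
    for a, b in combinations(sorted(labels_at(t)), 2):
        if a[0] != b[0]:  # only count pairs from different modalities
            co_occurrences[(a, b)] += 1

for pair, count in co_occurrences.most_common(3):
    print(pair, count)
```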

To address these issues, DMSI aims to develop a new approach to digital media communications using multimodal frameworks which describe how language, images and other resources are organised as tools for human use. In this approach, linguistic, visual and aural resources are viewed as (a) systems with an underlying architecture or organisation, and (b) texts where choices from these systems are used to construct various points of view and social relations.
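
To make these two perspectives concrete, the following minimal sketch models a system as a named set of available choices and a text as a sequence of selections from those systems. Every system name and option in it is invented for illustration; none comes from DMSI's frameworks.

```python
# Minimal sketch, assuming a simplified reading of the framework above:
# a "system" is a named set of choices; a "text" is a sequence of
# selections from those systems. All names below are hypothetical.
from collections import Counter

SYSTEMS = {
    "speech_function": {"statement", "question", "command", "offer"},
    "shot_distance": {"close-up", "medium-shot", "long-shot"},
}

def validate(text: list[tuple[str, str]]) -> None:
    """Check that every selection in a text is a choice its system offers."""
    for system, option in text:
        if option not in SYSTEMS[system]:
            raise ValueError(f"{option!r} is not a choice in {system!r}")

# A "text" instantiates choices from several systems at once; the profile
# of selections is what constructs a point of view or social relation.
text = [
    ("speech_function", "question"),
    ("shot_distance", "close-up"),
    ("speech_function", "command"),
]
validate(text)
print(Counter(option for _, option in text))
```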

Such an approach provides models of language, images, music and other resources which can be integrated with methods from media studies, cultural theory and sociology, yielding a comprehensive framework for computational analysis that takes context into account. This approach to Explainable AI is feasible because contextual information is now available at a scale that has never existed before.

We work closely with researchers in the Digital Innovation Facility (DIF).
