Multi-Duration Saliency

We introduce the concept of multi-duration saliency, which captures multiple attention snapshots corresponding to different viewing durations. We leverage an efficient crowdsourcing methodology to collect CodeCharts1K, a dataset of 1000 images with saliency heatmaps at three viewing durations. Our multi-duration saliency model, MD-SEM, takes an image as input and predicts three distinct saliency maps, one per viewing duration. Our lightweight model outperforms baseline models trained to predict multiple durations. Finally, we show that the predicted maps can be used as input to applications such as cropping, rendering, and captioning.


How much time do you have? Modeling multi-duration saliency

Camilo Fosco*, Anelise Newman*, Pat Sukhum, Yun Bin Zhang, Nanxuan Zhao, Aude Oliva, and Zoya Bylinskii

Abstract. What jumps out in a single glance of an image is different than what you might notice after closer inspection. Yet conventional models of visual saliency produce predictions at an arbitrary, fixed viewing duration, offering a limited view of the rich interactions between image content and gaze location. In this paper we propose to capture gaze as a series of snapshots, by generating population-level saliency heatmaps for multiple viewing durations. We collect the CodeCharts1K dataset, which contains multiple distinct heatmaps per image corresponding to 0.5, 3, and 5 seconds of free-viewing. We develop an LSTM-based model of saliency that simultaneously trains on data from multiple viewing durations. Our Multi-Duration Saliency Excited Model (MD-SEM) achieves competitive performance on the LSUN 2017 Challenge with 57% fewer parameters than comparable architectures. It is the first model that produces heatmaps at multiple viewing durations, enabling applications where multi-duration saliency can be used to prioritize visual content to keep, transmit, and render.

Example multi-duration predictions from our Multi-Duration Saliency Excited Model (MD-SEM).


CodeCharts1K is the first multi-duration saliency dataset. It contains 1000 images from a variety of datasets, with saliency heatmaps corresponding to 0.5, 3, and 5 seconds of viewing. We used the CodeCharts interface to crowdsource our multi-duration data. The CodeCharts1K dataset and the CodeCharts interface are made available.

Example multi-duration saliency heatmaps from the CodeCharts1K dataset, featuring images taken from a variety of datasets.
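To make the idea of one heatmap per viewing duration concrete, here is a minimal sketch of how gaze locations could be turned into normalized heatmaps for 0.5, 3, and 5 seconds. It assumes each duration's data is available as a list of (x, y) pixel locations. The point format, the Gaussian width `sigma`, and all function names are illustrative assumptions, not the released CodeCharts1K format or processing pipeline. Plain Python lists keep the example dependency-free.

```python
import math

# Viewing durations (in seconds) stated for CodeCharts1K.
DURATIONS = (0.5, 3.0, 5.0)


def heatmap_from_points(points, height, width, sigma=2.0):
    """Sum one isotropic Gaussian per gaze point, then normalize to sum to 1.

    `points` is a list of (x, y) pixel locations (an assumed format).
    An empty list yields an all-zero map.
    """
    heat = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - px) ** 2 + (y - py) ** 2
                heat[y][x] += math.exp(-dist_sq / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in heat)
    if total > 0:
        heat = [[value / total for value in row] for row in heat]
    return heat


def multi_duration_heatmaps(points_by_duration, height, width, sigma=2.0):
    """Build one heatmap per viewing duration, keyed by duration in seconds."""
    return {
        duration: heatmap_from_points(
            points_by_duration.get(duration, []), height, width, sigma
        )
        for duration in DURATIONS
    }


# Hypothetical gaze points, invented purely for illustration.
example_points = {
    0.5: [(4, 4), (5, 4)],
    3.0: [(4, 4), (12, 10)],
    5.0: [(12, 10), (13, 11), (2, 12)],
}
maps = multi_duration_heatmaps(example_points, height=16, width=16)
```

Each entry of `maps` is a 16 x 16 grid that sums to 1, so maps for different durations can be compared directly even when they were built from different numbers of gaze points.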


We provide code for training and evaluating our model. Our pretrained model weights are available for download.

Architecture for our Multi-Duration Saliency Excited Model (MD-SEM).
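As a rough illustration of what evaluating a multi-duration model involves, the sketch below scores a predicted heatmap against a ground-truth heatmap separately for each viewing duration. It uses two metrics that are common in the saliency literature, Pearson's correlation coefficient (CC) and KL divergence. Whether the released evaluation code uses these exact metrics or definitions is an assumption, and the function names are hypothetical.

```python
import math


def _flatten(heat):
    """Flatten a 2D heatmap (list of rows) into a single list of values."""
    return [value for row in heat for value in row]


def pearson_cc(pred, target):
    """Pearson correlation between two same-sized heatmaps (1.0 is best)."""
    p, t = _flatten(pred), _flatten(target)
    if len(p) != len(t):
        raise ValueError("heatmaps must have the same size")
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_t = sum((b - mean_t) ** 2 for b in t)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom > 0 else 0.0


def kl_divergence(pred, target, eps=1e-12):
    """KL(target || pred) after normalizing both maps to sum to 1 (0.0 is best)."""
    p, t = _flatten(pred), _flatten(target)
    sum_p, sum_t = sum(p), sum(t)
    if sum_p <= 0 or sum_t <= 0:
        raise ValueError("heatmaps must have positive total mass")
    return sum(
        (b / sum_t) * math.log(eps + (b / sum_t) / (a / sum_p + eps))
        for a, b in zip(p, t)
    )


def evaluate_per_duration(preds, targets):
    """Score each duration independently; both inputs map duration -> heatmap."""
    return {
        duration: {
            "cc": pearson_cc(preds[duration], targets[duration]),
            "kl": kl_divergence(preds[duration], targets[duration]),
        }
        for duration in targets
    }
```

Reporting the scores per duration, rather than averaging them, shows whether a model is accurate only at a single viewing duration or across all three.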


Multi-duration saliency can add extra temporal context to saliency-based applications like cropping, compression/rendering, and captioning. We publish code to demo these applications based on our multi-duration predictions.

Automatic crops based on multi-duration saliency predictions produced by our model.
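One simple way to derive a crop from a saliency map is to pick the fixed-size window that contains the most saliency mass, and to repeat this for each viewing duration. The sketch below does this with a summed-area table. It is an illustrative baseline under that assumption, not necessarily the cropping method used in our demo code, and the function names are hypothetical.

```python
def integral_image(heat):
    """Summed-area table with an extra zero row and column at the top-left."""
    height, width = len(heat), len(heat[0])
    sat = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += heat[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def best_crop(heat, crop_h, crop_w):
    """Return (top, left, mass) of the crop_h x crop_w window with the most saliency.

    Ties are resolved in favor of the first window in row-major order.
    """
    height, width = len(heat), len(heat[0])
    if not (0 < crop_h <= height and 0 < crop_w <= width):
        raise ValueError("crop size must fit inside the heatmap")
    sat = integral_image(heat)
    best = (0, 0, float("-inf"))
    for top in range(height - crop_h + 1):
        for left in range(width - crop_w + 1):
            bottom, right = top + crop_h, left + crop_w
            # Window sum in O(1) from four corners of the summed-area table.
            mass = (sat[bottom][right] - sat[top][right]
                    - sat[bottom][left] + sat[top][left])
            if mass > best[2]:
                best = (top, left, mass)
    return best


def crops_per_duration(maps_by_duration, crop_h, crop_w):
    """Compute one crop per viewing duration from a duration -> heatmap mapping."""
    return {
        duration: best_crop(heat, crop_h, crop_w)
        for duration, heat in maps_by_duration.items()
    }
```

Because the salient regions shift with viewing time, the crops selected for 0.5, 3, and 5 seconds can differ, which is the extra temporal context a single-duration map cannot provide.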

