Multi-Duration Saliency

Paper

How much time do you have? Modeling multi-duration saliency

Camilo Fosco, Anelise Newman, Pat Sukhum, Yun Bin Zhang, Nanxuan Zhao, Aude Oliva, and Zoya Bylinskii

Abstract. What jumps out in a single glance of an image is different than what you might notice after closer inspection. Yet conventional models of visual saliency produce predictions at an arbitrary, fixed viewing duration, offering a limited view of the rich interactions between image content and gaze location. In this paper we propose to capture gaze as a series of snapshots, by generating population-level saliency heatmaps for multiple viewing durations. We collect the CodeCharts1K dataset, which contains multiple distinct heatmaps per image corresponding to 0.5, 3, and 5 seconds of free-viewing. We develop an LSTM-based model of saliency that simultaneously trains on data from multiple viewing durations. Our Multi-Duration Saliency Excited Model (MD-SEM) achieves competitive performance on the LSUN 2017 Challenge with 57% fewer parameters than comparable architectures. It is the first model that produces heatmaps at multiple viewing durations, enabling applications where multi-duration saliency can be used to prioritize visual content to keep, transmit, and render.

Paper

Talk

Poster

Example multi-duration predictions from our Multi-Duration Saliency Excited Model (MD-SEM).

Dataset

CodeCharts1K is the first multi-duration saliency dataset. It contains 1000 images drawn from a variety of existing datasets, each with saliency heatmaps corresponding to 0.5, 3, and 5 seconds of viewing. We used the CodeCharts interface to crowdsource our multi-duration data. Both the CodeCharts1K dataset and the CodeCharts interface are available.

Dataset Download

CodeCharts Interface

Example multi-duration saliency heatmaps from the CodeCharts1K dataset, featuring images taken from a variety of datasets.

Code/Models

We provide code for training and evaluating our model. Our pretrained model weights are available for download.

GitHub

Pretrained Models

Architecture for our Multi-Duration Saliency Excited Model (MD-SEM).

Applications

Multi-duration saliency can add temporal context to saliency-based applications such as cropping, compression/rendering, and captioning. We publish code that demonstrates these applications using our multi-duration predictions.
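
To illustrate the cropping application, here is a minimal sketch of how one crop per viewing duration might be derived from a set of heatmaps. It is not the published demo code: the function names `saliency_crop` and `multi_duration_crops`, the `keep` threshold, and the bounding-box rule are all assumptions made for this example, and the toy arrays stand in for real model predictions.

```python
import numpy as np

def saliency_crop(heatmap, keep=0.5):
    """Bounding box (top, left, bottom, right) around every pixel whose
    saliency is at least `keep` times the heatmap's maximum.
    Hypothetical helper; the thresholding rule is an assumption."""
    heatmap = np.asarray(heatmap, dtype=float)
    mask = heatmap >= keep * heatmap.max()
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing salient pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing salient pixels
    # bottom/right are exclusive so the box can be used directly as a slice
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1

def multi_duration_crops(heatmaps, keep=0.5):
    """Map each viewing duration (in seconds) to its own crop box."""
    return {duration: saliency_crop(h, keep) for duration, h in heatmaps.items()}

# Toy heatmaps: one region is salient at a glance, a second region
# only draws attention with longer viewing.
short_view = np.zeros((8, 8))
short_view[2:4, 2:4] = 1.0
long_view = short_view.copy()
long_view[5:7, 4:7] = 0.8

crops = multi_duration_crops({0.5: short_view, 5.0: long_view})
for duration, box in sorted(crops.items()):
    print(f"{duration} s -> crop {box}")
```

In this toy case the 0.5-second crop stays tight around the first region, while the 5-second crop grows to include the second one. That is the kind of temporal context a single fixed-duration heatmap cannot provide.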