[Daniel Geng] and others have an interesting system for generating multi-view optical illusions, or visual anagrams. Such images have more than one “correct” view and visual interpretation. What’s more ...
With the emergence of huge amounts of heterogeneous multi-modal data, including images, video, text/language, audio, and multi-sensor data, deep learning-based methods have shown promising ...