
Multi-modal deep distance metric learning

Roostaiyan, S. M ; Sharif University of Technology | 2017

  1. Type of Document: Article
  2. DOI: 10.3233/IDA-163196
  3. Publisher: IOS Press, 2017
  4. Abstract:
  5. In many real-world applications, data contain heterogeneous input modalities (e.g., web pages include images and text). Moreover, data such as images are usually described using different views (i.e., different sets of features). Learning a distance metric or similarity measure that draws on all input modalities or views is essential for many tasks, such as content-based retrieval. In these cases, similar and dissimilar pairs of data can be used to find a representation in which the similarity and dissimilarity constraints are better satisfied. In this paper, we incorporate supervision in the form of pairwise similarity and/or dissimilarity constraints into multi-modal deep networks to combine different modalities into a shared latent space. Using properties of multi-modal data, we design multi-modal deep networks and propose a pre-training algorithm for these networks. The proposed network can learn intra- and inter-modal high-order statistics from raw features, and we control its high flexibility via an efficient multi-stage pre-training phase that reflects the properties of multi-modal data. Experimental results show that the proposed method outperforms recent methods on image retrieval tasks. © 2017 - IOS Press and the authors. All rights reserved.
  6. Keywords:
  7. Deep networks ; Multi-modal data ; Pre-training ; Content based retrieval ; Modal analysis ; Websites ; Distance Metric Learning ; Distance metrics ; High order statistics ; Metric learning ; Similar-dissimilar pairs ; Similarity measure ; Image retrieval
  8. Source: Intelligent Data Analysis ; Volume 21, Issue 6, 2017, Pages 1351-1369 ; 1088467X (ISSN)
  9. URL: https://content.iospress.com/articles/intelligent-data-analysis/ida163196
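
The abstract's idea of pairwise similarity and dissimilarity constraints in a shared latent space can be illustrated with a minimal sketch. This is not the authors' network or objective: the record does not give the loss, so a standard contrastive loss is swapped in here as an assumption. The function names (`embed`, `contrastive_pair_loss`), the single-layer fusion, and the `margin` parameter are all hypothetical.

```python
import numpy as np


def embed(image_feats, text_feats, w_img, w_txt):
    """Hypothetical fusion: project each modality linearly and sum the
    projections into one shared latent space (a stand-in for a deep network)."""
    return np.tanh(image_feats @ w_img + text_feats @ w_txt)


def contrastive_pair_loss(z_a, z_b, similar, margin=1.0):
    """Standard contrastive loss over a batch of embedded pairs (assumed here,
    not taken from the paper).

    z_a, z_b : (n, d) arrays of shared-space embeddings for each pair.
    similar  : (n,) array, 1 for a similar pair and 0 for a dissimilar pair.
    """
    similar = np.asarray(similar, dtype=float)
    dist = np.linalg.norm(z_a - z_b, axis=1)
    # Similar pairs are pulled together: penalize squared distance.
    pull = similar * dist ** 2
    # Dissimilar pairs are pushed apart until they are at least `margin` away.
    push = (1.0 - similar) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(pull + push))
```

In this sketch, a similar pair contributes zero loss only when its two embeddings coincide, and a dissimilar pair contributes zero loss once its embeddings are at least `margin` apart, which is one common way to express the constraints described in the abstract.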