[Journal of Artificial Intelligence Humanities Vol.2] Multimodal Sparse Representation Learning and Applications_Miriam Cha., Youngjune L. Gwon., H.T. Kung

Chung-Ang University
Humanities Research Institute
HK+ Artificial Intelligence Humanities

Print ISSN: 2635-4691 / Online ISSN: 2951-388X


Title	[Journal of Artificial Intelligence Humanities Vol.2] Multimodal Sparse Representation Learning and Applications_Miriam Cha., Youngjune L. Gwon., H.T. Kung
Date	2019-01-17 09:35
Writer	aihadmin
Attachment	Multimodal Sparse Representation Learning and Applications_Miriam Cha.pdf (345.9KB)
Multimodal Sparse Representation Learning and Applications

Miriam Cha (Graduate student, Harvard University)
Youngjune L. Gwon (Graduate student, Harvard University)
H.T. Kung (Professor of Computer Science and Electrical Engineering, Harvard University)

Sparse coding has been applied successfully to single-modality scenarios. We consider a sparse coding framework for multimodal representation learning. Our framework aims to capture the semantic correlation between different data types via joint sparse coding. Such joint optimization induces a unified representation that is sparse and shared across modalities. In particular, we compute joint, cross-modal, and stacked cross-modal sparse codes. We find that these representations are robust to noise and provide greater flexibility in modeling features for multimodal input. A good multimodal framework should be able to fill in a missing modality given the other and improve representational efficiency. We demonstrate the missing-modality case through image denoising and show the effectiveness of cross-modal sparse codes in uncovering the relation between clean and corrupted image pairs. Furthermore, we experiment with multi-layer sparse coding to learn highly nonlinear relationships. The effectiveness of our approach is also demonstrated in multimedia event detection and retrieval on the TRECVID dataset (audio-video), category classification on the Wikipedia dataset (image-text), and sentiment classification on PhotoTweet (image-text).

Keywords: Multimodal learning, multimedia, visual-text, audio-video, sparse coding
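The joint and cross-modal sparse coding summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two modalities with already-learned sub-dictionaries (the hypothetical `D_a` and `D_b`) that share one code vector, and it uses plain ISTA as a stand-in L1 solver. The function names, the `lam` penalty weight, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.1, n_iter=200):
    # Approximately solve min_z 0.5 * ||x - D z||^2 + lam * ||z||_1 (ISTA).
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z

def joint_sparse_code(D_a, D_b, x_a, x_b, lam=0.1):
    # Assumed joint formulation: stack both modalities so that a single
    # sparse code z has to reconstruct x_a and x_b at the same time.
    D = np.vstack([D_a, D_b])
    x = np.concatenate([x_a, x_b])
    return ista(D, x, lam)

def cross_modal_fill(D_a, D_b, x_a, lam=0.1):
    # Assumed missing-modality use: infer the shared code from modality A
    # alone, then reconstruct modality B from that code.
    z = ista(D_a, x_a, lam)
    return D_b @ z, z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_atoms = 16
    D_a = rng.standard_normal((8, n_atoms))  # hypothetical modality-A atoms
    D_b = rng.standard_normal((6, n_atoms))  # hypothetical modality-B atoms
    z_true = np.zeros(n_atoms)
    z_true[[2, 9]] = [1.5, -2.0]             # synthetic shared sparse code
    x_a, x_b = D_a @ z_true, D_b @ z_true

    z = joint_sparse_code(D_a, D_b, x_a, x_b)
    x_b_hat, _ = cross_modal_fill(D_a, D_b, x_a)
    print("nonzeros in joint code:", int(np.count_nonzero(z)))
    print("cross-modal reconstruction error:", float(np.linalg.norm(x_b - x_b_hat)))
```

Stacking the two sub-dictionaries is what ties the modalities together in this sketch: because one code must explain both inputs, atoms that co-occur across modalities are selected jointly. Dictionary learning and the stacked (multi-layer) variant mentioned in the abstract are left out.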
