7,368 Hits in 6.2 sec

Exploring multi-modality structure for cross domain adaptation in video concept annotation

Shaoxi Xu, Sheng Tang, Yongdong Zhang, Jintao Li, Yan-Tao Zheng
2012 Neurocomputing  
classifiers in the source domains to assist multi-graph optimization (a graph-based semi-supervised learning method) in the target domain for video concept annotation.  ...  Domain adaptive video concept detection and annotation has recently received significant attention, but in existing video adaptation processes, all the features are treated as one modality, while multimodalities  ...  Fig. 1. An example of multi-modality-based domain adaptive video concept detection and annotation in the one-to-one adaptation case.  ...
doi:10.1016/j.neucom.2011.05.041 fatcat:dog64kb5qvafxfxzgs6f2y3weq

Survey: Transformer based Video-Language Pre-training [article]

Ludan Ruan, Qin Jin
2021 arXiv   pre-print
Next, we categorize transformer models into Single-Stream and Multi-Stream structures, highlight their innovations, and compare their performances.  ...  We then describe the typical paradigm of pre-training & fine-tuning on Video-Language processing in terms of proxy tasks, downstream tasks, and commonly used video datasets.  ...  It follows the single-stream structure, porting the original BERT structure to the multi-modal domain as illustrated in Fig. 2-(1).  ...
arXiv:2109.09920v1 fatcat:ixysz5k4vrbktmf6cqftttls7m

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
Multimodal deep learning systems, which employ multiple modalities like text, image, audio, and video, show better performance than individual-modality (i.e., unimodal) systems.  ...  domain.  ...
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

Extracting Semantics from Multimedia Content: Challenges and Solutions [chapter]

Lexing Xie, Rong Yan
2008 Signals and Communication Technology  
correspondence across modalities, learning structured (generative) models to account for natural data dependency or model hidden topics, handling rare classes, leveraging unlabeled data, scaling to large  ...  Multimedia content accounts for over 60% of traffic on the current internet [74].  ...  Cross-modal association and image annotation: Viewing semantic concepts as binary detection on low-level multi-modal features is not the only way to extract multimedia semantics.  ...
doi:10.1007/978-0-387-76569-3_2 fatcat:jul6fw7esfaurct6erjnvpcq6q

Multi-modal Deep Analysis for Multimedia

Wenwu Zhu, Xin Wang, Hongzhi Li
2019 IEEE Transactions on Circuits and Systems for Video Technology
In this article, we present a deep and comprehensive overview of multi-modal analysis in multimedia.  ...  These data are heterogeneous and multi-modal in nature, imposing great challenges for processing and analyzing them.  ...
doi:10.1109/tcsvt.2019.2940647 fatcat:l4tchrkgrnaeradvc4nhfan2w4

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey [article]

Zhuo Chen, Yichi Zhang, Yin Fang, Yuxia Geng, Lingbing Guo, Xiang Chen, Qian Li, Wen Zhang, Jiaoyan Chen, Yushan Zhu, Jiaqi Li, Xiaoze Liu (+3 others)
2024 arXiv   pre-print
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation.  ...  In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal  ...  Third, exploring unique pre-training paradigms suited for (MM)KGs to fully harness the value of structured knowledge in multi-modal pre-training.  ... 
arXiv:2402.05391v3 fatcat:vbrdoocaozfptlwbdf77irxame

Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities [article]

Sara Abdali, Sina Shaham, Bhaskar Krishnamachari
2024 arXiv   pre-print
Hence, many researchers have developed automatic techniques for detecting possible cross-modal discordance in web-based content.  ...  We analyze, categorize, and identify existing approaches, in addition to the challenges and shortcomings they face, in order to unearth new research opportunities in the field of multi-modal misinformation detection  ...  In the next section, we introduce some of the cross-modal clues for detecting them in multi-modal settings.  ...
arXiv:2203.13883v6 fatcat:ppocrlukmnbhrggs6xui4p2vgi

A Survey on Video Moment Localization

Meng Liu, Liqiang Nie, Yunxiao Wang, Meng Wang, Yong Rui
2022 ACM Computing Surveys  
In addition, we discuss promising future directions for this field, in particular large-scale datasets and interpretable video moment localization models.  ...  We also review the datasets available for video moment localization and group the results of related work.  ...  [31] for the dense video captioning task, which contains annotations from the open domain.  ...
doi:10.1145/3556537 fatcat:3s6cqyebnjfg3pvwvk3db7d6ra

Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey [article]

Yuecong Xu, Haozhi Cao, Zhenghua Chen, Xiaoli Li, Lihua Xie, Jianfei Yang
2022 arXiv   pre-print
To tackle performance degradation and address concerns over high video annotation cost in a unified manner, video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source  ...  Further, with the high cost of video annotation, it is more practical to use unlabeled videos for training.  ...  We summarize the recent progress in VUDA research while providing insights into future VUDA research from the perspectives of leveraging multi-modal information, investigating reconstruction-based methods  ...
arXiv:2211.10412v2 fatcat:zatg7vnuunboxlxnm3he62yq2q

Leveraging Motion Priors in Videos for Improving Human Segmentation [article]

Yu-Ting Chen, Wen-Yen Chang, Hai-Lun Lu, Tingfan Wu, Min Sun
2018 arXiv   pre-print
In this work, we propose to leverage "motion prior" in videos for improving human segmentation in a weakly-supervised active learning setting.  ...  approach with domain adaptation approaches.  ...  To demonstrate severe domain shift, we evaluate our method mainly on cross-modality (RGB to IR) domain adaptation for human segmentation.  ... 
arXiv:1807.11436v1 fatcat:jdiwvoqas5hkhohburdpw2wegu

Machine Learning Techniques for Sensor-based Human Activity Recognition with Data Heterogeneity – A Review [article]

Xiaozhou Ye, Kouichi Sakurai, Nirmal Nair, Kevin I-Kai Wang
2024 arXiv   pre-print
Addressing data heterogeneity issues can improve performance, reduce computational costs, and aid in developing personalized, adaptive models with less annotated data.  ...  Sensor-based Human Activity Recognition (HAR) is crucial in ubiquitous computing, analysing behaviours through multi-dimensional observations.  ...  [10] adapted the non-local block for heterogeneous data fusion of video and smart-glove.  ... 
arXiv:2403.15422v1 fatcat:f5c5frgvavdc7mwuhfoic3aguy

Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review [article]

Hao Wang, Bin Guo, Yating Zeng, Yasan Ding, Chen Qiu, Ying Zhang, Lina Yao, Zhiwen Yu
2022 arXiv   pre-print
cross-modal semantic interaction.  ...  We conclude this paper by putting forward some open issues and promising research trends for VAD, e.g., the cognitive mechanisms of human-machine dialogue under cross-modal dialogue contexts, and knowledge-enhanced  ...
arXiv:2207.00782v1 fatcat:a57laj75xfa43gg4hjvxdh4c4i

Cross-media cloud computing

Zhongzhi Shi, Guang Jiang, Bo Zhang, Jinpeng Yue, Xiaofei Zhao
2012 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems  
In cognitive modeling, a new model named CAM is proposed that is suitable for cross-media semantic understanding. Moreover, a Cross-Media Intelligent Retrieval System (CMIRS) is illustrated.  ...  Furthermore, we propose a framework for cross-media semantic understanding that contains discriminative modeling, generative modeling, and cognitive modeling.  ...  The concepts provide model entities of interest in the domain and are typically organized into a taxonomy tree, where each node represents a concept and each concept is a specialization of its parent.  ...
doi:10.1109/ccis.2012.6664429 dblp:conf/ccis/ShiJZYZ12 fatcat:wefyscguf5bmjj2uzjafdjc2ty

Transformer for Object Re-Identification: A Survey [article]

Mang Ye, Shuoyi Chen, Chenyue Li, Wei-Shi Zheng, David Crandall, Bo Du
2024 arXiv   pre-print
In categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages demonstrated by the Transformer in addressing a multitude of challenges across these domains.  ...  Consequently, the Transformer is particularly well-suited for establishing inter-modal associations and facilitating the fusion of multi-modal information in cross-modal Re-ID tasks. 4) Transformer in Special  ...
arXiv:2401.06960v1 fatcat:t5gsdon4grh55kkkzo3tnkdb5e

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation [article]

Jie Jiang, Zhimin Li, Jiangfeng Xiong, Rongwei Quan, Qinglin Lu, Wei Liu
2022 arXiv   pre-print
To fill this gap, we construct the Tencent 'Ads Video Segmentation' (TAVS) dataset in the ads domain to escalate multi-modal video analysis to a new level.  ...  TAVS is organized hierarchically in semantic aspects for comprehensive temporal video segmentation, with three levels of categories for multi-label classification, e.g., 'place' - 'working place' - 'office  ...  To explore a good representative domain for such high-level intelligence, we believe that ads video, whose rich plot developments and multi-modal nature make it just like a 'short movie', is  ...
arXiv:2212.04700v1 fatcat:hllj75yoejh5rk7mdefvaa4yf4
Showing results 1 — 15 out of 7,368 results