Exploring multi-modality structure for cross domain adaptation in video concept annotation
2012
Neurocomputing
classifiers in the source domains to assist multi-graph optimization (a graph-based semi-supervised learning method) in the target domain for video concept annotation. ...
Domain adaptive video concept detection and annotation has recently received significant attention, but in existing video adaptation processes, all the features are treated as one modality, while multimodalities ...
Fig. 1. An example of multi-modality-based domain adaptive video concept detection and annotation in the one-to-one adaptation case. ...
doi:10.1016/j.neucom.2011.05.041
fatcat:dog64kb5qvafxfxzgs6f2y3weq
Survey: Transformer based Video-Language Pre-training
[article]
2021
arXiv
pre-print
Next, we categorize transformer models into Single-Stream and Multi-Stream structures, highlight their innovations and compare their performances. ...
We then describe the typical paradigm of pre-training & fine-tuning on Video-Language processing in terms of proxy tasks, downstream tasks and commonly used video datasets. ...
It follows the single-stream structure, porting the original BERT structure to the multi-modal domain as illustrated in Fig. 2-(1) . ...
arXiv:2109.09920v1
fatcat:ixysz5k4vrbktmf6cqftttls7m
Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
[article]
2021
arXiv
pre-print
Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. ...
arXiv:2107.13782v2
fatcat:s4spofwxjndb7leqbcqnwbifq4
Extracting Semantics from Multimedia Content: Challenges and Solutions
[chapter]
2008
Signals and Communication Technology
correspondence across modalities, learning structured (generative) models to account for natural data dependency or model hidden topics, handling rare classes, leveraging unlabeled data, scaling to large ...
Multimedia content accounts for over 60% of traffic in the current internet [74]. ...
Cross-modal association and image annotation Viewing semantic concepts as binary detection on low-level multi-modal features is not the only way for multimedia semantics extraction. ...
doi:10.1007/978-0-387-76569-3_2
fatcat:jul6fw7esfaurct6erjnvpcq6q
Multi-modal Deep Analysis for Multimedia
2019
IEEE Transactions on Circuits and Systems for Video Technology (Print)
In this article, we present a deep and comprehensive overview for multi-modal analysis in multimedia. ...
These data are heterogeneous and multi-modal in nature, imposing great challenges for processing and analyzing them. ...
ACKNOWLEDGMENT We thank Guohao Li, Shengze Yu and Yitian Yuan for providing relevant materials and valuable opinions. This work would never have been accomplished without their useful suggestions. ...
doi:10.1109/tcsvt.2019.2940647
fatcat:l4tchrkgrnaeradvc4nhfan2w4
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
[article]
2024
arXiv
pre-print
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. ...
In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal ...
Third, exploring unique pre-training paradigms suited for (MM)KGs to fully harness the value of structured knowledge in multi-modal pre-training. ...
arXiv:2402.05391v3
fatcat:vbrdoocaozfptlwbdf77irxame
Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities
[article]
2024
arXiv
pre-print
Hence many researchers have developed automatic techniques for detecting possible cross-modal discordance in web-based content. ...
We analyze, categorize and identify existing approaches in addition to challenges and shortcomings they face in order to unearth new research opportunities in the field of multi-modal misinformation detection ...
In the next section, we introduce some of the cross-modal clues used to detect misinformation in multi-modal settings. ...
arXiv:2203.13883v6
fatcat:ppocrlukmnbhrggs6xui4p2vgi
A Survey on Video Moment Localization
2022
ACM Computing Surveys
In addition, we discuss promising future directions for this field, in particular large-scale datasets and interpretable video moment localization models. ...
We also review the datasets available for video moment localization and group results of related work. ...
[31] for the dense video captioning task, which contains annotations from the open domain. ...
doi:10.1145/3556537
fatcat:3s6cqyebnjfg3pvwvk3db7d6ra
Video Unsupervised Domain Adaptation with Deep Learning: A Comprehensive Survey
[article]
2022
arXiv
pre-print
To tackle performance degradation and uniformly address the high cost of video annotation, video unsupervised domain adaptation (VUDA) is introduced to adapt video models from the labeled source ...
Further, with the high cost of video annotation, it is more practical to use unlabeled videos for training. ...
We summarize the recent progress in VUDA research while providing insights into future VUDA research from the perspectives of leveraging multi-modal information, investigating reconstruction-based methods ...
arXiv:2211.10412v2
fatcat:zatg7vnuunboxlxnm3he62yq2q
Leveraging Motion Priors in Videos for Improving Human Segmentation
[article]
2018
arXiv
pre-print
In this work, we propose to leverage "motion prior" in videos for improving human segmentation in a weakly-supervised active learning setting. ...
approach with domain adaptation approaches. ...
To demonstrate severe domain shift, we evaluate our method mainly on cross-modality (RGB to IR) domain adaptation for human segmentation. ...
arXiv:1807.11436v1
fatcat:jdiwvoqas5hkhohburdpw2wegu
Machine Learning Techniques for Sensor-based Human Activity Recognition with Data Heterogeneity – A Review
[article]
2024
arXiv
pre-print
Addressing data heterogeneity issues can improve performance, reduce computational costs, and aid in developing personalized, adaptive models with less annotated data. ...
Sensor-based Human Activity Recognition (HAR) is crucial in ubiquitous computing, analysing behaviours through multi-dimensional observations. ...
[10] adapted the non-local block for heterogeneous data fusion of video and smart-glove. ...
arXiv:2403.15422v1
fatcat:f5c5frgvavdc7mwuhfoic3aguy
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
[article]
2022
arXiv
pre-print
cross-modal semantic interaction. ...
We conclude this paper by putting forward some open issues and promising research trends for VAD, e.g., the cognitive mechanisms of human-machine dialogue under cross-modal dialogue context, and knowledge-enhanced ...
ACKNOWLEDGMENTS This work was partially supported by the National Science Fund for Distinguished Young Scholars (62025205), and the National Natural Science Foundation of China (No. 62032020, 61960206008 ...
arXiv:2207.00782v1
fatcat:a57laj75xfa43gg4hjvxdh4c4i
Cross-media cloud computing
2012
2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems
In cognitive modeling, a new model entitled CAM is proposed which is suitable for cross-media semantic understanding. Moreover, a Cross-Media Intelligent Retrieval System (CMIRS) will be illustrated. ...
Furthermore, we propose a framework for cross-media semantic understanding which contains discriminative modeling, generative modeling and cognitive modeling. ...
The concepts provide model entities of interest in the domain, and are typically organized into a taxonomy tree where each node represents a concept and each concept is a specialization of its parent. ...
doi:10.1109/ccis.2012.6664429
dblp:conf/ccis/ShiJZYZ12
fatcat:wefyscguf5bmjj2uzjafdjc2ty
Transformer for Object Re-Identification: A Survey
[article]
2024
arXiv
pre-print
In categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages demonstrated by ...
the Transformer in addressing a multitude of challenges across these domains. ...
Consequently, the Transformer is particularly well-suited for establishing inter-modal associations and facilitating the fusion of multi-modal information in cross-modal Re-ID tasks. 4) Transformer in Special ...
arXiv:2401.06960v1
fatcat:t5gsdon4grh55kkkzo3tnkdb5e
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
[article]
2022
arXiv
pre-print
To fill this gap, we construct the Tencent 'Ads Video Segmentation' (TAVS) dataset in the ads domain to escalate multi-modal video analysis to a new level. ...
TAVS is organized hierarchically in semantic aspects for comprehensive temporal video segmentation with three levels of categories for multi-label classification, e.g., 'place' - 'working place' - 'office ...
To explore a good representative domain for such high-level intelligence, we believe that ads video, whose rich plot development and multi-modal nature make it like a 'short movie', is ...
arXiv:2212.04700v1
fatcat:hllj75yoejh5rk7mdefvaa4yf4
Showing results 1 — 15 out of 7,368 results