66,850 Hits in 2.5 sec

Visually Exploring Multi-Purpose Audio Data [article]

David Heise, Helen L. Bear
2021 arXiv   pre-print
We analyse multi-purpose audio using tools to visualise similarities within the data that may be observed via unsupervised methods.  ...  We use the visual assessment of cluster tendency (VAT) technique on a well-known data set to observe how the samples naturally cluster, and we make comparisons to the labels used for audio geotagging and  ...  Fig. 2. A series of ordered dissimilarity matrices produced by VAT and SpecVAT on multi-purpose audio data.  ... 
arXiv:2110.04584v1 fatcat:aa6apxylabdirmifi4luxttd4i
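
For orientation, here is a minimal sketch of the VAT reordering mentioned in the snippet above: given a pairwise dissimilarity matrix, samples are reordered along a Prim's-style minimum-spanning-tree traversal so that cluster structure shows up as dark diagonal blocks. The toy features and the function name are illustrative, not taken from the paper's code.

import numpy as np
from scipy.spatial.distance import pdist, squareform


def vat_order(dissimilarity):
    """Return a VAT-style ordering of samples from a dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(d), d.shape)
    order = [int(i)]
    remaining = set(range(n)) - {int(i)}
    while remaining:
        rem = sorted(remaining)
        sub = d[np.ix_(order, rem)]           # distances from ordered to remaining samples
        _, col = np.unravel_index(np.argmin(sub), sub.shape)
        nxt = rem[col]                        # closest remaining sample joins the ordering
        order.append(nxt)
        remaining.remove(nxt)
    return np.array(order)


# Toy usage: two well-separated blobs standing in for audio feature vectors.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(5, 1, (20, 8))])
d = squareform(pdist(x))                      # pairwise Euclidean dissimilarities
idx = vat_order(d)
odm = d[np.ix_(idx, idx)]                     # ordered dissimilarity matrix
print(odm.shape)                              # display with plt.imshow(odm, cmap="gray")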

Learning in Audio-visual Context: A Review, Analysis, and New Perspective [article]

Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
2022 arXiv   pre-print
Through our analysis, we discover that the consistency of audio-visual data across the semantic, spatial and temporal dimensions supports the above studies.  ...  Then, we systematically review the recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception and audio-visual collaboration.  ...  Hence, audio-visual learning is essential to our pursuit of human-like machine perception ability. Its purpose is to explore computational approaches that learn from both audio and visual data.  ... 
arXiv:2208.09579v1 fatcat:xrjedf2ezbhbzbkysw2z2jsm7e

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

Jordan J. Bird, Diego R. Faria, Cristiano Premebida, Aniko Ekart, George Vogiatzis
2020 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)  
This is followed by late fusion of the two neural networks to enable a higher order function, leading to an accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips  ...  Both are examples which are correctly classified by our multi-modality approach.  ...  We explore visual and audio in this experiment due to accessibility, since there is a lot of audio-visual video data available to researchers.  ... 
doi:10.1109/iros45743.2020.9341557 fatcat:yusylnuo2nfthag7ipnezfjgcu
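
The late-fusion idea in this entry lends itself to a short sketch: two independently trained per-modality networks emit class scores, and a small fusion network learns the "higher order function" over their concatenation. The stand-in networks, layer sizes and the assumed number of scene classes below are illustrative, not the paper's.

import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Fuse class scores from a visual and an audio network with a small MLP."""

    def __init__(self, visual_net, audio_net, num_classes, hidden=64):
        super().__init__()
        self.visual_net = visual_net          # stand-in for a trained frame classifier
        self.audio_net = audio_net            # stand-in for a trained audio classifier
        self.fusion = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, frame, clip):
        v = self.visual_net(frame)            # class scores from the video frame
        a = self.audio_net(clip)              # class scores from the audio clip
        return self.fusion(torch.cat([v, a], dim=-1))


# Toy per-modality networks; 9 scene classes is an assumption for illustration.
visual = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 9))
audio = nn.Sequential(nn.Flatten(), nn.Linear(40, 9))
model = LateFusionClassifier(visual, audio, num_classes=9)
logits = model(torch.randn(4, 3, 32, 32), torch.randn(4, 40))
print(logits.shape)                           # torch.Size([4, 9])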

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines [article]

Jordan J. Bird, Diego R. Faria, Cristiano Premebida, Anikó Ekárt, George Vogiatzis
2020 arXiv   pre-print
The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion.  ...  This is followed by late fusion of the two neural networks to enable a higher order function, leading to an accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips  ...  We explore visual and audio in this experiment due to accessibility, since there is a lot of audio-visual video data available to researchers.  ... 
arXiv:2007.10175v1 fatcat:wsufgbkxbfhujb2kra7zmb3yky

Cloud Bridge: A Data-Driven Immersive Audio-Visual Software Interface

Qian Liu, Yoon Chung Han, JoAnn Kuchera-Morin, Matthew Wright
2013 Zenodo  
It explores how information can be sonified and visualized to facilitate findings, and eventually become interactive musical compositions. Cloud Bridge functions as a multi-user, multimodal instrument.  ...  Cloud Bridge leads to a new media interactive interface utilizing audio synthesis, visualization and real-time interaction.  ...  This project is a proof of concept for an interactive multi-user software instrument that utilizes data as the driver for visual/audio content.  ... 
doi:10.5281/zenodo.1178596 fatcat:zxkacyryujawxohsn5chmsg3sy

Mediation Exploring Multi-Sensory Elements Through the Use of Songs and its Effects to Pupils with Learning Disabilities

Mohd Razimi Husin, Nadiah Yan Binti Abdullah
2019 International Journal of Academic Research in Progressive Education and Development  
The purpose of this study is to explore multi-sensory elements through songs in learning and the impact on pupils with learning disabilities (PLD).  ...  Data of the study were analyzed by using constant comparison techniques of multi-sensory elements, the use of songs and the effects on pupils using the Nvivo software.  ...  [Table fragment: data frequency of audio song (EA) and visual track (EV) perceptions across recorded observations and triangulation interviews for pupils P1-P3.]  ... 
doi:10.6007/ijarped/v8-i4/6909 fatcat:ntzq56jtyjdcfh7lcaxg4cetky

Improving acoustic event detection using generalizable visual features and multi-modality modeling

Po-Sen Huang, Xiaodan Zhuang, Mark Hasegawa-Johnson
2011 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Hidden Markov models (HMMs) are used for audio-only modeling, and multi-stream HMMs or coupled HMMs (CHMM) are used for audio-visual joint modeling.  ...  To allow the flexibility of audio-visual state asynchrony, we explore effective CHMM training via HMM state-space mapping, parameter tying and different initialization schemes.  ...  Different fusion methods have been explored for the audio and visual modalities.  ... 
doi:10.1109/icassp.2011.5946412 dblp:conf/icassp/HuangZH11 fatcat:tvzxgu5ls5dmjgc2bej3tvqdlq
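
As a toy illustration of multi-stream audio-visual modeling (not the paper's CHMM training procedure), per-stream emission log-probabilities can be combined with a stream weight before ordinary Viterbi decoding; the weight and the synthetic emission scores below are placeholders.

import numpy as np


def viterbi(log_trans, log_emit, log_init):
    """Standard Viterbi decoding over combined emission log-probabilities."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # scores[from, to]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Multi-stream combination: weight audio vs. visual emission scores.
rng = np.random.default_rng(1)
T, S = 50, 3
log_emit_audio = np.log(rng.dirichlet(np.ones(S), size=T))
log_emit_video = np.log(rng.dirichlet(np.ones(S), size=T))
lam = 0.7                                     # audio stream weight (illustrative)
log_emit = lam * log_emit_audio + (1 - lam) * log_emit_video
log_trans = np.log(np.full((S, S), 0.1) + np.eye(S) * 0.7)
log_init = np.log(np.full(S, 1.0 / S))
print(viterbi(log_trans, log_emit, log_init))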

Sensate abstraction: hybrid strategies for multi-dimensional data in expressive virtual reality contexts

Ruth West, Joachim Gossmann, Todd Margolis, Jurgen P. Schulze, J. P. Lewis, Ben Hackbarth, Iman Mostafavi, Ian E. McDowall, Margaret Dolinsky
2009 The Engineering Reality of Virtual Reality 2009  
The installation utilizes a combination of infrared motion tracking, custom computer vision, multi-channel (10.1) spatialized interactive audio, 3D graphics, data sonification, audio design, networking  ...  Here we describe the physical and audio display systems for the installation and a hybrid strategy for multi-channel spatialized interactive audio rendering in immersive virtual reality that combines amplitude  ...  The installation explores the potential interplay between artistic and data-driven strategies, based on visual and auditory pattern, in working with massive multidimensional multi-scale, multi-resolution  ... 
doi:10.1117/12.806928 fatcat:ssz77u5g2ngo5fx5sq54f4s76i

Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations [article]

Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li
2020 arXiv   pre-print
The deliberation model outperforms the multi-stream model and achieves a relative WER improvement of 6% and 8.7% for the clean and masked data, respectively, compared to an audio-only model.  ...  Firstly, we propose a multi-stream attention architecture to leverage signals from both audio and video modalities.  ...  For this purpose, we augment the data by masking out specific words in the audio stream.  ... 
arXiv:2011.04084v1 fatcat:4ibujf7lc5cerf3v2h74rpcfum
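
A hedged sketch of a multi-stream attention layer in the spirit of this abstract: the decoder query attends to audio and video encoder states separately, and the two context vectors are fused. The dimensions and the fusion-by-concatenation choice are assumptions rather than the paper's exact architecture.

import torch
import torch.nn as nn


class MultiStreamAttention(nn.Module):
    """Decoder attention over separate audio and video encoder streams."""

    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.audio_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.video_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, query, audio_enc, video_enc):
        ctx_a, _ = self.audio_attn(query, audio_enc, audio_enc)   # audio context
        ctx_v, _ = self.video_attn(query, video_enc, video_enc)   # video context
        return self.proj(torch.cat([ctx_a, ctx_v], dim=-1))


layer = MultiStreamAttention()
q = torch.randn(2, 10, 256)       # decoder states
a = torch.randn(2, 120, 256)      # audio encoder output
v = torch.randn(2, 30, 256)       # video encoder output
print(layer(q, a, v).shape)       # torch.Size([2, 10, 256])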

Audio-Visual LLM for Video Understanding [article]

Fangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie
2023 arXiv   pre-print
This mechanism is pivotal in enabling end-to-end joint training with video data at different modalities, including visual-only, audio-only, and audio-visual formats.  ...  This dataset allows Audio-Visual LLM to adeptly process a variety of task-oriented video instructions, ranging from multi-turn conversations and audio-visual narratives to complex reasoning tasks.  ...  AudioGPT [33] leverages various audio foundation models to process audio data, where LLMs are regarded as the general-purpose interface.  ... 
arXiv:2312.06720v2 fatcat:vjwjvxyrbvg57eddyjwk7ytvka
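
The modality-augmented training mentioned above can be sketched as a simple sampling step: each training example is presented as visual-only, audio-only, or audio-visual by choosing which token streams to keep. The helper name and the token shapes are hypothetical; the real model interleaves these tokens with text instructions for the LLM.

import random
import torch


def sample_modality_batch(visual_tokens, audio_tokens):
    """Randomly keep visual-only, audio-only, or both token streams."""
    mode = random.choice(["visual", "audio", "audio_visual"])
    parts = []
    if mode in ("visual", "audio_visual"):
        parts.append(visual_tokens)
    if mode in ("audio", "audio_visual"):
        parts.append(audio_tokens)
    return mode, torch.cat(parts, dim=1)


v = torch.randn(2, 32, 768)   # projected visual tokens (illustrative sizes)
a = torch.randn(2, 16, 768)   # projected audio tokens
mode, tokens = sample_modality_batch(v, a)
print(mode, tokens.shape)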

CNN-Based Multi-Modal Camera Model Identification on Video Sequences

Davide Dal Cortivo, Sara Mandelli, Paolo Bestagini, Stefano Tubaro
2021 Journal of Imaging  
Unlike mono-modal methods, which use only the visual or audio information from the investigated video to tackle the identification task, the proposed multi-modal methods jointly exploit audio  ...  To this purpose, we develop two different CNN-based camera model identification methods, working in a novel multi-modal scenario.  ...  Given a query video, we extract and pre-process its visual and audio content. Then, we feed these data to one multi-input CNN, composed of two CNNs whose last fully-connected layers are concatenated.  ... 
doi:10.3390/jimaging7080135 pmid:34460771 fatcat:7v5nxgk225akffydyiq3ojtd24
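
A compact sketch of the multi-input arrangement the snippet describes: two CNN branches, one for frame patches and one for audio spectrogram patches, whose last fully-connected layers are concatenated before a shared classification head. Branch depths, patch sizes and the number of camera models are illustrative, not the paper's configuration.

import torch
import torch.nn as nn


def branch(in_ch):
    """A small CNN branch ending in a fully-connected embedding layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 64), nn.ReLU(),
    )


class MultiInputCameraCNN(nn.Module):
    """Two CNN branches whose last FC layers are concatenated, then classified."""

    def __init__(self, num_models=10):
        super().__init__()
        self.visual = branch(3)       # video-frame patches
        self.audio = branch(1)        # log-Mel spectrogram patches
        self.head = nn.Linear(128, num_models)

    def forward(self, frame_patch, spec_patch):
        feats = torch.cat([self.visual(frame_patch), self.audio(spec_patch)], dim=-1)
        return self.head(feats)


net = MultiInputCameraCNN()
out = net(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
print(out.shape)                      # torch.Size([2, 10])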

Noisy Agents: Self-supervised Exploration by Predicting Auditory Events [article]

Chuang Gan, Xiaoyu Chen, Phillip Isola, Antonio Torralba, Joshua B. Tenenbaum
2020 arXiv   pre-print
Humans integrate multiple sensory modalities (e.g. visual and audio) to build a causal understanding of the physical world.  ...  First, we allow the agent to collect a small amount of acoustic data and use K-means to discover underlying auditory event clusters.  ...  Active exploration or random explorations? We propose an online clustering-based intrinsic module for active audio data collections.  ... 
arXiv:2007.13729v1 fatcat:yvlz2stdxnckbnacgyc7rjmrui
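
The clustering step named in the snippet is easy to sketch: fit K-means on a small buffer of acoustic features and treat the cluster indices as auditory-event pseudo-labels (the paper then derives an intrinsic reward from predicting them). The synthetic features below are stand-ins for real impact sounds from the environment.

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a small buffer of acoustic features collected by the agent
# (e.g. averaged spectrogram frames from sounds its actions produce).
rng = np.random.default_rng(0)
audio_features = np.vstack([
    rng.normal(loc, 0.5, size=(100, 16)) for loc in (0.0, 3.0, 6.0)
])

# Discover auditory event clusters; their indices serve as pseudo-labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(audio_features)
pseudo_labels = kmeans.labels_

# The intrinsic reward in the paper comes from predicting the cluster of the
# sound an action causes; only the clustering step is shown here.
print(np.bincount(pseudo_labels))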

Wav2CLIP: Learning Robust Audio Representations From CLIP [article]

Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello
2022 arXiv   pre-print
as it does not require learning a visual model in concert with an auditory model.  ...  Furthermore, Wav2CLIP needs just ~10% of the data to achieve competitive performance on downstream tasks compared with fully supervised models, and is more efficient to pre-train than competing methods  ...  data, or via audio-visual correspondence, we plot the confusion matrices of YamNet and Wav2CLIP in Figure 3 on TAU, an audio-visual scene classification dataset.  ... 
arXiv:2110.11499v2 fatcat:uq6dxnke6ne5nissu6yividmsu
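
A rough sketch of the distillation setup implied here: the audio encoder is trained to match frozen CLIP embeddings of the paired video frames, so no visual model is learned in concert with it. The toy encoder, the CLIP-style symmetric contrastive loss and the tensor shapes are assumptions, not Wav2CLIP's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy audio encoder standing in for the trainable audio branch; the CLIP
# image embeddings of the matching video frames are precomputed and frozen.
audio_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 100, 512))
optimizer = torch.optim.Adam(audio_encoder.parameters(), lr=1e-4)

mel = torch.randn(8, 64, 100)            # batch of log-Mel spectrograms
clip_image_emb = torch.randn(8, 512)     # frozen CLIP embeddings of paired frames

# One distillation step: pull each audio embedding toward its paired frame
# embedding and away from the other frames in the batch.
optimizer.zero_grad()
audio_emb = F.normalize(audio_encoder(mel), dim=-1)
image_emb = F.normalize(clip_image_emb, dim=-1)
logits = audio_emb @ image_emb.t() / 0.07
targets = torch.arange(len(mel))
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
loss.backward()
optimizer.step()
print(float(loss))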

PathoSonic: Performing Sound In Virtual Reality Feature Space

Fede Camara Halac, Shadrick Addy
2020 Proceedings of the International Conference on New Interfaces for Musical Expression  
Through implementation of a multi-sensory experience, including visual aesthetics, sound, and haptic feedback, we explore inclusive approaches to sound visualization, making it more accessible to a wider  ...  The name comes from the different paths the participant can create through their sonic explorations.  ...  Through implementation of a multi-sensory experience, including visual aesthetics, sound, and haptic feedback, this investigation seeks to explore inclusive approaches to sound visualization, making it  ... 
doi:10.5281/zenodo.4813510 fatcat:byfxpchd45airm5l3dnxgjxapa

Using Machine Learning to Classify Music Genre

Rachaell Nihalaani
2021 International Journal for Research in Applied Science and Engineering Technology  
In this literature, we aimed to build a machine learning model to classify the genre of an input audio file using 8 machine learning algorithms and determine which algorithm is best suited for genre  ...  First, we performed data visualization to get familiar with our data. For visualization purposes, we considered one instance of our dataset - the 11th audio file belonging to the Pop genre.  ...  Such extensive exploration and visualization are necessary because the input files are audio. Now, we are fully equipped to begin building our model.  ... 
doi:10.22214/ijraset.2021.38365 fatcat:ub2nn2xb3fedjgalqve6ryfgdq
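
A sketch of the workflow this entry describes: per-file audio features (the study extracts MFCCs and related descriptors) are fed to several scikit-learn classifiers and compared by cross-validation. Only four of the eight algorithms are shown, and the feature matrix here is synthetic rather than extracted from real audio files.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in feature matrix: each row would hold features (MFCC means, spectral
# centroid, chroma, ...) from one audio file, with y its genre label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 26))
y = rng.integers(0, 10, size=300)      # 10 genres, GTZAN-style (illustrative)

models = {
    "knn": KNeighborsClassifier(),
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")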
Showing results 1 - 15 out of 66,850 results