Visually Exploring Multi-Purpose Audio Data
[article]
2021
arXiv
pre-print
We analyse multi-purpose audio using tools to visualise similarities within the data that may be observed via unsupervised methods. ...
We use the visual assessment of cluster tendency (VAT) technique on a well known data set to observe how the samples naturally cluster, and we make comparisons to the labels used for audio geotagging and ...
Fig. 2. A series of ordered dissimilarity matrices produced by VAT and SpecVAT on multi-purpose audio data. ...
arXiv:2110.04584v1
fatcat:aa6apxylabdirmifi4luxttd4i
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
[article]
2022
arXiv
pre-print
Through our analysis, we discover that the consistency of audio-visual data across the semantic, spatial, and temporal dimensions supports the above studies. ...
Then, we systematically review the recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception and audio-visual collaboration. ...
Hence, audio-visual learning is essential to our pursuit of human-like machine perception ability. Its purpose is to explore computational approaches that learn from both audio and visual data. ...
arXiv:2208.09579v1
fatcat:xrjedf2ezbhbzbkysw2z2jsm7e
Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines
2020
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
This is followed by late fusion of the two neural networks to enable a higher-order function, leading to an accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips ...
Both are examples which are correctly classified by our multi-modality approach. ...
We explore visual and audio in this experiment due to accessibility, since there is a lot of audio-visual video data available to researchers. ...
doi:10.1109/iros45743.2020.9341557
fatcat:yusylnuo2nfthag7ipnezfjgcu
Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines
[article]
2020
arXiv
pre-print
The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. ...
This is followed by late fusion of the two neural networks to enable a higher-order function, leading to an accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips ...
We explore visual and audio in this experiment due to accessibility, since there is a lot of audio-visual video data available to researchers. ...
arXiv:2007.10175v1
fatcat:wsufgbkxbfhujb2kra7zmb3yky
Cloud Bridge: A Data-Driven Immersive Audio-Visual Software Interface
2013
Zenodo
It explores how information can be sonified and visualized to facilitate findings, and eventually become interactive musical compositions. Cloud Bridge functions as a multi-user, multimodal instrument. ...
Cloud Bridge leads to a new media interactive interface utilizing audio synthesis, visualization and real-time interaction. ...
This project is a proof of concept for an interactive multi-user software instrument that utilizes data as the driver for visual/audio content. ...
doi:10.5281/zenodo.1178596
fatcat:zxkacyryujawxohsn5chmsg3sy
Mediation Exploring Multi-Sensory Elements Through the Use of Songs and its Effects to Pupils with Learning Disabilities
2019
International Journal of Academic Research in Progressive Education and Development
The purpose of this study is to explore multi-sensory elements through songs in learning and the impact on pupils with learning disabilities (PLD). ...
Data of the study were analyzed by using constant comparison techniques of multi-sensory elements, the use of songs and the effects on pupils using the Nvivo software. ...
DATA FREQUENCY (table excerpt): total counts across observations (recorded) and interviews (triangulation) for pupils P1, P2, P3. Perceptions, Audio Songs (EA): S1 = 3, S2 = 3; Visual Track (EV): S1 ...
doi:10.6007/ijarped/v8-i4/6909
fatcat:ntzq56jtyjdcfh7lcaxg4cetky
Improving acoustic event detection using generalizable visual features and multi-modality modeling
2011
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Hidden Markov models (HMMs) are used for audio-only modeling, and multi-stream HMMs or coupled HMMs (CHMM) are used for audio-visual joint modeling. ...
To allow the flexibility of audio-visual state asynchrony, we explore effective CHMM training via HMM state-space mapping, parameter tying and different initialization schemes. ...
Different fusion methods have been explored for the audio and visual modalities. ...
doi:10.1109/icassp.2011.5946412
dblp:conf/icassp/HuangZH11
fatcat:tvzxgu5ls5dmjgc2bej3tvqdlq
Sensate abstraction: hybrid strategies for multi-dimensional data in expressive virtual reality contexts
2009
The Engineering Reality of Virtual Reality 2009
The installation utilizes a combination of infrared motion tracking, custom computer vision, multi-channel (10.1) spatialized interactive audio, 3D graphics, data sonification, audio design, networking ...
Here we describe the physical and audio display systems for the installation and a hybrid strategy for multi-channel spatialized interactive audio rendering in immersive virtual reality that combines amplitude ...
The installation explores the potential interplay between artistic and data-driven strategies, based on visual and auditory pattern, in working with massive multidimensional multi-scale, multi-resolution ...
doi:10.1117/12.806928
fatcat:ssz77u5g2ngo5fx5sq54f4s76i
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
[article]
2020
arXiv
pre-print
The deliberation model outperforms the multi-stream model and achieves a relative WER improvement of 6% and 8.7% for the clean and masked data, respectively, compared to an audio-only model. ...
Firstly, we propose a multi-stream attention architecture to leverage signals from both audio and video modalities. ...
For this purpose, we augment the data by masking out specific words in the audio stream. ...
arXiv:2011.04084v1
fatcat:4ibujf7lc5cerf3v2h74rpcfum
Audio-Visual LLM for Video Understanding
[article]
2023
arXiv
pre-print
This mechanism is pivotal in enabling end-to-end joint training with video data at different modalities, including visual-only, audio-only, and audio-visual formats. ...
This dataset allows Audio-Visual LLM to adeptly process a variety of task-oriented video instructions, ranging from multi-turn conversations and audio-visual narratives to complex reasoning tasks. ...
AudioGPT [33] leverages various audio foundation models to process audio data, where LLMs are regarded as the general-purpose interface. ...
arXiv:2312.06720v2
fatcat:vjwjvxyrbvg57eddyjwk7ytvka
CNN-Based Multi-Modal Camera Model Identification on Video Sequences
2021
Journal of Imaging
Differently from mono-modal methods, which use only the visual or audio information from the investigated video to tackle the identification task, the proposed multi-modal methods jointly exploit audio ...
To this purpose, we develop two different CNN-based camera model identification methods, working in a novel multi-modal scenario. ...
Given a query video, we extract and pre-process its visual and audio content. Then, we feed these data to one multi-input CNN, composed of two CNNs whose last fully-connected layers are concatenated. ...
doi:10.3390/jimaging7080135
pmid:34460771
fatcat:7v5nxgk225akffydyiq3ojtd24
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
[article]
2020
arXiv
pre-print
Humans integrate multiple sensory modalities (e.g. visual and audio) to build a causal understanding of the physical world. ...
First, we allow the agent to collect a small amount of acoustic data and use K-means to discover underlying auditory event clusters. ...
Active exploration or random explorations? We propose an online clustering-based intrinsic module for active audio data collections. ...
arXiv:2007.13729v1
fatcat:yvlz2stdxnckbnacgyc7rjmrui
Wav2CLIP: Learning Robust Audio Representations From CLIP
[article]
2022
arXiv
pre-print
as it does not require learning a visual model in concert with an auditory model. ...
Furthermore, Wav2CLIP needs just ~10% of the data to achieve competitive performance on downstream tasks compared with fully supervised models, and is more efficient to pre-train than competing methods ...
data, or via audio-visual correspondence, we plot the confusion matrices of YamNet and Wav2CLIP in Figure 3 on TAU, an audio-visual scene classification dataset. ...
arXiv:2110.11499v2
fatcat:uq6dxnke6ne5nissu6yividmsu
PathoSonic: Performing Sound In Virtual Reality Feature Space
2020
Proceedings of the International Conference on New Interfaces for Musical Expression
Through implementation of a multi-sensory experience, including visual aesthetics, sound, and haptic feedback, we explore inclusive approaches to sound visualization, making it more accessible to a wider ...
The name comes from the different paths the participant can create through their sonic explorations. ...
Through implementation of a multi-sensory experience, including visual aesthetics, sound, and haptic feedback, this investigation seeks to explore inclusive approaches to sound visualization, making it ...
doi:10.5281/zenodo.4813510
fatcat:byfxpchd45airm5l3dnxgjxapa
Using Machine Learning to Classify Music Genre
2021
International Journal for Research in Applied Science and Engineering Technology
In this work, we aimed to build a machine learning model to classify the genre of an input audio file using 8 machine learning algorithms and determine which algorithm is best suited for genre ...
First, we performed data visualization to get familiar with our data. For visualization purposes, we considered one instance of our dataset: the 11th audio file, belonging to the Pop genre. ...
Such extensive exploration and visualization are necessary due to the input files being audios. Now, we are fully equipped to begin building our model. ...
doi:10.22214/ijraset.2021.38365
fatcat:ub2nn2xb3fedjgalqve6ryfgdq
Showing results 1–15 of 66,850 results