A Proposal for Common Dataset in Neural-Symbolic Reasoning Studies
2016
International Workshop on Neural-Symbolic Learning and Reasoning
We promote and analyze the need for a common publicly available benchmark dataset to be used for neural-symbolic studies of learning and reasoning. ...
Along with the original tasks that were suggested by the Visual Genome creators, we propose neural-symbolic tasks that can be used as challenges to promote research in the field and competition between ...
We would like to thank the reviewers for detailed and very beneficial comments on the paper. ...
dblp:conf/nesy/YilmazGS16
fatcat:qf3grff5nbbjdbszeihh5ugghy
A Neural-Symbolic Approach to Design of CAPTCHA
[article]
2018
arXiv
pre-print
To develop image/visual-captioning-based CAPTCHAs, this paper proposes a new image captioning architecture by exploiting tensor product representations (TPR), a structured neural-symbolic framework developed ...
To address this, this paper promotes image/visual captioning based CAPTCHAs, which are robust against machine-learning-based attacks. ...
And as a by-product, the symbolic character of TPRs makes them amenable to conceptual interpretation in a way that standard learned neural network representations are not. ...
arXiv:1710.11475v2
fatcat:yiknq6uarbe23nmgwbzbq2qzgy
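The core mechanism behind tensor product representations (TPRs) mentioned in this entry is binding "filler" vectors (symbols) to "role" vectors (positions) via outer products, summing the bindings, and later unbinding with a role vector. A minimal numpy sketch of that idea, with invented vectors and names (the paper's captioning architecture learns these representations end-to-end):

```python
import numpy as np

# Minimal TPR sketch: bind fillers (symbols) to roles (positions) via
# outer products, sum into one tensor, then unbind with a role vector.
rng = np.random.default_rng(0)
d = 8

# Orthonormal role vectors (via QR) so unbinding is exact.
roles = np.linalg.qr(rng.normal(size=(d, d)))[0][:, :2]
fillers = {"cat": rng.normal(size=d), "mat": rng.normal(size=d)}

# Bind: T = sum_i filler_i (outer) role_i
T = np.outer(fillers["cat"], roles[:, 0]) + np.outer(fillers["mat"], roles[:, 1])

# Unbind position 0: T @ role_0 recovers the "cat" filler, because
# role_0 . role_0 = 1 and role_1 . role_0 = 0.
recovered = T @ roles[:, 0]
assert np.allclose(recovered, fillers["cat"])
```

With non-orthogonal roles, unbinding is only approximate, which is why structured role bases matter in TPR-style models.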
Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
2020
Neural Information Processing Systems
In this paper, we propose to tackle this challenge by employing neural factor graphs to induce a tighter coupling between concepts in different modalities (e.g. images and text). ...
Our model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions. ...
Several neural architectures have shown great promise in learning multimodal representations to solve the task [42, 39, 54] . ...
dblp:conf/nips/SaqurN20
fatcat:xqqettxn2reixbcnvg7ldthlfm
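The last stage this entry describes, outputting a symbolic program to predict answers, amounts to deterministic execution of a predicted program over a structured scene. A hypothetical sketch (the scene format, operator names, and program below are invented for illustration, not the paper's actual interface):

```python
# Toy interpreter for the "symbolic program" stage of a neural-symbolic
# VQA pipeline: a neural module predicts the program; answering is then
# deterministic execution over structured scene objects.
scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "red"},
    {"shape": "cube", "color": "blue"},
]

OPS = {
    "filter_color": lambda objs, arg: [o for o in objs if o["color"] == arg],
    "filter_shape": lambda objs, arg: [o for o in objs if o["shape"] == arg],
    "count": lambda objs, arg: len(objs),
}

def execute(program, objs):
    """Run a linear program of (op, arg) steps over the object list."""
    out = objs
    for op, arg in program:
        out = OPS[op](out, arg)
    return out

# "How many red things are there?"
answer = execute([("filter_color", "red"), ("count", None)], scene)
# answer == 2
```

Compositional generalization then reduces to whether the program predictor composes known operators correctly on novel questions.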
Like a Baby: Visually Situated Neural Language Acquisition
[article]
2019
arXiv
pre-print
Thus, language models perform better when they learn like a baby, i.e., in a multi-modal environment. ...
We examine the benefits of visual context in training neural language models to perform next-word prediction. ...
While image-captioning generally focuses on ranking appropriate caption candidates, we intend to use the model to generate sentences using only the image for guidance. ...
arXiv:1805.11546v2
fatcat:m7cbjby4pfcr5af6gll2wglizu
What is not where: the challenge of integrating spatial representations into deep learning architectures
[article]
2018
arXiv
pre-print
This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. ...
Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects. ...
The research of Dobnik was supported by a grant from the Swedish Research Council (VR project 2014-39) for the establishment of the Centre for Linguistic Theory and Studies in Probability (CLASP) at Department ...
arXiv:1807.08133v1
fatcat:ea34qml7jzbz5peenpwc4bowri
Neuro-Symbolic Learning: Principles and Applications in Ophthalmology
[article]
2022
arXiv
pre-print
Thus, the neuro-symbolic learning (NeSyL) notion emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. ...
Attempts have been made to overcome the challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. ...
Fig. 8: The combination of neural learning and symbolic learning. The neural network extracts features, which are passed to a symbolic unit for reasoning and inference. ...
arXiv:2208.00374v1
fatcat:pktmnomj3bbwpjyj7lmu37rl7i
Scene Graph based Image Retrieval – A case study on the CLEVR Dataset
[article]
2019
arXiv
pre-print
Motivated by this, we propose a neural-symbolic approach for a one-shot retrieval of images from a large scale catalog, given the caption description. ...
With the proliferation of multimodal interaction in various domains, recently there has been much interest in text based image retrieval in the computer vision community. ...
In this work, we propose a neural symbolic approach for modeling a caption based image retrieval task. ...
arXiv:1911.00850v1
fatcat:j5lktzr65ffbna5x77uu5bnksy
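A common way to realize the caption-based retrieval this entry describes is to extract (subject, relation, object) triplets from the caption and score each catalog image by how many of those triplets its scene graph contains. A simplified sketch under that assumption (the triplets and catalog below are invented; the paper builds its graphs from CLEVR):

```python
# Score catalog images by overlap between caption triplets and each
# image's scene-graph triplets; retrieve the best-scoring image.
def score(caption_triplets, image_triplets):
    """Fraction of caption triplets present in the image's scene graph."""
    if not caption_triplets:
        return 0.0
    hits = sum(1 for t in caption_triplets if t in image_triplets)
    return hits / len(caption_triplets)

catalog = {
    "img1": {("cube", "left_of", "sphere"), ("sphere", "same_color", "cylinder")},
    "img2": {("cube", "behind", "sphere")},
}
query = {("cube", "left_of", "sphere")}

best = max(catalog, key=lambda k: score(query, catalog[k]))
# best == "img1"
```

Exact set matching is brittle; neural-symbolic variants typically soften it with learned embeddings over the triplets.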
Interpretable Detection of Out-of-Context Misinformation with Neural-Symbolic-Enhanced Large Multimodal Model
[article]
2024
arXiv
pre-print
The proposed model first symbolically disassembles the text-modality information to a set of fact queries based on the Abstract Meaning Representation of the caption and then forwards the query-image pairs ...
contents (e.g. mismatched images and captions) to deceive the public and fake news detection systems. ...
Neural-Symbolic Multi-Modal Learning: Existing Neural-Symbolic Multi-Modal Learning methods are usually designed for Vision Question Answering (Yi et al., 2018; Zhu et al., 2022) . ...
arXiv:2304.07633v2
fatcat:26xeo2q5knfxphfe27qfqi2kmq
Keyword Generation for Biomedical Image Retrieval with Recurrent Neural Networks
2017
Conference and Labs of the Evaluation Forum
The images are visually represented using a Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) Show-and-Tell model is adopted for image caption ...
The aim of this presented work is the generation of image keywords, which can be substituted as text representation for classifications tasks and image retrieval purposes. ...
Using image-caption pairs of 164,614 biomedical figures, distributed for training at the ImageCLEF Caption Prediction Task, long short-term memory based Recurrent Neural Network models were trained. ...
dblp:conf/clef/PelkaF17
fatcat:227flkcnwbca5cygipztjvpjn4
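In the Show-and-Tell setup this entry adopts, keyword generation is a greedy decoding loop: a CNN image feature conditions a recurrent step that emits one token per step until an end token. A toy sketch where a hand-written stub stands in for the trained CNN+LSTM (vocabulary and scores are invented):

```python
# Toy greedy decoder in the spirit of Show-and-Tell keyword generation.
VOCAB = ["<end>", "xray", "chest", "fracture"]

def step(image_feat, prev_token, t):
    """Stub for one LSTM step: returns scores over VOCAB (illustrative)."""
    # Pretend the image suggests "chest", then "xray", then stop.
    schedule = [
        [0.0, 0.2, 0.8, 0.0],
        [0.1, 0.8, 0.1, 0.0],
        [0.9, 0.05, 0.05, 0.0],
    ]
    return schedule[min(t, len(schedule) - 1)]

def greedy_keywords(image_feat, max_len=5):
    """Emit the highest-scoring token each step until <end> or max_len."""
    tokens, prev = [], None
    for t in range(max_len):
        scores = step(image_feat, prev, t)
        tok = VOCAB[scores.index(max(scores))]
        if tok == "<end>":
            break
        tokens.append(tok)
        prev = tok
    return tokens

# greedy_keywords(None) -> ["chest", "xray"]
```

In the real model, `step` would run the LSTM cell over learned embeddings; only the decoding loop is faithful here.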
TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning
[article]
2019
arXiv
pre-print
These neural-symbolic representations help to better define the neural-symbolic space for neuro-symbolic attention and can be transformed into better captions. ...
In this work, we have introduced a novel technique for caption generation using the neural-symbolic encoding of the scene-graphs, derived from regional visual information of the images and we call it Tensor ...
For the image captioning application, TPsgtR helps in providing several pieces of discrete interaction information through the required graphical-layer-based representation interface, and these can be used as neuro-symbolic ...
arXiv:1911.10115v1
fatcat:jbbw4g2msjcsnf4jfukzmiqtkm
Generating Text from Images in a Smooth Representation Space
2018
Conference and Labs of the Evaluation Forum
Instead of generating textual sequences directly from images, we first learn a smooth, continuous representation space for the captions. ...
A methodology is described for the generation of relevant captions for images of an extensive medical dataset in the ImageCLEF 2018 Caption Prediction competition. ...
Generating captions from images is also a task that requires an understanding of data representations in neural networks. ...
dblp:conf/clef/SpinksM18
fatcat:wiponkzgbzd6vkc4kfh3zu5dmm
KANDINSKYPatterns – An experimental exploration environment for Pattern Analysis and Machine Intelligence
[article]
2021
arXiv
pre-print
This was experimentally proven by Hubel & Wiesel in the 1960s and became the basis for machine learning approaches such as the Neocognitron and the even later Deep Learning. ...
There is still a significant gap between machine-level pattern recognition and human-level concept learning. ...
ACKNOWLEDGEMENTS This work has received funding by the Austrian Science Fund (FWF), Project: P-32554 "A reference model for explainable Artificial Intelligence in the medical domain". ...
arXiv:2103.00519v1
fatcat:d57pwgzf4vhmpaa5hqwm7ls5zq
Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation
[article]
2020
arXiv
pre-print
In this paper, we propose the neural-symbolic learning (NSL) framework that performs human-like learning by unifying deep neural learning and symbolic logical reasoning for the spinal medical report generation ...
Generally speaking, the NSL framework firstly employs deep neural learning to imitate human visual perception for detecting abnormalities of target spinal structures. ...
Combining neural learning and symbolic reasoning for medical report generation is appropriate and novel. ...
arXiv:2004.13577v1
fatcat:oh5aka5zr5be3ipd7qqnyhikzy
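The two-stage pattern this entry describes, neural perception followed by symbolic reasoning, can be caricatured as a detector that labels spinal structures plus rules that turn labels into report sentences. A hypothetical sketch (the findings, rules, and wording are invented for illustration):

```python
# Sketch of "neural detection -> symbolic rules -> report": the detector
# output is stubbed; rules map each finding to a report sentence.
findings = {"L4-L5": "disc_bulge", "L5-S1": "normal"}  # stub detector output

RULES = {
    "disc_bulge": "Disc bulge observed at {loc}.",
    "normal": "No abnormality at {loc}.",
}

report = " ".join(
    RULES[label].format(loc=loc) for loc, label in findings.items()
)
# report == "Disc bulge observed at L4-L5. No abnormality at L5-S1."
```

The symbolic stage makes the report auditable: each sentence traces back to one detected finding and one named rule.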
Multimodal Machine Learning: Integrating Language, Vision and Speech
2017
Proceedings of ACL 2017, Tutorial Abstracts
With the initial research on audio-visual speech recognition and more recently with language & vision projects such as image and video captioning and visual question answering, this research field brings ...
some unique challenges for multimodal researchers given the heterogeneity of the data and the contingency often found between modalities. ...
- Multimodal applications: image captioning, video description, AVSR
- Core technical challenges: representation learning, translation, alignment, fusion and co-learning
...
doi:10.18653/v1/p17-5002
dblp:conf/acl/MorencyB17
fatcat:m24h75t6mvdyfeedrsjbvjjaom
Describing Semantic Representations of Brain Activity Evoked by Visual Stimuli
[article]
2018
arXiv
pre-print
To apply brain activity to the image-captioning network, we train regression models that learn the relationship between brain activity and deep-layer image features. ...
To effectively use a small amount of available brain activity data, our proposed method employs a pre-trained image-captioning network model using a deep learning framework. ...
Interestingly, a proper caption was generated using brain activity data with the three-layer neural network model compared to the image-captioning model for the second training example. ...
arXiv:1802.02210v1
fatcat:qfbevp2mfjcevo64vaq22kz3nm