Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding.

Grounded VL tasks such as grounded captioning require the model to generate a text description and align predicted words with object regions. ... Experiments cover 7 VL benchmarks, including grounded captioning, visual grounding, image captioning, and visual question answering. ... A fast and accurate one-stage approach to visual grounding. In: ICCV (2019) ...

arXiv:2111.12085v2 fatcat:o2kevp3lo5dlvknxtomttua77m

The E-Z Reader model (Reichle et al. 1998; ...) provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where ... On the basis of this discussion, we conclude that E-Z Reader provides the most comprehensive account of eye movement control during reading. ... ACKNOWLEDGMENT Denis Drieghe is a research assistant of the Fund for Scientific Research (Flanders, Belgium). ...

doi:10.1017/s0140525x03000104 fatcat:663lhrgspjdflcsnlxn2zoneqa

data (like object bounding boxes). ... On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised ... Beyond instructional videos: Probing for more diverse visual-textual grounding on YouTube. In EMNLP, 2020. [40] Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. ...

arXiv:2106.02636v3 fatcat:mrj2t3yuanbdzhsujshtky4enq

One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction. ... For each category, we synthesize its representative studies, strengths, and weaknesses. ... Thus, it is suspected that the visualization has little to do with the model's reasoning process. ...

arXiv:2209.11326v4 fatcat:rat7nu5fpjdjznqvqcjeknjd3q

Simply by introducing one extra hyperparameter and adding one line of code, our Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D image classification, instance segmentation, object ... Generally speaking, however, a good loss function can take on much more flexible forms, and should be tailored for different tasks and datasets. ... ACKNOWLEDGEMENTS We thank James Philbin, Doug Eck, Tsung-Yi Lin and the rest of Waymo Research and Google Brain teams for valuable feedback. ...

arXiv:2204.12511v2 fatcat:zvou7glq4fb2rdyngel3myzrqq

in generating coherent and accurate responses for questions requiring long responses (such as 'how' and 'why' questions) is less reliant on observing annotated data and mainly supported by their pre-training ... We benchmark T5 models on GooAQ and observe that: (a) in line with recent work, LMs' strong performance on GooAQ's short-answer questions heavily benefits from annotated data; however, (b) their quality ... Approved for Public Release, Distribution Unlimited. ...

arXiv:2104.08727v2 fatcat:zloaxrwk2re47afc7luqacf5my

These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. ... All of these resources can be found on our project website. ... , Complex Counting, and Visual Grounding. ...

arXiv:2401.17221v1 fatcat:g5q3g2d52vcblohtguou4u6idq

This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances. ... The survey covers not only the typical architectures (SSD, YOLO, Faster-RCNN) but also discusses the challenges currently met by the community and goes on to show how the problem of object detection can ... The two-stage object detectors get a sparse set of proposals on which they have to perform predictions. ...

arXiv:1809.03193v2 fatcat:wj2bu3ewvbdq5fjyvrbqewpxzu

The simulation of evolution might aid imagination in normative political theory and help to rethink democracy in an increasingly horizontally modular world. ... The effects on respect for human rights and on individual autonomy are hard to predict. ... grounds. ...

doi:10.1111/1467-9760.00070 fatcat:7vxkl37rjzcvjnmobhr3a4atza

We contrast reactive and endogenously active perspectives on brain activity. ... One of the many successes of the reactive perspective was the identification, in the second half of the 20th century, of the distinctive contributions of different brain ... One example, discussed at greater length in section 2, is a pathway through the visual system that is responsible for the phenomenon of object recognition. ...

doi:10.1007/978-94-007-1951-4_16 fatcat:2m6xsw4m7bchng2suj2zlr6xym

A distinction is made among several stages in visual processing, including, in addition to the inflexible early-vision stage, a pre-perceptual attention-allocation stage and a post-perceptual evaluation ... These two stages provide the primary ways in which cognition can affect the outcome of visual perception. ... is problematic on several grounds which we explore in this commentary. ...

pmid:11301517 fatcat:opmglvkui5hyjdh2l2rdq56ela

is problematic on several grounds which we explore in this commentary. ... Unsurprisingly, maintaining consistency on this treacherous ground proves equally difficult for Pylyshyn (and this commentator). ...

doi:10.1017/s0140525x99522024 fatcat:natiboxfcbcfzebe6352b2xgye

We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. ... We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal. ... allowing for visual objects to be linked to unfolding linguistic information, places, times, and each other. ...

doi:10.1017/s0140525x12001495 pmid:23789620 fatcat:iysayp5tujgffkbwtjwxx23e3m

By contrast, research on object recognition suggests that even young infants represent some of the defining features and physical constraints that specify the identity and continuity of objects. ... One system is concerned with the perceptual control and guidance of actions, the other with the perception and recognition of objects and events. ... Most infants crawl with their abdomens on the ground before crawling on hands-and-knees. ...

doi:10.1146/annurev.psych.47.1.431 pmid:8624139 fatcat:tzdobuuwdzcrfovu4e4wesem2i

As a result, Kennedy objects to any reliance on race in the decision to stop or search suspects. ... But it all depends on how predictive it is. ... Since we know from equation (A2) that IM equals I, and from equation (A5) that the change in IM is four times the change in I, then we know that one denominator in equation (A1) is simply one fourth ...

doi:10.2139/ssrn.471901 fatcat:gytt6gi3sbcelgc22slaetuj7m

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling [article]

The E-Z Reader model of eye-movement control in reading: Comparisons to other models

MERLOT: Multimodal Neural Script Knowledge Models [article]

Towards Faithful Model Explanation in NLP: A Survey [article]

PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions [article]

GooAQ: Open Question Answering with Diverse Answer Types [article]

MouSi: Poly-Visual-Expert Vision-Language Models [article]

Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks [article]

Rationality, Democracy and Leaky Boundaries: Vertical vs. Horizontal Modularity

From Reactive to Endogenously Active Dynamical Conceptions of the Brain [chapter]

Is vision continuous with cognition? The case for cognitive impenetrability of visual perception

Color memory penetrates early vision

An integrated theory of language production and comprehension

ORIGINS AND EARLY DEVELOPMENT OF PERCEPTION, ACTION, AND REPRESENTATION

Rethinking Racial Profiling: A Critique of the Economics, Civil Liberties, and Constitutional Literature, and of Criminal Profiling More Generally
