UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling
[article]
2022
arXiv
pre-print
Grounded VL tasks such as grounded captioning require the model to generate a text description and align predicted words with object regions. ...
Experiments cover 7 VL benchmarks, including grounded captioning, visual grounding, image captioning, and visual question answering. ...
.: A fast and accurate one-stage approach to visual grounding. In: ICCV (2019) 10, 27 78. ...
arXiv:2111.12085v2
fatcat:o2kevp3lo5dlvknxtomttua77m
The E-Z Reader model of eye-movement control in reading: Comparisons to other models
2003
Behavioral and Brain Sciences
The E-Z Reader model (Reichle et al. 1998) provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where ...
On the basis of this discussion, we conclude that E-Z Reader provides the most comprehensive account of eye movement control during reading. ...
ACKNOWLEDGMENT Denis Drieghe is a research assistant of the Fund for Scientific Research (Flanders, Belgium). ...
doi:10.1017/s0140525x03000104
fatcat:663lhrgspjdflcsnlxn2zoneqa
MERLOT: Multimodal Neural Script Knowledge Models
[article]
2021
arXiv
pre-print
data (like object bounding boxes). ...
On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised ...
Beyond instructional videos: Probing for more diverse visual-textual grounding on youtube. In EMNLP, 2020.
[40] Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. ...
arXiv:2106.02636v3
fatcat:mrj2t3yuanbdzhsujshtky4enq
Towards Faithful Model Explanation in NLP: A Survey
[article]
2024
arXiv
pre-print
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction. ...
For each category, we synthesize its representative studies, strengths, and weaknesses. ...
Thus, it is suspected that the visualization has little to do with the model's reasoning process. ...
arXiv:2209.11326v4
fatcat:rat7nu5fpjdjznqvqcjeknjd3q
PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions
[article]
2022
arXiv
pre-print
Simply by introducing one extra hyperparameter and adding one line of code, our Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D image classification, instance segmentation, object ...
Generally speaking, however, a good loss function can take on much more flexible forms, and should be tailored for different tasks and datasets. ...
ACKNOWLEDGEMENTS We thank James Philbin, Doug Eck, Tsung-Yi Lin and the rest of Waymo Research and Google Brain teams for valuable feedback. ...
arXiv:2204.12511v2
fatcat:zvou7glq4fb2rdyngel3myzrqq
GooAQ: Open Question Answering with Diverse Answer Types
[article]
2021
arXiv
pre-print
in generating coherent and accurate responses for questions requiring long responses (such as 'how' and 'why' questions) is less reliant on observing annotated data and mainly supported by their pre-training ...
We benchmark T5 models on GooAQ and observe that: (a) in line with recent work, LM's strong performance on GooAQ's short-answer questions heavily benefits from annotated data; however, (b) their quality ...
Approved for Public Release, Distribution Unlimited. ...
arXiv:2104.08727v2
fatcat:zloaxrwk2re47afc7luqacf5my
MouSi: Poly-Visual-Expert Vision-Language Models
[article]
2024
arXiv
pre-print
These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. ...
All of these resources can be found on our project website. ...
, Complex Counting, and Visual Grounding. ...
arXiv:2401.17221v1
fatcat:g5q3g2d52vcblohtguou4u6idq
Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks
[article]
2019
arXiv
pre-print
This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances. ...
The survey covers not only the typical architectures (SSD, YOLO, Faster-RCNN) but also discusses the challenges currently met by the community and goes on to show how the problem of object detection can ...
The two-stage object detectors get a sparse set of proposals on which they have to perform predictions. ...
arXiv:1809.03193v2
fatcat:wj2bu3ewvbdq5fjyvrbqewpxzu
Rationality, Democracy and Leaky Boundaries: Vertical vs. Horizontal Modularity
1999
The Journal of Political Philosophy
The simulation of evolution might aid imagination in normative political theory and help to rethink democracy in an increasingly horizontally modular world. ...
The effects on respect for human rights and on individual autonomy are hard to predict. ...
grounds. ...
doi:10.1111/1467-9760.00070
fatcat:7vxkl37rjzcvjnmobhr3a4atza
From Reactive to Endogenously Active Dynamical Conceptions of the Brain
[chapter]
2011
Boston Studies in the Philosophy of Science
We contrast reactive and endogenously active perspectives on brain activity. ...
One of the many successes of the reactive perspective was the identification, in the second half of the 20th century, of the distinctive contributions of different brain ...
One example, discussed at greater length in section 2, is a pathway through the visual system that is responsible for the phenomenon of object recognition. ...
doi:10.1007/978-94-007-1951-4_16
fatcat:2m6xsw4m7bchng2suj2zlr6xym
Is vision continuous with cognition? The case for cognitive impenetrability of visual perception
1999
Behavioral and Brain Sciences
A distinction is made among several stages in visual processing, including, in addition to the inflexible early-vision stage, a pre-perceptual attention-allocation stage and a post-perceptual evaluation ...
These two stages provide the primary ways in which cognition can affect the outcome of visual perception. ...
is problematic on several grounds which we explore in this commentary. ...
pmid:11301517
fatcat:opmglvkui5hyjdh2l2rdq56ela
Color memory penetrates early vision
1999
Behavioral and Brain Sciences
is problematic on several grounds which we explore in this commentary. ...
Unsurprisingly, maintaining consistency on this treacherous ground proves equally difficult for Pylyshyn (and this commentator). ...
doi:10.1017/s0140525x99522024
fatcat:natiboxfcbcfzebe6352b2xgye
An integrated theory of language production and comprehension
2013
Behavioral and Brain Sciences
We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction. ...
We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal. ...
allowing for visual objects to be linked to unfolding linguistic information, places, times, and each other. ...
doi:10.1017/s0140525x12001495
pmid:23789620
fatcat:iysayp5tujgffkbwtjwxx23e3m
ORIGINS AND EARLY DEVELOPMENT OF PERCEPTION, ACTION, AND REPRESENTATION
1996
Annual Review of Psychology
By contrast, research on object recognition suggests that even young infants represent some of the defining features and physical constraints that specify the identity and continuity of objects. ...
One system is concerned with the perceptual control and guidance of actions, the other with the perception and recognition of objects and events. ...
Most infants crawl with their abdomens on the ground before crawling on hands-and-knees. ...
doi:10.1146/annurev.psych.47.1.431
pmid:8624139
fatcat:tzdobuuwdzcrfovu4e4wesem2i
Rethinking Racial Profiling: A Critique of the Economics, Civil Liberties, and Constitutional Literature, and of Criminal Profiling More Generally
2003
Social Science Research Network
As a result, Kennedy objects to any reliance on race in the decision to stop or search suspects. ...
But it all depends on how predictive it is. ...
Since we know from equation (A2) that IM equals I, and from equation (A5) that the change in IM is four times the change in I, then we know that one denominator in equation (A1) is simply one fourth ...
doi:10.2139/ssrn.471901
fatcat:gytt6gi3sbcelgc22slaetuj7m