565 Hits in 5.0 sec

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling [article]

Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang
2022 arXiv   pre-print
Grounded VL tasks such as grounded captioning require the model to generate a text description and align predicted words with object regions.  ...  Experiments cover 7 VL benchmarks, including grounded captioning, visual grounding, image captioning, and visual question answering.  ...  .: A fast and accurate one-stage approach to visual grounding. In: ICCV (2019) 10, 27 78.  ... 
arXiv:2111.12085v2 fatcat:o2kevp3lo5dlvknxtomttua77m

The E-Z Reader model of eye-movement control in reading: Comparisons to other models

Erik D. Reichle, Keith Rayner, Alexander Pollatsek
2003 Behavioral and Brain Sciences  
The E-Z Reader model (Reichle et al. 1998) provides a theoretical framework for understanding how word identification, visual processing, attention, and oculomotor control jointly determine when and where  ...  On the basis of this discussion, we conclude that E-Z Reader provides the most comprehensive account of eye movement control during reading.  ...  ACKNOWLEDGMENT Denis Drieghe is a research assistant of the Fund for Scientific Research (Flanders, Belgium).  ... 
doi:10.1017/s0140525x03000104 fatcat:663lhrgspjdflcsnlxn2zoneqa

MERLOT: Multimodal Neural Script Knowledge Models [article]

Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
2021 arXiv   pre-print
data (like object bounding boxes).  ...  On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised  ...  Beyond instructional videos: Probing for more diverse visual-textual grounding on youtube. In EMNLP, 2020. [40] Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi.  ... 
arXiv:2106.02636v3 fatcat:mrj2t3yuanbdzhsujshtky4enq

Towards Faithful Model Explanation in NLP: A Survey [article]

Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
2024 arXiv   pre-print
One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction.  ...  For each category, we synthesize its representative studies, strengths, and weaknesses.  ...  Thus, it is suspected that the visualization has little to do with the model's reasoning process.  ... 
arXiv:2209.11326v4 fatcat:rat7nu5fpjdjznqvqcjeknjd3q

PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions [article]

Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Xiaojie Shi, Shuyang Cheng, Dragomir Anguelov
2022 arXiv   pre-print
Simply by introducing one extra hyperparameter and adding one line of code, our Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D image classification, instance segmentation, object  ...  Generally speaking, however, a good loss function can take on much more flexible forms, and should be tailored for different tasks and datasets.  ...  ACKNOWLEDGEMENTS We thank James Philbin, Doug Eck, Tsung-Yi Lin and the rest of Waymo Research and Google Brain teams for valuable feedback.  ... 
arXiv:2204.12511v2 fatcat:zvou7glq4fb2rdyngel3myzrqq

GooAQ: Open Question Answering with Diverse Answer Types [article]

Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hannaneh Hajishirzi, Chris Callison-Burch
2021 arXiv   pre-print
in generating coherent and accurate responses for questions requiring long responses (such as 'how' and 'why' questions) is less reliant on observing annotated data and mainly supported by their pre-training  ...  We benchmark T5 models on GooAQ and observe that: (a) in line with recent work, LM's strong performance on GooAQ's short-answer questions heavily benefit from annotated data; however, (b) their quality  ...  Approved for Public Release, Distribution Unlimited.  ... 
arXiv:2104.08727v2 fatcat:zloaxrwk2re47afc7luqacf5my

MouSi: Poly-Visual-Expert Vision-Language Models [article]

Xiaoran Fan, Tao Ji, Changhao Jiang, Shuo Li, Senjie Jin, Sirui Song, Junke Wang, Boyang Hong, Lu Chen, Guodong Zheng, Ming Zhang, Caishuang Huang (+12 others)
2024 arXiv   pre-print
These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information.  ...  All of these resources can be found on our project website.  ...  , Complex Counting, and Visual Grounding.  ... 
arXiv:2401.17221v1 fatcat:g5q3g2d52vcblohtguou4u6idq

Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks [article]

Shivang Agarwal, Jean Ogier Du Terrail, Frédéric Jurie
2019 arXiv   pre-print
This article reviews the recent literature on object detection with deep CNN, in a comprehensive way, and provides an in-depth view of these recent advances.  ...  The survey covers not only the typical architectures (SSD, YOLO, Faster-RCNN) but also discusses the challenges currently met by the community and goes on to show how the problem of object detection can  ...  The two-stage object detectors get a sparse set of proposals on which they have to perform predictions.  ... 
arXiv:1809.03193v2 fatcat:wj2bu3ewvbdq5fjyvrbqewpxzu

Rationality, Democracy and Leaky Boundaries: Vertical vs. Horizontal Modularity

S. L. Hurley
1999 The Journal of Political Philosophy  
The simulation of evolution might aid imagination in normative political theory and help to rethink democracy in an increasingly horizontally modular world.  ...  The effects on respect for human rights and on individual autonomy are hard to predict.  ...  grounds.  ... 
doi:10.1111/1467-9760.00070 fatcat:7vxkl37rjzcvjnmobhr3a4atza

From Reactive to Endogenously Active Dynamical Conceptions of the Brain [chapter]

Adele Abrahamsen, William Bechtel
2011 Boston Studies in the Philosophy of Science  
We contrast reactive and endogenously active perspectives on brain activity.  ...  One of the many successes of the reactive perspective was the identification, in the second half of the 20th century, of the distinctive contributions of different brain  ...  One example, discussed at greater length in section 2, is a pathway through the visual system that is responsible for the phenomenon of object recognition.  ... 
doi:10.1007/978-94-007-1951-4_16 fatcat:2m6xsw4m7bchng2suj2zlr6xym

Is vision continuous with cognition? The case for cognitive impenetrability of visual perception

Z Pylyshyn
1999 Behavioral and Brain Sciences  
A distinction is made among several stages in visual processing, including, in addition to the inflexible early-vision stage, a pre-perceptual attention-allocation stage and a post-perceptual evaluation  ...  These two stages provide the primary ways in which cognition can affect the outcome of visual perception.  ...  is problematic on several grounds which we explore in this commentary.  ... 
pmid:11301517 fatcat:opmglvkui5hyjdh2l2rdq56ela

Color memory penetrates early vision

James A. Schirillo
1999 Behavioral and Brain Sciences  
is problematic on several grounds which we explore in this commentary.  ...  Unsurprisingly, maintaining consistency on this treacherous ground proves equally difficult for Pylyshyn (and this commentator).  ... 
doi:10.1017/s0140525x99522024 fatcat:natiboxfcbcfzebe6352b2xgye

An integrated theory of language production and comprehension

Martin J. Pickering, Simon Garrod
2013 Behavioral and Brain Sciences  
We then consider the evidence for interweaving in action, action perception, and joint action, and explain such evidence in terms of prediction.  ...  We show how these accounts explain a range of behavioral and neuroscientific data on language processing and discuss some of the implications of our proposal.  ...  allowing for visual objects to be linked to unfolding linguistic information, places, times, and each other.  ... 
doi:10.1017/s0140525x12001495 pmid:23789620 fatcat:iysayp5tujgffkbwtjwxx23e3m


Bennett I. Bertenthal
1996 Annual Review of Psychology  
By contrast, research on object recognition suggests that even young infants represent some of the defining features and physical constraints that specify the identity and continuity of objects.  ...  One system is concerned with the perceptual control and guidance of actions, the other with the perception and recognition of objects and events.  ...  Most infants crawl with their abdomens on the ground before crawling on hands-and-knees.  ... 
doi:10.1146/annurev.psych.47.1.431 pmid:8624139 fatcat:tzdobuuwdzcrfovu4e4wesem2i

Rethinking Racial Profiling: A Critique of the Economics, Civil Liberties, and Constitutional Literature, and of Criminal Profiling More Generally

Bernard E. Harcourt
2003 Social Science Research Network  
As a result, Kennedy objects to any reliance on race in the decision to stop or search suspects.  ...  But it all depends on how predictive it is.  ...  Since we know from equation (A2) that IM equals I, and from equation (A5) that the change in IM is four times the change in I, then we know that one denominator in equation (A1) is simply one fourth  ... 
doi:10.2139/ssrn.471901 fatcat:gytt6gi3sbcelgc22slaetuj7m