Authors:
Rösch, Philipp J.; Libovický, Jindřich 
Document type:
Conference Paper 
Title:
Probing the Role of Positional Information in Vision-Language Models 
Title of conference publication:
Findings of the Association for Computational Linguistics: NAACL 2022 
Conference title:
Conference of the North American Chapter of the Association for Computational Linguistics (2022, Seattle, WA) 
Venue:
Seattle, WA, United States 
Year of conference:
2022 
Date of conference beginning:
10 July 2022 
Date of conference ending:
15 July 2022 
Publisher:
Association for Computational Linguistics (ACL) 
Year:
2022 
Pages from - to:
1031-1041 
Language:
English 
Abstract:
In most Vision-Language models (VL), the understanding of the image structure is enabled by injecting the position information (PI) about objects in the image. In our case study of LXMERT, a state-of-the-art VL model, we probe the use of the PI in the representation and study its effect on Visual Question Answering. We show that the model is not capable of leveraging the PI for the image-text matching task on a challenge set where only position differs. Yet, our experiments with probing confirm...
 
Department:
Fakultät für Elektrotechnik und Technische Informatik 
Institute:
ETTI 2 - Institut für Verteilte Intelligente Systeme 
Chair:
Oswald, Norbert 
Open Access:
Yes 
Type of OA license:
CC BY 4.0 
Miscellaneous:
https://www.unibw.de/vis-en/naacl2022