[R] VLMs Behavior for Long Video Understanding
I have searched extensively through long video understanding datasets such as Video-MME, MLVU, VideoBench, and LongVideoBench. From what I have seen, these datasets focus on content categories such as dramas, films, TV shows, and documentaries, with tasks like ordering, counting, and reasoning.
I feel that multi-step reasoning is less explored, so I designed questions with no answer options, only a ground truth, and asked the VLM to answer directly. The VLM was unable to give the correct answer. But when I provide the same question with 4 options, the VLM reaches 100% accuracy.
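For reference, here is a minimal sketch of the two evaluation modes being compared (open-ended vs. 4-option multiple choice). The `query_vlm` function is a placeholder for whatever model API is actually used, and the sample field names (`question`, `answer`, `options`, `correct_letter`, `video`) are assumptions about the dataset format, not part of any benchmark's real schema.

```python
from typing import List, Dict

def query_vlm(video_path: str, prompt: str) -> str:
    """Placeholder: send the video (or sampled frames) plus prompt to your VLM and return its text reply."""
    raise NotImplementedError("plug in your model call here")

def eval_open_ended(samples: List[Dict]) -> float:
    """Ask with no options; score by exact match against the ground-truth answer."""
    correct = 0
    for s in samples:
        prompt = f"Question: {s['question']}\nAnswer concisely."
        reply = query_vlm(s["video"], prompt).strip().lower()
        correct += int(reply == s["answer"].strip().lower())
    return correct / len(samples)

def eval_multiple_choice(samples: List[Dict]) -> float:
    """Ask with 4 lettered options; score by the chosen letter."""
    correct = 0
    for s in samples:
        options = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCD", s["options"]))
        prompt = (f"Question: {s['question']}\n{options}\n"
                  "Answer with the letter of the correct option only.")
        reply = query_vlm(s["video"], prompt).strip().upper()[:1]
        correct += int(reply == s["correct_letter"])
    return correct / len(samples)
```

Note that strict exact-match scoring in the open-ended case is only one possible choice; looser matching or a judge model would change the measured gap between the two modes.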
My question is: why do VLMs behave like this?
Tagged with
#VLMs
#long video understanding
#datasets
#multi-step reasoning
#Video-MME
#MLVU
#VideoBench
#LongVideoBench
#ground truth
#dramas
#films
#TV shows
#documentaries
#accuracy
#tasks