This story was originally published on HackerNoon at:
https://hackernoon.com/mil-perspective-analyzing-q-former-as-a-multi-head-mechanism.
Proves that Q-Former is a Multi-Head MIL module, because its cross-attention is permutation-invariant over the visual instances it pools. Also notes a limitation: it assumes i.i.d. instances, overlooking crucial instance correlation.
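The permutation-invariance claim is easy to verify directly. Below is a minimal, self-contained PyTorch sketch (not the authors' code; all names and dimensions are illustrative) of a single cross-attention head pooling a bag of instance features: shuffling the instances leaves the pooled output unchanged, which is exactly the MIL pooling property the summary points to. Each head of a multi-head layer would be one such pooling branch. Note also that keys and values are computed per instance with no instance-instance interaction, which is the i.i.d.-style limitation the summary flags.

```python
# Minimal sketch (assumed, illustrative code): Q-Former-style cross-attention
# as a permutation-invariant MIL pooling operator over a bag of instances.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d = 64          # feature dimension (illustrative)
n_queries = 4   # learned queries, analogous to Q-Former's query tokens
n_inst = 16     # bag of visual instances (e.g., patch features)

# Learned queries and projection weights (randomly initialized here).
queries = torch.randn(n_queries, d)
W_q = torch.randn(d, d)
W_k = torch.randn(d, d)
W_v = torch.randn(d, d)

def cross_attention_pool(instances: torch.Tensor) -> torch.Tensor:
    """Single-head cross-attention: queries attend over a bag of instances.

    Each instance's key/value is computed independently, so there is no
    instance-instance interaction here (the i.i.d.-style assumption).
    """
    q = queries @ W_q                             # (n_queries, d)
    k = instances @ W_k                           # (n_inst, d)
    v = instances @ W_v                           # (n_inst, d)
    attn = F.softmax(q @ k.T / d**0.5, dim=-1)    # weights over instances
    return attn @ v                               # weighted sum: MIL pooling

bag = torch.randn(n_inst, d)
perm = torch.randperm(n_inst)

out = cross_attention_pool(bag)
out_perm = cross_attention_pool(bag[perm])

# Shuffling the instances leaves the pooled output unchanged:
print(torch.allclose(out, out_perm, atol=1e-5))  # True
```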
Check out more stories related to machine-learning at:
https://hackernoon.com/c/machine-learning.
You can also check out exclusive content about
#deep-learning,
#multiple-instance-learning,
#cross-attention,
#permutation-invariance,
#mllm-architecture,
#instance-correlation,
#visual-adapters,
#multi-head-mechanism, and more.
This story was written by:
@instancing. Learn more about this writer by checking
@instancing's about page,
and for more stories, please visit
hackernoon.com.