
UniSDNet - a novel unified static & dynamic network architecture for moment retrieval from videos





A brief history of research in Video Moment Retrieval relevant to this paper -


Problem with approaches so far - they focus on static feature correlations but ignore the higher-level, time-series nature of video


What if we mimic human biology and perform both static analysis (as in the global neuronal workspace) and then dynamic analysis to contextualize it?


New UniSDNet architecture for mimicking the human brain in video analysis -


How UniSDNet works -
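A rough intuition, in code: the sketch below is only an illustrative PyTorch mock-up of the two-phase idea (a static phase that builds global context over the query and all video clips at once, followed by a dynamic phase that re-weights clips by temporal distance). The module names, dimensions, and the Gaussian distance kernel are assumptions made for illustration, not the authors' implementation; see the linked code repo for the real architecture.

import torch
import torch.nn as nn


class StaticGlobalContext(nn.Module):
    """Static phase: every clip attends to the query and every other clip in one global workspace."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, clips, query):
        # clips: (B, T, D) clip features, query: (B, L, D) query token features
        ctx = torch.cat([clips, query], dim=1)      # one shared "workspace" of clips + query
        out, _ = self.attn(ctx, ctx, ctx)           # full self-attention over the workspace
        out = out + self.ff(out)
        return out[:, : clips.size(1)]              # keep only the enriched clip features


class DynamicTemporalFilter(nn.Module):
    """Dynamic phase: message passing between clips, weighted by a temporal-distance kernel (assumed Gaussian)."""

    def __init__(self, dim: int, sigma: float = 2.0):
        super().__init__()
        self.sigma = sigma
        self.proj = nn.Linear(dim, dim)

    def forward(self, clips):
        # clips: (B, T, D)
        T = clips.size(1)
        idx = torch.arange(T, dtype=clips.dtype, device=clips.device)
        dist = (idx[None, :] - idx[:, None]).abs()                      # (T, T) temporal distances
        weights = torch.softmax(-dist**2 / (2 * self.sigma**2), dim=-1) # nearer clips weigh more
        return clips + weights @ self.proj(clips)                       # distance-aware aggregation


class UniSDNetSketch(nn.Module):
    """Static context first, dynamic temporal filtering second, then a per-clip relevance score."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.static = StaticGlobalContext(dim)
        self.dynamic = DynamicTemporalFilter(dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, clips, query):
        x = self.static(clips, query)
        x = self.dynamic(x)
        return self.score(x).squeeze(-1)   # (B, T): relevance of each clip to the query


if __name__ == "__main__":
    model = UniSDNetSketch(dim=256)
    clips = torch.randn(2, 64, 256)   # 64 video clips per sample
    query = torch.randn(2, 12, 256)   # 12 query tokens per sample
    print(model(clips, query).shape)  # torch.Size([2, 64])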


UniSDNet achieves SOTA performance on 3 widely used datasets for Natural Language Video Grounding (NLVG) as well as 3 datasets for Spoken Language Video Grounding (SLVG), setting new records of 38.88% R@1, IoU@0.7 on ActivityNet Captions and 40.26% R@1, IoU@0.5 on TACoS. Its inference speed is also 1.56x faster than a strong multi-query benchmark.
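For reference, "R@1, IoU@m" is the fraction of test queries whose top-1 predicted moment overlaps the ground-truth moment with a temporal IoU of at least m. A minimal sketch of that computation follows; the moments below are made-up toy values, not results from the paper.

def temporal_iou(pred, gt):
    """IoU between two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, threshold):
    """Fraction of queries whose best-ranked moment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return hits / len(gts)


preds = [(5.0, 20.0), (12.0, 30.0), (0.0, 8.0)]    # top-1 predicted moments (toy values)
gts   = [(6.0, 21.0), (10.0, 18.0), (40.0, 55.0)]  # ground-truth moments (toy values)
print(recall_at_1(preds, gts, threshold=0.7))      # 0.333... (only the first prediction hits)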


Please find the original paper here & code repo here.


