
UniSDNet - A novel unified architecture of static & dynamic networks for moment retrieval from videos

History of research (relevant to this paper) in Video Moment Retrieval so far -

Problem with approaches so far - they focus on static feature correlations but ignore the higher-level time-series nature of video

What if we mimic human biology and do both static analysis (as in the global neuronal workspace) and then dynamic analysis to contextualize it?

New UniSDNet architecture for mimicking human brain in video analysis -

How UniSDNet works -

UniSDNet achieves SOTA performance on 3 widely used datasets for natural language video grounding (NLVG), as well as 3 datasets for spoken language video grounding (SLVG), reporting new records of 38.88% R@1, IoU@0.7 on ActivityNet Captions & 40.26% R@1, IoU@0.5 on TACoS. Its inference speed is also 1.56× faster than the strong multi-query benchmark.
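For readers unfamiliar with the metric, "R@1, IoU@0.7" is the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.7. The sketch below illustrates that standard definition only; it is not taken from the UniSDNet code repo, and the names `temporal_iou` and `recall_at_1` are hypothetical helpers chosen for this example. Segments are assumed to be (start, end) pairs in seconds.

```python
from typing import Sequence, Tuple

# A moment is assumed to be a (start_seconds, end_seconds) pair.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: Sequence[Segment],
    gts: Sequence[Segment],
    iou_threshold: float,
) -> float:
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    if len(top1_preds) != len(gts):
        raise ValueError("need exactly one top-1 prediction per ground truth")
    if not gts:
        return 0.0
    hits = sum(
        temporal_iou(pred, gt) >= iou_threshold
        for pred, gt in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Two toy queries: the first prediction is exact, the second barely overlaps.
    preds = [(0.0, 10.0), (2.0, 8.0)]
    truths = [(0.0, 10.0), (5.0, 15.0)]
    print(recall_at_1(preds, truths, iou_threshold=0.7))
```

In this toy case only the first prediction clears the 0.7 threshold, so one of the two queries counts as a hit. A higher IoU threshold is a stricter test, which is why IoU@0.7 figures are typically lower than IoU@0.5 figures on the same dataset.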

Please find the original paper here & code repo here.

