Latency–accuracy trade-offs in event-based gesture recognition: comparing spiking vs. polynomial kernels.
TENNs–PLEIADES: A spatio-temporal neural network that models motion using compact polynomial kernels. Each temporal kernel is parameterized by the coefficients of orthogonal Jacobi polynomials, which reduces the parameter count while preserving temporal reasoning. The model alternates temporal and spatial convolutions in a (1+2)D factorization, which keeps the network causal and enables low-latency inference.
PLEIADES achieves real-time gesture recognition on event-based data, producing frame-wise predictions every 10 ms and reaching 99.6% accuracy with only 192k parameters.
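The polynomial parameterization and causal filtering described above can be illustrated with a minimal sketch. It is not the PLEIADES implementation: it uses Legendre polynomials (the special case of Jacobi polynomials with alpha = beta = 0) because NumPy provides them directly, and the names polynomial_kernel and causal_conv1d, the coefficient values, and the kernel length of 10 taps are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def polynomial_kernel(coeffs, kernel_size):
    """Sample a temporal kernel from polynomial coefficients on [-1, 1]."""
    t = np.linspace(-1.0, 1.0, kernel_size)
    # Columns are P_0(t), P_1(t), ..., one per coefficient.
    basis = legendre.legvander(t, len(coeffs) - 1)
    return basis @ coeffs


def causal_conv1d(signal, kernel):
    """Causal filtering: output[t] depends only on signal[: t + 1]."""
    k = len(kernel)
    # Zero-pad on the left so the first output is available immediately.
    padded = np.concatenate([np.zeros(k - 1), signal])
    # kernel[-1] weights the current sample, kernel[0] the oldest one.
    return np.array([padded[i:i + k] @ kernel for i in range(len(signal))])


# Hypothetical values: 4 learnable coefficients expand into a 10-tap kernel.
coeffs = np.array([0.5, -0.3, 0.1, 0.05])
kernel = polynomial_kernel(coeffs, kernel_size=10)
```

In this sketch the number of learnable values is set by the polynomial degree rather than by the kernel length, which is the sense in which the representation is compact. The left zero-padding is what lets the filter emit an output at every step without waiting for future samples.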
S-TLLR: A biologically inspired network that processes information through discrete spikes instead of continuous activations. It uses a three-factor local learning rule combining spike timing, eligibility traces, and an error signal—enabling online, real-time learning without full backpropagation through time.
This architecture adapts VGG-9 into a spiking form, significantly reducing memory cost while maintaining competitive accuracy (97.9%) and strong early performance despite its biologically constrained design.
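To make the three-factor idea concrete, here is a minimal sketch of a generic three-factor local update. It is not the exact S-TLLR rule: the function name three_factor_step, the decay and learning-rate values, the thresholding used to produce postsynaptic spikes, and the random placeholder error signal are all assumptions for illustration.

```python
import numpy as np


def three_factor_step(w, trace_pre, elig, pre_spikes, post_spikes, error,
                      decay=0.9, lr=0.01):
    """One online update of a generic three-factor rule (illustrative only)."""
    # Factor 1: low-pass filtered presynaptic spike history.
    trace_pre = decay * trace_pre + pre_spikes
    # Factor 2: postsynaptic spikes gate the trace into a per-synapse eligibility.
    elig = decay * elig + np.outer(post_spikes, trace_pre)
    # Factor 3: a per-neuron error signal modulates the eligibility.
    w = w - lr * error[:, None] * elig
    return w, trace_pre, elig


rng = np.random.default_rng(0)
n_in, n_out, n_steps = 8, 4, 20
w = rng.normal(scale=0.1, size=(n_out, n_in))
trace_pre = np.zeros(n_in)
elig = np.zeros((n_out, n_in))

for _ in range(n_steps):
    pre = (rng.random(n_in) < 0.3).astype(float)   # random input spikes
    post = (w @ pre > 0.1).astype(float)           # crude threshold neuron
    error = rng.normal(size=n_out)                 # placeholder error signal
    w, trace_pre, elig = three_factor_step(w, trace_pre, elig, pre, post, error)
```

The state carried between steps (trace_pre and elig) has a fixed size regardless of how many timesteps are processed, which is why rules of this kind avoid storing the full activation history that backpropagation through time requires.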
Training & Evaluation:
The two methods follow contrasting training philosophies:
PLEIADES computes its loss at every output step and begins producing outputs after only ~100 ms, thanks to causal kernels and zero-padding.
S-TLLR computes its loss only after the full 1.5 s gesture sequence, emphasizing sequence-level understanding.
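The difference between the two objectives can be sketched as follows. This is one reading of the description above, not code from either method: per_step_loss averages cross-entropy over every timestep, while final_step_loss scores only the last one. The function names and the toy logits are assumptions.

```python
import numpy as np


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one prediction."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))


def per_step_loss(logits_seq, label):
    """Average the loss over every timestep (continuous supervision)."""
    return float(np.mean([cross_entropy(step, label) for step in logits_seq]))


def final_step_loss(logits_seq, label):
    """Score only the prediction made after the full sequence."""
    return float(cross_entropy(logits_seq[-1], label))


# Toy sequence: wrong at the first step, correct at the last (true class is 1).
logits_seq = np.array([[2.0, 0.0],
                       [0.0, 2.0]])
early_penalized = per_step_loss(logits_seq, label=1)
end_only = final_step_loss(logits_seq, label=1)
```

On the toy sequence the per-step objective is larger because it penalizes the early mistake, whereas the final-step objective ignores it. That is the mechanism by which per-step supervision pushes a model toward correct early predictions.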
To enable latency analysis, S-TLLR was retrained from scratch, saving checkpoints and recording time-resolved predictions at 20 timesteps spaced 75 ms apart (75 ms to 1.5 s).
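The latency grid follows from dividing the 1.5 s sequence into 20 equal steps of 75 ms. A minimal sketch of how time-resolved predictions could be turned into an accuracy-versus-latency curve is given below. The array layout (samples, timesteps, classes), the function name accuracy_per_timestep, and the synthetic logits are assumptions, not the evaluation code used for the comparison.

```python
import numpy as np

SEQ_MS = 1500    # full gesture sequence length
N_STEPS = 20     # number of evaluation timesteps
STEP_MS = SEQ_MS // N_STEPS  # 75 ms per step

# Latency at which each prediction becomes available: 75, 150, ..., 1500 ms.
latencies_ms = np.arange(1, N_STEPS + 1) * STEP_MS


def accuracy_per_timestep(logits, labels):
    """logits: (n_samples, n_steps, n_classes); labels: (n_samples,)."""
    preds = logits.argmax(axis=-1)                 # (n_samples, n_steps)
    return (preds == labels[:, None]).mean(axis=0)  # (n_steps,)


# Synthetic example: two samples, three classes, class 0 always predicted.
logits = np.zeros((2, N_STEPS, 3))
logits[:, :, 0] = 1.0
labels = np.array([0, 1])
curve = accuracy_per_timestep(logits, labels)
```

Plotting curve against latencies_ms gives accuracy as a function of how much of the gesture has been observed, which is the quantity needed to compare an end-of-sequence model with one that predicts continuously.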
Conclusion:
The comparison reveals a clear trade-off between speed and biological realism. PLEIADES excels in low-latency, high-accuracy applications, while S-TLLR offers energy-efficient, online learning aligned with neuromorphic computing. Together, they illustrate two complementary paths toward real-time gesture AI.