We study representations for real-world sensing data such as video, LiDAR, IMU, network logs, and IoT signals.
The goal is to capture temporal and spatial structure for action recognition, trajectory prediction, forecasting, anomaly detection, and infrastructure monitoring.
Why It Is Difficult
Real-world sensing data contains noise, missing values, occlusion, domain drift, and irregular sampling.
Spatiotemporal representations are high-dimensional, and many applications require efficient inference or online adaptation.
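To make the irregular-sampling and missing-value issue concrete, here is a minimal sketch of one common preprocessing step: interpolating irregular readings onto a regular grid while keeping a validity mask, so a downstream model can tell observed values from filled ones. The function name `resample_with_mask`, the NaN encoding of missing readings, and the `max_gap` threshold are illustrative assumptions, not part of the work described here.

```python
from bisect import bisect_left
from math import isnan


def resample_with_mask(times, values, start, step, n_steps, max_gap):
    """Linearly interpolate irregular samples onto a regular grid.

    `times` must be sorted in ascending order. Returns (grid_values, mask),
    where mask[i] is 1 if grid point i coincides with a valid sample or lies
    between two valid samples at most `max_gap` apart, and 0 otherwise.
    """
    # Assumption: missing readings are encoded as NaN and are dropped here.
    valid = [(t, v) for t, v in zip(times, values) if not isnan(v)]
    ts = [t for t, _ in valid]

    grid_values, mask = [], []
    for i in range(n_steps):
        g = start + i * step
        j = bisect_left(ts, g)  # index of the first valid sample at or after g
        if j < len(ts) and ts[j] == g:
            # Grid point coincides with an observed sample.
            grid_values.append(valid[j][1])
            mask.append(1)
        elif 0 < j < len(ts) and ts[j] - ts[j - 1] <= max_gap:
            # Interpolate between the two surrounding valid samples.
            t0, v0 = valid[j - 1]
            t1, v1 = valid[j]
            w = (g - t0) / (t1 - t0)
            grid_values.append(v0 + w * (v1 - v0))
            mask.append(1)
        else:
            # Gap too wide or outside the observed range: mark as missing.
            grid_values.append(0.0)
            mask.append(0)
    return grid_values, mask


nan = float("nan")
grid, mask = resample_with_mask(
    times=[0.0, 0.4, 1.0, 3.0],
    values=[1.0, nan, 2.0, 4.0],
    start=0.0, step=0.5, n_steps=7, max_gap=1.0,
)
print(grid)
print(mask)
```

Passing the mask alongside the values, rather than silently filling gaps, is one way to keep the model from treating imputed points as measurements.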
Approach
We use transformers, graph neural networks, self-supervised learning, multimodal fusion, and efficient inference methods.
For IoT and infrastructure problems, we also examine how physical constraints and graph sparsity can be integrated into learning.
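As a rough illustration of why graph sparsity matters, the sketch below runs one round of mean neighbor aggregation (the basic building block of many graph neural networks) over an edge list. The cost scales with the number of edges rather than the square of the number of sensors. The function `mean_aggregate` and the toy three-sensor chain are hypothetical and stand in for whatever graph a given deployment defines.

```python
def mean_aggregate(features, edges):
    """One round of mean neighbor aggregation on an undirected sparse graph.

    features: dict mapping node id -> list of floats (all the same length)
    edges:    list of (u, v) pairs; only existing links are stored
    """
    # Build adjacency lists from the edge list (sparse representation).
    neighbors = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    aggregated = {}
    for node, x in features.items():
        # Average the node's own features with those of its neighbors.
        group = [x] + [features[m] for m in neighbors[node]]
        aggregated[node] = [
            sum(f[d] for f in group) / len(group) for d in range(len(x))
        ]
    return aggregated


# Toy example: three sensors connected in a chain a - b - c.
features = {"a": [1.0], "b": [3.0], "c": [5.0]}
edges = [("a", "b"), ("b", "c")]
print(mean_aggregate(features, edges))
```

In a sensor network the edge list would typically follow physical connectivity (pipes, roads, links), which is one simple way a physical constraint can enter the model structure.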
Evaluation
We combine benchmark evaluation with real-environment validation.
Depending on the problem, we separately evaluate predictive accuracy, robustness to missing data, latency, computational complexity, and operational interpretability.
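One way to report these criteria separately is sketched below: the same predictor is evaluated at several missing-data rates, and error and latency are recorded as distinct metrics. Everything here is an assumption for illustration, including the `last_value_predictor` baseline, the use of `None` for dropped readings, and mean absolute error as the accuracy measure.

```python
import random
import time


def drop_values(series, rate, rng):
    """Replace each value with None with probability `rate`."""
    return [None if rng.random() < rate else x for x in series]


def last_value_predictor(series):
    """Hypothetical baseline: return the last observed value, or 0.0 if none."""
    for x in reversed(series):
        if x is not None:
            return x
    return 0.0


def evaluate(predict, windows, targets, rate, seed=0):
    """Return mean absolute error and mean per-call latency at one drop rate."""
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    errors, latencies = [], []
    for window, target in zip(windows, targets):
        corrupted = drop_values(window, rate, rng)
        t0 = time.perf_counter()
        prediction = predict(corrupted)
        latencies.append(time.perf_counter() - t0)
        errors.append(abs(prediction - target))
    return {
        "mae": sum(errors) / len(errors),
        "mean_latency_s": sum(latencies) / len(latencies),
    }


windows = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
targets = [4.0, 5.0]
for rate in (0.0, 0.5, 1.0):
    print(rate, evaluate(last_value_predictor, windows, targets, rate))
```

Sweeping the drop rate gives a robustness curve rather than a single accuracy number, and keeping latency as its own column avoids folding it into one aggregate score.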
Current Questions
- Action summarization and anomaly-cue extraction from long videos
- Robust multimodal sensing through sensor fusion
- Adaptive spatiotemporal recognition with online updates