Hard object detection using temporal features
Using data over time to improve model performance
Computer vision places a big focus on single-shot performance, which can carry a large cost in
model complexity, dataset collection and annotation time, and training compute.
It can even become a case of diminishing returns, where significant effort buys only a small boost
in performance.
To get around this, I'll share some of my experience combining traditional CV techniques with deep
learning to get the results needed.
Where single shot falls apart
For the purposes of this article, I'll focus on detecting objects in complex environments:
small objects, occlusions, and blurry features.
A complicated environment can be a good use case for a larger detection model, which can learn to
extract the information needed from the scene for acceptable detection. But, as
mentioned, this comes with a few issues:
- A larger model means we need more data to train it
- More data collection and labeling time
- Longer training times
- Longer inference times (a big problem if you're running a realtime system or object tracking)
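To make the temporal idea concrete, here is a minimal sketch of one simple technique in this family: smoothing a detector's per-frame confidence for a tracked object with an exponential moving average, so a brief dip from blur or occlusion doesn't drop the detection. The class name, the `alpha` value, and the score sequence are all illustrative assumptions, not from any specific library.

```python
class TemporalScore:
    """Exponential moving average over per-frame detection confidence."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest frame
        self.value = None   # smoothed confidence, seeded by the first frame

    def update(self, raw: float) -> float:
        if self.value is None:
            self.value = raw
        else:
            self.value = self.alpha * raw + (1 - self.alpha) * self.value
        return self.value


threshold = 0.5
# Hypothetical per-frame confidences: the object blurs out in frames 3-4.
raw_scores = [0.9, 0.85, 0.2, 0.15, 0.8]

tracker = TemporalScore()
smoothed = [tracker.update(s) for s in raw_scores]

raw_detected = [s >= threshold for s in raw_scores]       # loses frames 3-4
smoothed_detected = [s >= threshold for s in smoothed]    # survives the dip
```

The point isn't that an EMA is the right tool for every job, but that a few lines of temporal state can recover detections a per-frame threshold would drop, without any change to the model itself.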