Evaluating OpenCV new RANSACs
They have become much better now
OpenCV RANSAC is dead. Long live the OpenCV USAC!
A year ago we published the paper "Image Matching across Wide Baselines: From Paper to Practice", which, among other findings, showed that OpenCV RANSAC for fundamental matrix estimation was terrible: super inaccurate and slow. Since then my colleague Maksym Ivashechkin has spent the summer of 2020 improving OpenCV RANSACs. His work was released as part of the OpenCV 4.5.0 release.
Now it is time to benchmark them. Let's go!
Evaluation methodology
The benchmark is done on the validation subset of the Image Matching Challenge 2021 datasets. We detected RootSIFT features, matched them with the optimal mutual SNN ratio test, and fed the matches into the tested RANSACs. The resulting fundamental matrices were transformed into relative poses and compared to the ground-truth poses. You can check the details in the paper "Image Matching across Wide Baselines: From Paper to Practice".
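For concreteness, here is a minimal sketch of how a fundamental matrix can be converted into a relative pose for such an evaluation, assuming known camera intrinsics. The function names, K1, K2, pts1 and pts2 are illustrative, not the actual benchmark code:

```python
import cv2
import numpy as np

def relative_pose_from_F(F, K1, K2, pts1, pts2):
    # Fundamental -> essential matrix, using the known 3x3 intrinsics
    E = K2.T @ F @ K1
    # Normalize coordinates so recoverPose can assume an identity camera
    p1 = cv2.undistortPoints(pts1.reshape(-1, 1, 2), K1, None)
    p2 = cv2.undistortPoints(pts2.reshape(-1, 1, 2), K2, None)
    # The cheirality check picks among the four possible (R, t) decompositions
    _, R, t, _ = cv2.recoverPose(E, p1, p2)
    return R, t

def rotation_error_deg(R_est, R_gt):
    # Angular distance between the estimated and ground-truth rotations
    cos_angle = np.clip((np.trace(R_gt.T @ R_est) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))
```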
For all RANSACs we first determine the optimal inlier threshold by grid search, while the number of iterations (max_iter) is set to a reasonable 100k. Then, with this optimal threshold fixed, we vary the number of iterations from 10 to 10M. This gives us an accuracy-time curve.
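In Python-ish pseudocode, the protocol looks roughly like this; evaluate is a placeholder for the actual benchmark harness (run the given RANSAC on all validation pairs and return the mean pose accuracy), and the threshold grid is illustrative:

```python
import time

def tune_and_sweep(evaluate):
    # Step 1: grid-search the inlier threshold at a fixed 100k iterations
    thresholds = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0]  # illustrative grid
    best_th = max(thresholds, key=lambda th: evaluate(th, max_iter=100_000))

    # Step 2: fix the threshold and sweep the iteration budget
    curve = []
    for max_iter in (10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
        start = time.perf_counter()
        accuracy = evaluate(best_th, max_iter=max_iter)
        curve.append((time.perf_counter() - start, accuracy))
    return best_th, curve
```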
Methods evaluated
Non-OpenCV methods:
- DEGENSAC – from the pydegensac package, based on the original implementation of the method proposed in the CVPR 2005 paper "Two-View Geometry Estimation Unaffected by a Dominant Plane". It is the default choice for the Image Matching Challenge 2020 and 2021.
- PyRANSAC – also from the pydegensac package, with the flag enable_degeneracy_check=False, which is equivalent to a vanilla LO-RANSAC implementation.
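For reference, here is a minimal usage sketch for both variants, based on the pydegensac README; the random points only stand in for real matched keypoint coordinates, and the 0.5 px threshold is just an example:

```python
import numpy as np
import pydegensac

# Stand-ins for real matched keypoint coordinates, shape (N, 2)
pts1 = np.random.rand(100, 2) * 1000
pts2 = np.random.rand(100, 2) * 1000

# DEGENSAC: dominant-plane degeneracy check enabled (the default)
F, inlier_mask = pydegensac.findFundamentalMatrix(pts1, pts2, 0.5)

# PyRANSAC: the same estimator with the check switched off,
# i.e. a vanilla LO-RANSAC
F, inlier_mask = pydegensac.findFundamentalMatrix(
    pts1, pts2, 0.5, enable_degeneracy_check=False)
```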
OpenCV methods, named after the flag one needs to pass into the cv2.findFundamentalMat function:
- USAC_DEFAULT – LO-RANSAC + degeneracy tests.
- USAC_FAST – LO-RANSAC + degeneracy tests, with fewer iterations in the local optimization step than USAC_DEFAULT. Uses the RANSAC score to maximize the number of inliers and terminate earlier.
- USAC_ACCURATE – Graph-Cut RANSAC + degeneracy tests.
- USAC_MAGSAC – MAGSAC++ implementation + degeneracy tests.
- RANSAC – the OpenCV RANSAC implementation from previous versions of the library, no degeneracy tests.
All OpenCV USAC methods also use the SPRT test to speed up model evaluation.
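Here is a minimal calling sketch (requires OpenCV >= 4.5.0); again, the random points only stand in for real correspondences, and the threshold value is illustrative and should be tuned:

```python
import cv2
import numpy as np

# Stand-ins for real matched keypoint coordinates, shape (N, 2), float32
pts1 = (np.random.rand(100, 2) * 1000).astype(np.float32)
pts2 = (np.random.rand(100, 2) * 1000).astype(np.float32)

# Any of cv2.USAC_DEFAULT, cv2.USAC_FAST, cv2.USAC_ACCURATE or
# cv2.USAC_MAGSAC can be passed as the method flag;
# cv2.RANSAC selects the old implementation.
F, inlier_mask = cv2.findFundamentalMat(
    pts1, pts2, cv2.USAC_MAGSAC,
    ransacReprojThreshold=1.0,
    confidence=0.999,
    maxIters=10_000)
```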
Results
Here are the results for all 3 datasets. The closer a curve is to the top-left corner, the better. Dashed vertical lines mark the 1/25 sec ("real time") and 0.5 sec (challenge limit) time budgets. The legend shows the method name and the optimal inlier thresholds for the Phototourism, GoogleUrban and PragueParks datasets, respectively.
The first and main message -- all the new flags are much better than the old OpenCV implementation (green curve, worst results), which is still the default option.
With 10k iterations, USAC_ACCURATE (red curve) gives you great results within 0.01 sec.
All the advanced OpenCV USACs are better than pydegensac (blue curve) for small and medium time budgets (< 0.1 sec per image).
The best methods for higher time budgets are OpenCV USAC_MAGSAC and DEGENSAC from the pydegensac package.
There is no point in using the USAC_FAST flag: it is always better to use USAC_DEFAULT, USAC_ACCURATE or USAC_MAGSAC.
USAC_MAGSAC is the only method whose optimal threshold is the same across all datasets. This is a valuable property for practice, as it requires the least tuning.
If you are interested in the results for individual datasets, here they are.
Phototourism
GoogleUrban
PragueParks
Why do I tune and evaluate on the same set?
It is true that tuning and evaluating a method on the same dataset is not sound methodology. However, let me defend my choice. Here are the arguments:
I do not want to compromise the integrity of the test set, which is the basis of the ongoing Image Matching Challenge 2021 competition with prize money. That is why I do not want to leak any information from the test set; this is my primary optimization objective. I also cannot tune the threshold on a "training subset", as neither GoogleUrban nor PragueParks has one.
I am more interested in the rough speed-accuracy trade-off than in the precise ranking of the methods. It is quite likely that methods with a small accuracy gap on the validation set would swap places on the test set -- as happened with DEGENSAC and MAGSAC in our original paper. However, it is very unlikely that a method which performs poorly on the validation set would magically outperform everyone on the test set. Again, see PyRANSAC vs DEGENSAC in the original paper.
I clearly state this fact as a limitation, and I am not publishing this as a paper ;)
Conclusion
The new OpenCV RANSACs are fast and their accuracy is comparable, so you can safely pick any of them. However, if you are already using pydegensac and have a time budget > 0.1 sec, there is no need to switch.
Use proper RANSACs and be happy :)