StairNet: visual recognition of stairs for human–robot locomotion

Table 3 Summary of our StairNet stair recognition systems

Type	Data set size	Training approach	Architecture	Change in accuracy compared to baseline	NetScore	Model Parameters (millions)
Baseline Neural Network	515,452 labeled	SL—Single frame	MobileNetV2	0%	186.8	2.3
Temporal Neural Networks*	515,452 labeled	SL—M1	MoViNet	+ 1.1%	167.4	4.0
		SL—M1	MobileNetV2 + LSTM	+ 0.1%	132.1	6.1
		SL—M1	MobileViT-XXS + LSTM	− 0.2%	155.0	3.4
		SL—MM	MobileNetV2 + LSTM	− 26.5%	120.1	6.0
Semi-Supervised Neural Network	300,000 labeled, 1.8 million unlabeled	SSL—FixMatch	MobileViT-XS	+ 0.4%	202.4	1.9
		SSL—FixMatch	MobileViT-XXS	− 0.7%	186.5	0.9
		SSL—FixMatch	MobileViT-S	− 1.2%	169.7	4.9

The models were evaluated on image classification accuracy and efficiency (NetScore, where higher is better). The systems are organized by model type. We tested supervised learning (SL) and semi-supervised learning (SSL) methods, and many-to-one (M1) and many-to-many (MM) temporal neural networks. The baseline and temporal neural networks used 515,452 labeled images; the semi-supervised learning networks used 300,000 labeled images and 1.8 million unlabeled images.
*Evaluated using the video-based train/validation/test split described in the “Temporal Neural Networks” section.
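
The table reports NetScore but does not state how it is computed. As a minimal sketch, assuming the commonly cited definition Ω = 20 log10(a^α / (p^β m^γ)) with default exponents α = 2, β = 0.5, γ = 0.5, where a is accuracy, p is the parameter count, and m is the number of multiply-accumulate operations, the metric can be written as below. The function name, the argument units, and the example inputs are illustrative assumptions and are not taken from the article, so the outputs are not expected to match the values in Table 3.

```python
import math


def netscore(accuracy_pct, params, macs, alpha=2.0, beta=0.5, gamma=0.5):
    """Return 20 * log10(a^alpha / (p^beta * m^gamma)).

    accuracy_pct: classification accuracy in percent (assumed unit).
    params:       parameter count, e.g. in millions (assumed unit).
    macs:         multiply-accumulate operations per inference (assumed unit).
    """
    if accuracy_pct <= 0 or params <= 0 or macs <= 0:
        raise ValueError("accuracy_pct, params, and macs must all be positive")
    ratio = accuracy_pct ** alpha / (params ** beta * macs ** gamma)
    return 20.0 * math.log10(ratio)


# Hypothetical models with equal accuracy; the second is 4x larger in both
# parameters and operations, so it scores lower (higher is better).
small = netscore(accuracy_pct=98.0, params=2.0, macs=0.3)
large = netscore(accuracy_pct=98.0, params=8.0, macs=1.2)
print(f"small: {small:.1f}, large: {large:.1f}")
```

Because accuracy is squared while size and compute enter under a square root, a small accuracy gain can outweigh a moderate increase in parameters, which is one way a model with a lower accuracy change can still rank differently on NetScore than on accuracy alone.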

ISSN: 1475-925X