Table 3 Summary of our StairNet stair recognition systems

From: StairNet: visual recognition of stairs for human–robot locomotion

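| Type | Data set size | Training approach | Architecture | Change in accuracy compared to baseline | NetScore | Model parameters (millions) |
|---|---|---|---|---|---|---|
| Baseline Neural Network | 515,452 labeled | SL: Single frame | MobileNetV2 | 0% | 186.8 | 2.3 |
| Temporal Neural Networks* | 515,452 labeled | SL: M1 | MoViNet | +1.1% | 167.4 | 4.0 |
| Temporal Neural Networks* | 515,452 labeled | SL: M1 | MobileNetV2 + LSTM | +0.1% | 132.1 | 6.1 |
| Temporal Neural Networks* | 515,452 labeled | SL: M1 | MobileViT-XXS + LSTM | −0.2% | 155.0 | 3.4 |
| Temporal Neural Networks* | 515,452 labeled | SL: MM | MobileNetV2 + LSTM | −26.5% | 120.1 | 6.0 |
| Semi-Supervised Neural Network | 300,000 labeled, 900,000 unlabeled | SSL: Fix Match | MobileViT-XS | +0.4% | 202.4 | 1.9 |
| Semi-Supervised Neural Network | 300,000 labeled, 900,000 unlabeled | SSL: Fix Match | MobileViT-XXS | −0.7% | 186.5 | 0.9 |
| Semi-Supervised Neural Network | 300,000 labeled, 900,000 unlabeled | SSL: Fix Match | MobileViT-S | −1.2% | 169.7 | 4.9 |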

  1. The models were evaluated on image classification accuracy and efficiency (i.e., NetScore, where higher is better). The systems are organized by model type. We tested supervised learning (SL) and semi-supervised learning (SSL) methods, and many-to-one (M1) and many-to-many (MM) temporal neural networks. The baseline and temporal neural networks used 515,452 labeled images; the semi-supervised learning networks used 300,000 labeled images and 1.8 million unlabeled images.
  2. *Evaluated using the video-based train/validation/test split described in the “Temporal Neural Networks” section.
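
The NetScore column combines accuracy, parameter count, and computational cost into one number. As a rough illustration, the sketch below follows the general form of the published NetScore metric, 20 · log(a^α / (p^β · m^γ)) with α = 2 and β = γ = 0.5. The function name `netscore`, the input units (accuracy in percent, parameters in millions, operations in billions), the logarithm base, and the example inputs are all assumptions made for illustration; the table does not report accuracy or operation counts, so this is not meant to reproduce the values above.

```python
import math


def netscore(accuracy_pct, params_millions, ops_billions,
             alpha=2.0, beta=0.5, gamma=0.5, log=math.log10):
    """Illustrative NetScore: 20 * log(a^alpha / (p^beta * m^gamma)).

    Units and the logarithm base are assumptions; pass log=math.log
    to use the natural logarithm instead of base 10.
    """
    if min(accuracy_pct, params_millions, ops_billions) <= 0:
        raise ValueError("all inputs must be positive")
    ratio = accuracy_pct ** alpha / (
        params_millions ** beta * ops_billions ** gamma
    )
    return 20.0 * log(ratio)


# Hypothetical models (values are NOT taken from the table):
# a compact model versus a larger, slightly more accurate one.
compact = netscore(accuracy_pct=98.0, params_millions=2.0, ops_billions=0.3)
larger = netscore(accuracy_pct=98.5, params_millions=6.0, ops_billions=0.6)

print(f"compact: {compact:.1f}, larger: {larger:.1f}")
```

Because accuracy is squared while size and cost enter through square roots, a small accuracy gain rarely offsets a several-fold increase in parameters or operations, which is consistent with the larger temporal models in the table scoring lower than the baseline despite similar accuracy.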