Airbnb Tech Blog

Airbnb’s AI-powered photo tour using Vision Transformer

Room Classification

The AI-powered photo tour in the Listings tab uses Vision Transformers to classify each listing photo into one of 16 room types, which helps hosts organize their photos efficiently.
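As a rough illustration of the final classification step, the sketch below turns one photo's 16 raw scores (logits) into a room-type label and a confidence. It assumes the model emits one logit per room type; the label names are hypothetical placeholders, since the 16 types are not listed here, and `classify` is an illustrative helper rather than Airbnb's code.

```python
import math

# Hypothetical placeholder labels: the 16 actual room types are not listed in this post.
ROOM_TYPES = [f"room_type_{i}" for i in range(16)]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, confidence) for one photo given its 16 logits."""
    if len(logits) != len(ROOM_TYPES):
        raise ValueError("expected one logit per room type")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ROOM_TYPES[best], probs[best]
```

In practice the logits would come from the classification head of a fine-tuned Vision Transformer; here they are just a list of floats.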

Image Similarity

Image similarity is the other key component of the photo tour: photos of the same room are grouped into clusters so that each room's images stay together.
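One plausible way to form such clusters is sketched below. It assumes each photo already has an embedding vector (for example, from the vision model), and it links any two photos whose cosine similarity exceeds a threshold, then takes connected groups as rooms. The post does not describe the actual clustering rule, so the threshold value and the union-find grouping are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_photos(embeddings, threshold=0.9):
    """Group photo indices whose embeddings are linked by similarity >= threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The pairwise comparison is quadratic in the number of photos, which is acceptable for a single listing's photo set but would need an index for larger collections.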

Accuracy Improvement

To improve prediction accuracy with limited training data, the approach combined three techniques: multi-task training, ensemble learning, and distillation.

Step 1 - Multi-task Training

The model was fine-tuned on high-accuracy training data for the target task, together with additional data labeled for related tasks such as object detection.
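A common way to train on several tasks at once is to give each task its own output head on a shared backbone and sum the per-task losses, with the auxiliary tasks down-weighted. The sketch below shows only that loss combination. The task names, the `aux_weight` value, and the use of a simple classification loss for the auxiliary task are all assumptions; the post does not specify how the losses were combined.

```python
import math


def cross_entropy(logits, label):
    """Negative log-likelihood of the true label under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def multi_task_loss(head_logits, labels, target_task="room_type", aux_weight=0.5):
    """Weighted sum of per-task losses for one training example.

    head_logits: dict mapping task name -> logits from that task's head.
    labels:      dict mapping task name -> true class index; tasks the
                 example is not labeled for are simply absent.
    """
    total = 0.0
    for task, label in labels.items():
        # The target task gets full weight; auxiliary tasks are down-weighted.
        weight = 1.0 if task == target_task else aux_weight
        total += weight * cross_entropy(head_logits[task], label)
    return total
```

Because examples only contribute to the tasks they are labeled for, data annotated for a related task can still update the shared backbone even when it has no room-type label.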

Step 2 - Ensemble Learning

An ensemble was built from multiple models trained with different auxiliary tasks and based on different versions of Vision Transformers, which improved overall performance.
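A minimal sketch of how an ensemble's outputs might be combined follows. It assumes each member model produces logits over the same set of classes and that predictions are merged by averaging the members' softmax probabilities; the post does not say which combination rule was used, so treat the averaging as an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average class probabilities across models; return (best_class, averaged_probs)."""
    if not per_model_logits:
        raise ValueError("need at least one model's logits")
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    averaged = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged
```

Averaging probabilities rather than taking a majority vote keeps each model's confidence in play, so one uncertain dissenting model is less likely to flip the result.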

Step 3 - Distillation

Knowledge distillation was then used to transfer what the ensemble had learned into a single smaller model, reducing compute requirements while maintaining accuracy.
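The sketch below shows a standard form of distillation loss, assuming the ensemble (teacher) and the smaller model (student) both emit logits over the same classes. The student is trained to match the teacher's temperature-softened probabilities and, at the same time, the true label. The `temperature` and `alpha` values are illustrative assumptions, and the post does not state which loss formulation was actually used.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; higher values flatten the output."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Blend a soft-target term (match the teacher) with a hard-label term."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    # Ordinary cross-entropy against the ground-truth label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

When the student already reproduces the teacher's logits, the soft term vanishes and only the hard-label term remains. This sketch does not guard against extreme logits that would underflow a probability to zero.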

By training on Airbnb listing photos and combining multi-task training, ensemble learning, and distillation, the AI-powered photo tour delivers accurate room classification and image similarity for an enhanced user experience.