IoT Device Labeling Using Large Language Models

Anat Bremler-Barr, Bar Meyuhas, Tal Shapira
arxiv,
2024
Projects, thesis, and dissertations
Internet of Things (IoT)

Abstract

The IoT market is diverse and characterized by a multitude of vendors that support different device functions (e.g., speaker, camera, vacuum cleaner, etc.). Within this market, IoT security
and observability systems use real-time identification techniques to manage these devices effectively. Most existing IoT identification solutions employ machine learning techniques
that assume the IoT device, labeled by both its vendor and function, was observed during their training phase. We tackle a key challenge in IoT labeling: how can an AI solution
label an IoT device that has never been seen before and whose label is unknown?

Our solution extracts textual features such as domain names and hostnames from network traffic, and then enriches these features using Google search data alongside catalog of vendors
and device functions. The solution also integrates an auto-update mechanism that uses Large Language Models (LLMs) to update these catalogs with emerging device types.
Based on the information gathered, the device’s vendor is identified through string matching with the enriched features.
The function is then deduced by LLMs and zero-shot classification from a predefined catalog of IoT functions. In an evaluation of our solution on 97 unique IoT devices,
our function labeling approach achieved HIT1 and HIT2 scores of 0.7 and 0.77, respectively. As far as we know, this is the first research to tackle AI-automated IoT labeling.