A group of academics has designed a “deep learning-based acoustic side-channel attack” that can classify laptop keystrokes recorded with a nearby phone with 95% accuracy.
“When trained on keystrokes recorded using the video conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium,” researchers Joshua Harrison, Ehsan Toreini, and Maryam Mehrnezhad said in a new study published last week.
Side-channel attacks refer to a class of security exploits that aim to gain insight from a system by monitoring and measuring its physical effects during the processing of sensitive data. Some common observable effects include runtime behavior, power consumption, electromagnetic radiation, acoustics, and cache access.
Although completely side-channel-free implementations do not exist, such practical attacks can have harmful consequences for user privacy and security because they can be weaponized by a malicious actor to obtain passwords and other confidential data.
“The ubiquity of keyboard acoustic emissions not only makes them a readily available attack vector, but also leads victims to underestimate (and therefore not try to hide) their outputs,” the researchers said. “For example, when typing a password, people will routinely hide their screen but do nothing to obscure the sound of their keyboard.”
To carry out the attack, the researchers first conducted experiments using 36 of an Apple MacBook Pro’s keys (0–9, A–Z), pressing each key 25 times in a row while varying the pressure and the finger used. The keystrokes were recorded both with a phone placed close to the laptop and over Zoom.
The next phase entailed isolating the individual keystrokes and converting each one into a mel-spectrogram, on which a deep learning model called CoAtNet (pronounced “coat” nets and short for convolution and self-attention networks) was run to classify the keystroke images.
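The isolation and conversion steps can be illustrated with a minimal sketch. It is not the researchers' code: the energy-threshold segmentation, the function names, and every parameter value (frame size, threshold ratio, clip length, FFT size, number of mel bands) are assumptions chosen for illustration, and the CoAtNet classification stage is left out.

```python
import numpy as np

def isolate_keystrokes(signal, sample_rate, frame_ms=10, threshold_ratio=0.2, clip_ms=300):
    """Cut fixed-length clips wherever short-time energy crosses a threshold (assumed method)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    if n_frames == 0 or energy.max() == 0:
        return []
    threshold = threshold_ratio * energy.max()
    clip_len = int(sample_rate * clip_ms / 1000)
    clips, next_allowed = [], 0
    for i, e in enumerate(energy):
        start = i * frame_len
        # Skip frames that fall inside the clip cut for the previous keystroke.
        if e >= threshold and start >= next_allowed:
            clip = signal[start:start + clip_len]
            if len(clip) == clip_len:
                clips.append(clip)
            next_allowed = start + clip_len
    return clips

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spectrogram(clip, sample_rate, n_fft=512, hop=128, n_mels=32):
    """Return a log-scaled mel-spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop
    frames = np.stack([clip[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular filters whose centers are evenly spaced on the mel scale.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    filters = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = hz_points[m : m + 3]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        filters[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    mel_power = power @ filters.T
    return 10.0 * np.log10(mel_power + 1e-10).T

if __name__ == "__main__":
    # Synthetic stand-in for a recording: three decaying tone bursts in silence.
    sr = 16000
    t = np.arange(800) / sr
    burst = np.exp(-t / 0.01) * np.sin(2 * np.pi * 2000 * t)
    audio = np.zeros(2 * sr)
    for start in (3200, 14400, 24000):
        audio[start : start + 800] += burst
    clips = isolate_keystrokes(audio, sr)
    print(len(clips), mel_spectrogram(clips[0], sr).shape)
```

Each resulting array can be treated as a small grayscale image, which is the kind of input an image classifier would consume. A real recording would need a noise-aware threshold rather than a fixed fraction of the peak energy.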
As countermeasures, the researchers recommend changing typing style, using randomized passwords instead of passwords containing full words, and adding randomly generated fake keystrokes to defend against voice call-based attacks.
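The randomized-password recommendation is simple to act on. The sketch below is an illustration only: the function name, the 16-character default, and the letters-and-digits alphabet are assumptions and do not come from the study. It uses Python's standard `secrets` module, which is intended for security-sensitive randomness.

```python
import secrets
import string

def random_password(length=16):
    """Build a password from independently chosen random letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    print(random_password())
```

Because every character is chosen independently, a partially recovered password gives an attacker no dictionary word to help fill in the keystrokes the classifier got wrong.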