Random Musings (and associated non sequiturs) v. 3.0

The “call an elderly person with news their grandchild needs to be bailed out of jail” scam isn’t new, but The Register reports a new twist on it: get hold of a long enough recording of someone’s voice and software can play back anything you type in that voice.

The scam has depended in the past on older people not having clear memories, and the calls often come in the middle of the night, when anyone might not be thinking clearly. Now add in synthesis of a voice the victim might actually recognize as legitimate: the “you have to take care of this now” urgency that scams usually employ, coupled with “that really does sound like my grandson/granddaughter.”

The Register pointed out the security and legal issues this technology raises over a year ago. That article suggested people might have to start speaking differently to voice assistants like Alexa and Siri, so that if recordings of what they’ve said are accessed and fed into the synthesis software, the output would sound different from their everyday conversation.

I recently got to listen to a presentation on the history of microprocessors, and one of the points was that problems like Meltdown and Spectre involve technologies and systems so widespread that fixing them may be impossible. Google, Amazon, Apple, and Comcast (for their TV remotes) are all pushing their voice assistants. “See how easy it is when you can just say things and search or execute things by voice?” We’re at the Star Trek level of interaction with computers.

Now comes the prospect that we shortly may not be able to trust that an audio recording actually came from a person. We’ve already got that problem with video and still images. Adobe’s initial reaction was “we’ll watermark it for security purposes”, but that earlier Register article raised the obvious question: what about all the other recordings that don’t have a watermark?

I’ll point this out: will phone and VoIP systems have to add watermark detection so they can alert you that the person you’re speaking with is actually a human and not a simulation of their voice? What about all the existing systems that don’t have that function? How many years did the U.S. have to prepare for the transition from analog to digital TV? And that wasn’t even a change made in response to a security issue. Would it even be possible to completely replace every component of the voice infrastructure?
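Just to make that watermark-detection idea concrete, here is a purely hypothetical sketch in Python. I’m not claiming this is how Adobe or any phone system would actually do it; the key sequence, strength, and threshold below are all invented for illustration. The point is only that a detector needs a secret shared with whoever embedded the mark, which is exactly why all the recordings without a watermark tell you nothing.

```python
import numpy as np

# Hypothetical illustration only: a toy spread-spectrum check, where a known
# pseudo-random key sequence is assumed to have been mixed into "authentic"
# audio at very low amplitude. Real watermarking schemes are far more
# sophisticated; this just shows the shape of the problem.

KEY_SEED = 1234            # assumed shared secret between embedder and detector
WATERMARK_STRENGTH = 0.005 # how loudly the key is mixed into the audio

def embed_watermark(audio: np.ndarray, seed: int = KEY_SEED) -> np.ndarray:
    """Mix a low-amplitude pseudo-random key sequence into the signal."""
    rng = np.random.default_rng(seed)
    key = rng.choice([-1.0, 1.0], size=audio.shape)
    return audio + WATERMARK_STRENGTH * key

def looks_watermarked(audio: np.ndarray, seed: int = KEY_SEED,
                      threshold: float = 0.5) -> bool:
    """Correlate against the key; watermarked audio scores well above chance."""
    rng = np.random.default_rng(seed)
    key = rng.choice([-1.0, 1.0], size=audio.shape)
    score = np.dot(audio, key) / (WATERMARK_STRENGTH * audio.size)
    return score > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.normal(scale=0.1, size=16000)           # stand-in for 1 s of audio
    print(looks_watermarked(speech))                      # False: no watermark
    print(looks_watermarked(embed_watermark(speech)))     # True: watermark present
```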

But phones aren’t the only voice systems affected. Cockpit voice recorders become evidence when an accident occurs. Will those have to be watermarked too?

Just in reading about this issue and typing up this post, I’ve already figured out three other ways voice manipulation could be exploited. I’m not going to say what they are, because I don’t want to give anyone ideas about how to commit crimes. And that’s coming from someone who only learned about this half an hour ago.

 
It’s been said in a few different ways, but I’ll use the quote from Jurassic Park: “[Y]our scientists were so preoccupied with whether or not they could that they didn’t stop to think if they should.”
