Voice-user interfaces (VUIs) have advanced quickly from a limited role as a hands-free alternative for smartphones to a broader role in controlling multiple devices through a smart speaker. Driven by various factors, including lessons from the COVID-19 pandemic, VUIs are poised to proliferate rapidly, playing an expanded role in smart products in application areas ranging from smart homes to healthcare and many more. In the following, we’ll review these applications.
VUIs in Smart Speakers
Smartphone virtual assistants such as Apple’s Siri brought VUIs into the mainstream but provided a scope of services limited to the smartphone itself. The introduction of smart speakers such as Amazon’s Echo provided users with a broader reach within a smart home. Using voice interactions with Echo’s built-in Alexa application, users can control other connected smart products designed with Alexa compatibility. The use of smart speaker VUIs has led to increased use of voice assistants on smartphones for a large percentage of smart-speaker owners, according to the recurring Smart Audio research study by National Public Radio and Edison Research.
With about a quarter of the U.S. population already owning smart speakers, routine use of VUIs continues to accelerate. The COVID-19 pandemic has further boosted VUI usage, according to a Smart Audio study conducted at the end of March 2020, when stay-at-home orders were widely in effect. Although some of that boost might simply result from increased usage by stay-at-home workers, it nevertheless highlights growing acceptance of hands-free voice control at a time when the population has gained increased awareness of contagion and virus transmission through surface contact. As a result, the ability to operate a device or piece of equipment without touching it becomes increasingly attractive for both convenience and health reasons. These benefits are particularly important in healthcare facilities struggling to stem cross infections. Here, a combination of voice control and biometric technologies can allow authorized healthcare providers to use suitably enabled medical equipment without compromising their patients’ safety or their own.
Until recently, however, implementation of VUI subsystems was largely limited to enterprises such as Amazon, Apple, Google, and Microsoft, which had the resources to create the required specialized hardware solutions and the market reach to make the investment worthwhile. Developers looking to enable voice control for their products needed to augment their designs with suitable compatibility features that allowed their products to respond to user voice commands delivered indirectly through a smart-speaker hub such as Echo. For users, however, this hub-based approach to voice control would often lead to connection problems as they tried to place newly purchased smart products farther from the hub. Today, off-the-shelf devices make it easier for developers to build the sophisticated processing pipelines required for native VUI capabilities in their designs.
VUI processing pipelines typically comprise several stages following a set of one or two microphones for near-field applications or a microphone array for far-field applications. The pipeline’s front end is designed to optimize audio signal acquisition with beamforming techniques to enhance signals in far-field applications. Following that, the signal conditioning stage enhances the speaker’s voice signal using echo cancellation and other noise-suppression methods combined with codecs, digital-signal processors (DSPs), or specialized voice processors.
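The article does not say which beamforming technique a given front end uses. As one illustration, the sketch below shows delay-and-sum beamforming, a common basic approach: each microphone channel is shifted by a steering delay so that sound from the target direction lines up, and the channels are then averaged. The function name delay_and_sum, the integer sample delays, and the toy signals are all assumptions made for this example, not part of any particular product.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each microphone channel by its integer sample delay, then average.

    channels[i] holds the samples from microphone i. delays[i] is how many
    samples later the target's sound reaches microphone i than the earliest
    microphone (a hypothetical, precomputed steering value).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    # Only the region where every shifted channel still has samples is usable.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []
    output = []
    for n in range(length):
        # Sample n of the aligned signal sits at index n + delay in each channel.
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# Toy data: the same pulse arrives one sample later at the second microphone.
mic_a = [0.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 1.0, 0.0, 0.0]

aligned = delay_and_sum([mic_a, mic_b], delays=[0, 1])     # steered at the source
misaligned = delay_and_sum([mic_a, mic_b], delays=[0, 0])  # no steering applied

print(aligned)
print(misaligned)
```

With the correct delays the pulse adds coherently and keeps its full amplitude, while the unsteered sum smears it across two samples at half amplitude. Real far-field front ends typically use fractional delays and adaptive weighting, which this sketch leaves out.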
The next stage monitors the voice data, looking for the wake word as data streams through the pipeline. When the wake word is detected, the digitized stream starting with the wake word is passed by the final communications stage to the cloud. In turn, cloud services identify commands using natural language processing (NLP) algorithms, prepare a response based on the application’s business rules, and return a voice response down through the signal chain to the smart speaker.
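The wake-word gating described above can be sketched as a small generator that discards audio until the wake word appears and then forwards everything from the wake word onward. This is only an illustration under stated assumptions: text tokens stand in for audio frames, and the detector callable, the stream_after_wake name, and the two-frame window are hypothetical placeholders for a real wake-word model.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List


def stream_after_wake(frames: Iterable[str],
                      is_wake_word: Callable[[List[str]], bool],
                      window: int = 2) -> Iterator[str]:
    """Yield frames starting with the wake word; drop everything before it."""
    # Rolling buffer of the most recent frames, sized to hold the wake word.
    recent: Deque[str] = deque(maxlen=window)
    awake = False
    for frame in frames:
        if awake:
            # After detection, every frame goes to the communications stage.
            yield frame
            continue
        recent.append(frame)
        if is_wake_word(list(recent)):
            awake = True
            # Forward the buffered frames that contain the wake word itself.
            yield from recent


def toy_detector(window_frames: List[str]) -> bool:
    """Hypothetical detector: matches the two-token phrase 'hey device'."""
    return window_frames == ["hey", "device"]


incoming = ["noise", "hey", "device", "turn", "on", "lamp"]
sent_to_cloud = list(stream_after_wake(incoming, toy_detector))
print(sent_to_cloud)
```

Keeping a short rolling buffer is what lets the forwarded stream begin with the wake word rather than with whatever audio follows it.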
More advanced devices continue to emerge, collapsing the functionality of separate stages of the pipeline into a single part. Besides reducing footprint and design complexity, these highly integrated devices also help reduce overall system power requirements. Along with their inherently low-power operation, these devices often provide integrated features that enable developers to keep pipelines in low-power sleep states, returning to normal active mode only when needed.
For example, some MEMS microphones can operate in a low-power sleep state, waking only when they detect audio activity and signaling the next stage to wake up and process the incoming data. Similarly, voice-activity detection algorithms running in early stages of the pipeline can signal the wake-word detection stage to return to active mode and begin executing its algorithms. On finding the wake word, it can wake the communications stage to begin streaming data to the cloud. Using this approach, developers can progressively wake the pipeline a stage at a time as needed or allow it to settle back to a battery-saving sleep state.
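The progressive wake-up described above can be modeled as a simple state machine in which each stage wakes only the next one. The sketch below is a simplified model, not a description of any specific device: the stage names, the event strings, and the single "timeout" event that returns the whole pipeline to sleep are assumptions chosen for illustration.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Stage(Enum):
    SLEEP = 0        # only the microphone's activity detector is powered
    VOICE_CHECK = 1  # voice-activity detection is running
    WAKE_WORD = 2    # wake-word detection is running
    STREAMING = 3    # communications stage is sending audio to the cloud


# Each event wakes exactly one further stage of the pipeline.
TRANSITIONS: Dict[Tuple[Stage, str], Stage] = {
    (Stage.SLEEP, "audio_activity"): Stage.VOICE_CHECK,
    (Stage.VOICE_CHECK, "voice_detected"): Stage.WAKE_WORD,
    (Stage.WAKE_WORD, "wake_word_found"): Stage.STREAMING,
}


def next_stage(stage: Stage, event: str) -> Stage:
    """Return the stage that follows `stage` when `event` occurs."""
    if event == "timeout":
        # Assumed simplification: any timeout drops the pipeline back to sleep.
        return Stage.SLEEP
    # Events that do not apply to the current stage are ignored.
    return TRANSITIONS.get((stage, event), stage)


def run(events: Iterable[str], start: Stage = Stage.SLEEP) -> List[Stage]:
    """Apply events in order and return every stage visited."""
    stage = start
    history = [stage]
    for event in events:
        stage = next_stage(stage, event)
        history.append(stage)
    return history


history = run(["audio_activity", "voice_detected", "wake_word_found", "timeout"])
print([stage.name for stage in history])
```

Because out-of-order events are ignored, a spurious "wake_word_found" while the pipeline is asleep leaves the later stages powered down, which is the power-saving behavior the staged design is meant to provide.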
On the cloud side, turnkey NLP services from the large commercial cloud-service providers enable developers to add VUI capabilities to their products. For example, Amazon’s Alexa Voice Service lets developers create designs with a standalone Alexa-compatible VUI able to operate independently from the Echo smart speaker. At the same time, alternative approaches to current centralized cloud solutions are emerging to allay users’ concerns about privacy and manufacturers’ concerns about cost.
A growing number of private/hybrid cloud solutions offers an attractive approach to cost containment for manufacturers with large-scale applications. At the same time, these solutions can provide a means to tighten security of user data associated with voice-command processing.
However, for many users, any solution that causes their voice data to leave their local control can remain a cause for concern. An emerging alternative completely removes the need to send voice data to the cloud. New machine-learning methods are beginning to deliver smaller, faster inference models needed for an effective VUI without compromising needed accuracy rates or vocabulary size. Using this type of local speech processing, next-generation VUIs can offer faster response times and privacy on the user side while also helping manufacturers keep a lid on cloud costs beyond those needed for normal cloud tasks such as onboarding, over-the-air updates, and interaction with other cloud services.
Voice-enabled smart products might have been only a matter of convenience in the past. Today, their ability to provide hands-off control can help maintain health at home and in public places, including healthcare facilities. Fortunately, a rapidly growing array of enabling technologies and products provides the foundation for building sophisticated voice-control capabilities into more smart products.
Source: Mouser Electronics