Andreas Kopetz, Infineon Technologies AG
In recent years a revolution has been taking place in the world of user interaction with smart devices. The success of last year’s Amazon Echo, followed by the release of Google Home this year, shows that we are moving into the age of full voice control of our devices. Advances in cloud computing have allowed for the implementation of sophisticated learning algorithms for voice recognition, placing us on the brink of natural voice-based interaction: a world where simple conversational commands are all that is required, without the need for a screen to augment or assist our experience.
As we move into the age of voice as the preferred user interface, Yole Développement has predicted an almost threefold increase in the shipments of MEMS microphones from 2013 (2.4 billion units) to 2019 (6.6 billion units). However, with this increase in demand for MEMS microphones comes a refined set of requirements to meet the needs of far field voice detection applications, including increasingly popular beam forming applications.
Signal to Noise Ratio
One of the most important specifications when dealing with MEMS microphones is the Signal to Noise Ratio (SNR). The importance of a microphone’s SNR in far field and voice detection applications is often misunderstood to reflect the ability of the microphone to operate in a noisy environment. However, the SNR of a microphone only measures the signal against the internal noise sources, and so is actually a better indicator of how a microphone will perform in a quiet room, such as a living room or bedroom.
SNR is a good indicator of the minimum sound level that a microphone can detect in a quiet environment. It is measured in dB against a reference signal of 94dBSPL (a value chosen because it equates to a 1Pa change in pressure, not because it has any significance for audio use cases). Typical high performance microphones currently have an SNR in the range of 64dB to 68dB.
A microphone with an SNR of 64dB will not be able to differentiate sounds below 30dBSPL (94dBSPL – 64dB) from its internal noise. In reality, even the best speech detection algorithm requires the information component of the signal to sit a few dB above the noise floor; the exact requirement depends on the algorithm and processing applied to the signal. If we assume that a speech detection algorithm requires 10dB between the microphone noise floor and the input signal level, then a microphone with an SNR of 64dB will be able to detect sounds down to 40dBSPL. This is equivalent to normal conversation levels (coffee shop conversation volume) at a distance of around 5 to 6 meters, or a quiet conversation (office conversation volume) at around 2 to 3 meters. For the user, this means seamlessly interacting with a device anywhere in the room, without having to face towards it or raise their voice. For an ideal microphone, a 6dB increase in SNR allows detection of sound twice as far away, or at half the sound pressure.
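The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the 10dB margin is the assumption stated in the text, and the distance scaling assumes ideal free-field propagation, where sound pressure falls by 6dB for every doubling of distance.

```python
REFERENCE_DBSPL = 94.0  # 1Pa reference level against which SNR is specified


def noise_floor_dbspl(snr_db):
    """Equivalent input noise level implied by an SNR specification."""
    return REFERENCE_DBSPL - snr_db


def min_detectable_dbspl(snr_db, algorithm_margin_db=10.0):
    """Quietest usable signal, given the margin an algorithm needs above the noise floor.

    The 10dB default is the assumption used in the text, not a property of any
    particular speech detection algorithm.
    """
    return noise_floor_dbspl(snr_db) + algorithm_margin_db


def distance_gain(snr_improvement_db):
    """Factor by which detection distance grows for a given SNR improvement.

    Assumes ideal free-field propagation (pressure proportional to 1/distance).
    """
    return 10 ** (snr_improvement_db / 20.0)


print(noise_floor_dbspl(64))        # 30.0
print(min_detectable_dbspl(64))     # 40.0
print(round(distance_gain(6), 2))   # 2.0
```

In a real room, reflections and background noise mean the distance gain will generally differ from this ideal figure.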
Acoustic Overload Point
In loud environments, the most important parameter is the microphone’s Acoustic Overload Point (AOP). Put simply, this defines the maximum sound pressure level that the microphone can capture without “too much” distortion, where “too much” is defined as 10% Total Harmonic Distortion (THD). Recent advances in MEMS technology have pushed the AOP of microphones from 120dBSPL up to and above 130dBSPL. To put that in perspective, a smartphone in 2012 would have trouble recording a live concert without clipping, whereas modern high-end microphones are capable of capturing high quality audio from the front row of a rock festival.
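To make the 10% THD criterion concrete, the sketch below estimates THD as the ratio of the combined harmonic amplitudes to the fundamental amplitude. It is a simplified model under stated assumptions: overload is represented by hard clipping of a pure sine wave, which is not how any specific microphone distorts, only the first nine harmonics are counted, and all function names are invented for this example.

```python
import cmath
import math


def bin_amplitude(samples, k):
    """Amplitude of the k-th DFT bin of a real-valued signal."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


def thd_percent(samples, cycles, max_harmonic=9):
    """THD in percent: root-sum-square of harmonics 2..max_harmonic over the fundamental."""
    fundamental = bin_amplitude(samples, cycles)
    harmonics = [bin_amplitude(samples, cycles * h) for h in range(2, max_harmonic + 1)]
    return 100.0 * math.sqrt(sum(a * a for a in harmonics)) / fundamental


def clipped_sine(clip_level, cycles=4, n=1024):
    """Unit-amplitude sine, hard-clipped at +/- clip_level (assumed overload model)."""
    return [
        max(-clip_level, min(clip_level, math.sin(2 * math.pi * cycles * i / n)))
        for i in range(n)
    ]


clean = thd_percent(clipped_sine(1.0), cycles=4)    # no clipping: THD is essentially zero
clipped = thd_percent(clipped_sine(0.5), cycles=4)  # heavy clipping: well above 10%
print(f"clean: {clean:.4f}%  clipped: {clipped:.1f}%")
```

In this model, raising the input level relative to the clip level corresponds to lowering `clip_level`; the AOP would be the input level at which the computed THD crosses 10%.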
Signals at high levels can come from unexpected sources. Wind noise is a particularly big problem for microphones when used outdoors. Depending on wind speed and direction, and the orientation of the microphone, this noise can exceed 120dBSPL. Wind noise consists of a strong low frequency fundamental tone with higher frequencies at lower levels, and it is usually filtered out before speech processing algorithms are applied or when HD voice calls are made. However, it can only be filtered out if the input signal has not reached the AOP level. At the point where there is significant harmonic distortion in the signal, filtering is no longer possible.
Having a higher AOP level means that when a user is outside in windy or noisy conditions, the microphone can detect and record a signal which may feature a lot of background noise, but which is still not clipped. This means that speech detection algorithms can still be effectively applied, filtering can be used on HD voice calls to improve audibility, and songs recorded at concerts can still be enjoyed. The key is that even when there is some background noise, it is a lot easier to post process and recover a signal which is not overloaded with harmonic distortion and clipping.
Dynamic Range
While SNR is a good indicator of microphone performance in a quiet environment, and AOP is a good indicator of performance in a loud environment, the Dynamic Range of a microphone is a combination of both parameters, indicating the range of sound pressure levels to which the microphone is sensitive. While a microphone can be designed to be very sensitive to quiet sounds, or very robust to high sound pressure levels, it is far more difficult to design one product which can work well for both situations.
Modern MEMS microphones have Dynamic Range measurements approaching 100dB, allowing the same microphone that records thundering kettledrums and pipe organs at an orchestra recital to also pick up the whispering of the audience members.
The dynamic range of a microphone is not always explicitly specified in the datasheet, but can be easily determined from other specifications. For digital microphones it is simply obtained by adding the magnitude of the sensitivity parameter (specified in dBFS at the 94dBSPL reference level) to the SNR. For an analog microphone, the difference between the AOP and the 94dBSPL reference level must be found first, and then this number is added to the SNR.
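The two calculations described above can be sketched as follows. The function names and the datasheet values are hypothetical, chosen only to illustrate the arithmetic. For the digital case, the sketch assumes that sensitivity is quoted in dBFS at 94dBSPL (a negative number) and that the AOP coincides with digital full scale, so it is the magnitude of the sensitivity that is added to the SNR.

```python
REFERENCE_DBSPL = 94.0  # reference level for both SNR and sensitivity


def dynamic_range_analog(aop_dbspl, snr_db):
    """Analog microphone: headroom above the reference level plus SNR."""
    return (aop_dbspl - REFERENCE_DBSPL) + snr_db


def dynamic_range_digital(sensitivity_dbfs, snr_db):
    """Digital microphone: magnitude of the dBFS sensitivity plus SNR.

    Assumes the acoustic overload point coincides with digital full scale.
    """
    return abs(sensitivity_dbfs) + snr_db


# Hypothetical datasheet values, for illustration only.
print(dynamic_range_analog(aop_dbspl=130.0, snr_db=64.0))         # 100.0
print(dynamic_range_digital(sensitivity_dbfs=-26.0, snr_db=64.0))  # 90.0
```

Both forms express the same quantity: the distance in dB between the noise floor (94dBSPL minus SNR) and the loudest signal the microphone can handle.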
Infineon’s Dual Back-Plate Technology
Typically a MEMS microphone consists of a flexible charged membrane and a rigid voltage sensing back-plate. The membrane and back-plate form a capacitor, the value of which changes as the membrane is moved by vibrations in the air. This changing capacitance is converted into a changing voltage which is either amplified and output directly by an analog microphone ASIC, or converted into a digital output signal by an ADC in a digital microphone ASIC. This operation is similar to studio condenser microphones, but on a much smaller scale.
Infineon’s dual back-plate technology is a proprietary method of MEMS construction which uses two back-plates, one on each side of the membrane. This method allows fully differential measurement of the signal, improving signal response and quality, increasing SNR and AOP levels and also improving the THD performance of the microphone below the AOP level.
The dual back-plate structure of Infineon microphone MEMS has an added benefit in robustness to fast air pressure changes, such as those experienced when a phone is dropped on the ground. Competitors’ MEMS have to include some compromises to improve robustness.
While the AOP of a microphone is defined as the input sound pressure level at which there is 10% THD on the microphone output, studies suggest that much lower levels of THD are audible to listeners and can detract from the listener experience. Infineon dual back-plate microphones have an excellent THD profile, staying below 1% until very close to the AOP level, giving an excellent listening experience up to and even above 130dBSPL.
Thanks to the rise of voice interaction as a preferred user interface to devices, the future is bright and exciting for MEMS microphones, and they will be pushed further than ever before in the search for perfect performance and usability. Infineon is an industry pioneer, promoting dual back-plate and fully differential microphones in both analog and digital flavors. As a pioneering technology company, Infineon is always looking to the future, developing new and innovative concepts and designs to push the performance of products. In the drive for flawless audio fidelity on a micro scale we will continue to improve THD and AOP performance, expand frequency response, and push SNR to 70dB and beyond.