Improvements in how Siri recognizes the names of small businesses and local points of interest come from language models tailored to specific locations, Apple's Machine Learning Journal reveals, helping the virtual assistant understand what nearby places are called locally.
Virtual assistants like Siri can readily understand the names of prominent businesses and chains, such as supermarkets and restaurant franchises, writes the Siri Speech Recognition Team, but queries about lesser-known or regional businesses tend to produce less accurate results. The team notes this is a "known performance bottleneck" for accuracy in automatic speech recognition (ASR) systems, with names further along the long tail of the frequency distribution less likely to be correctly identified.
Apple attempted to improve this for Siri by taking the user's location into account when processing queries. Two types of model are used: a general language model (LM) working alongside a geolocation-based language model (Geo-LM), with the latter taking over when the user is within its coverage area.
ASR systems typically comprise two components: an acoustic model that analyzes the properties of speech, and a language model that analyzes word usage. Apple noted that its system didn't adequately represent the words and names of local points of interest, or how they are pronounced, with the more obscure names and combinations also appearing at very low frequency in the LM training data.
That low frequency means that, in a general LM, a local business name is less likely to be recognized than a more common word or phrase.
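To see why, consider how an ASR decoder typically weighs its two models: each candidate transcription is scored by combining acoustic evidence with the language model's probability, so a name the LM has rarely seen starts at a steep disadvantage. The toy sketch below illustrates that scoring step; the hypotheses, probabilities, and weighting are invented for illustration and are not drawn from Apple's journal entry.

```python
# Illustrative sketch only: a toy log-linear scoring step of the kind ASR
# decoders typically use. All values below are made up for the example.
import math

def decode_score(acoustic_logprob: float, lm_prob: float, lm_weight: float = 0.8) -> float:
    """Combine acoustic and language-model evidence for one hypothesis."""
    return acoustic_logprob + lm_weight * math.log(lm_prob)

# Two hypotheses that sound nearly identical to the acoustic model:
# a common phrase and the name of a small local business.
common_phrase = decode_score(acoustic_logprob=-4.0, lm_prob=1e-4)
local_business = decode_score(acoustic_logprob=-4.1, lm_prob=1e-8)

# The rare local name loses purely because the general LM assigns it a tiny
# probability, which is the bottleneck the Geo-LM is meant to address.
print(common_phrase > local_business)  # True
```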
In Apple's solution, it defined geographic regions covering most of the United States and produced a Geo-LM for each one. The local version is used depending on the user's location, though the general LM is used instead if the user is outside all defined regions or has Location Services disabled.
There are 169 Geo-LM areas for the U.S., based on combined statistical areas defined by the U.S. Census Bureau and covering approximately 80 percent of the population. Each area consists of "adjacent metropolitan areas that are economically and socially linked," measured by commuting patterns.
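The selection logic is straightforward in outline: resolve the user's coordinates to one of the defined regions and use that region's Geo-LM, otherwise fall back to the general LM. The sketch below illustrates the idea with assumed region names and crude bounding boxes; Apple's actual implementation and region boundaries are not published in this form.

```python
# A minimal sketch of the model-selection step, under assumptions noted below.
from typing import Optional

# Hypothetical sample of region identifiers mapped to their language models;
# in practice Apple describes 169 such regions based on combined statistical areas.
GEO_LMS = {
    "new-york-newark": "geo_lm_nyc",
    "san-jose-san-francisco-oakland": "geo_lm_bay_area",
}
GENERAL_LM = "general_lm"

def region_for(lat: float, lon: float) -> Optional[str]:
    """Crude, purely illustrative bounding boxes; real boundaries follow CSAs."""
    if 40.4 <= lat <= 41.2 and -74.6 <= lon <= -73.4:
        return "new-york-newark"
    if 36.9 <= lat <= 38.6 and -123.1 <= lon <= -121.2:
        return "san-jose-san-francisco-oakland"
    return None

def select_lm(lat: Optional[float], lon: Optional[float]) -> str:
    """Pick the Geo-LM for the user's region, or the general LM otherwise."""
    if lat is None or lon is None:  # Location Services disabled
        return GENERAL_LM
    region = region_for(lat, lon)
    return GEO_LMS.get(region, GENERAL_LM) if region else GENERAL_LM

print(select_lm(40.71, -74.01))   # geo_lm_nyc
print(select_lm(None, None))      # general_lm (no location available)
print(select_lm(39.74, -104.99))  # general_lm (outside defined regions)
```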
In Apple's testing, there was no real change in accuracy for general queries, but switching from the general LM to the Geo-LM delivered a relative error reduction of 18.7 percent for point-of-interest searches. In point-of-interest tests across eight U.S. metropolitan regions, the gap widened further, with the localized models performing better by between 41.9 percent and 48.4 percent.
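For readers unfamiliar with the metric, a relative error reduction compares the Geo-LM's error rate against the general LM's as a fraction of that baseline. The snippet below shows the arithmetic with invented error rates; the 18.7 percent figure above is Apple's, but the underlying rates here are not.

```python
# How a relative error reduction is computed; the word error rates below
# are hypothetical and chosen only to reproduce an 18.7 percent reduction.
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    return (baseline_wer - new_wer) / baseline_wer

# e.g. a general-LM error rate of 10.0% dropping to 8.13% with the Geo-LM
# corresponds to a relative reduction of roughly 18.7 percent.
print(round(relative_error_reduction(10.0, 8.13), 3))  # 0.187
```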
Apple suggests that, given the limited impact on system speed, the regional coverage of the Geo-LMs still has room to grow, but a general language model will remain necessary. "It is essential to continue providing a global Geo-LM in addition to regional LMs," writes Apple, "so that ASR can handle long-distance queries and cases with users located outside supported regions."
The approach could also be expanded to languages other than U.S. English, with Apple noting "The method and system proposed here are language independent."
Apple still has some way to go to catch up with Google's accuracy in virtual assistants. A July group test found Siri's accuracy has improved considerably over the last year to 78.5 percent, with its comprehension of queries rising to close to 100 percent, but under the same test the Google Assistant achieved an accuracy of 85.5 percent.