Microsoft Azure Translator API call in R

2021-03-01

A colleague was having trouble constructing an API call in R to call the Microsoft Azure Translator. They had lots of household survey responses in Portuguese that they wanted to translate to English for analysis. There are some examples on the Microsoft Azure documentation of how to call the API using C#, Go, Java, Node.js and Python, but nothing for R. There are some R packages for using Azure Translator that already exist:

but as Azure Translator uses a conventional RESTful API, it’s also possible just to use the {httr} package .

Using Azure Translator requires setting up an Azure account in order to access an API key. More documentation on that here . As of 2021-02-30 there is a free tier for Azure Translator which offers up to 2 million characters of translation for free per month, with a few other features.

In R, first, load packages and create some text to translate, two sentences, in English and Portuguese:

# Packages
library(httr)

# Create some example Portuguese (used Google Translate)
engl <- "This is some test text. The big train had black smoke."
port <- "Este é um texto de teste. O grande trem tinha fumaça preta."
engl2 <- "Cardboard boxes are easy to flatten"
port2 <- "Caixas de papelão são fáceis de achatar"

Then, define keys, endpoints and parameters for the API call. The key, endpoint and location can be retrieved from your Azure portal.

key <- "XXX"
endp <- "https://api.cognitive.microsofttranslator.com"
location <- "global"
path <- "translate"
apiv <- "3.0"
to_lang <- "en"

Create the headers:

heads <- c(
  "Ocp-Apim-Subscription-Key" = key,
  "Ocp-Apim-Subscription-Region" = location,
  "Content-type" = "application/json"
  )

This is the bit that took me a bit of trial and error to figure out, using nested lists to create a JSON-like query that can then be converted to JSON for the API query. The Azure documentation states that API queries should follow this structure:

[
	{
		"Text" : "Hello, what is your name?"
	},
	{
		"Text" : "My name is John"
	}
]

So in R, thats a list, containing two other named lists (named “Text”), each containing a single character string, the string to translate. In R:

input <- list(port, port2)
input_list <- lapply(input, function(x) {
  list("Text" = x)
  })

Construct the query using httr::POST():

result <- POST(
  endp, 
  path = path,
  query = list(
    `api-version` = apiv,
    to = to_lang
    ),
  body = input_list, 
  encode = "json", 
  add_headers(.headers = heads)
)

The result is returned as a JSON string, so R needs to parse it to return a similarly nested list structure:

[
	{
		"detectedLanguage" : {
			"language" : "pt",
			"score" : 1.0
		},
		"translations" : [
			{
				"text" : "This is a test text. The big train had black smoke.",
				"to" : "en"
			}
		]
	},
	{
		"detectedLanguage" : { 
			"language" : "pt",
			"score" : 1.0
		},
		"translations" : [
			{
				"text" : "Cardboard boxes are easy to flatten",
				"to" : "en"
			}
		]
	}
]
result_parse <- content(result, as = "parsed")
[1]
[1]
[1]
1

[1]
1


[1]
[1]]$translations[[1]
[1]]$translations[[1]
1

[1]]$translations[[1]
1




[2]
[2]
[2]
1

[2]
1


[2]
[2]]$translations[[1]
[2]]$translations[[1]
1

[2]]$translations[[1]
1

Then it’s trivial to convert it to whatever data structure you want, in my case I want a dataframe:

result_df <- do.call(rbind, lapply(result_parse, function(x) {
  data.frame(
    from_lang_det = x$detectedLanguage$language,
    from_lang_score = x$detectedLanguage$score,
    to_lang = x$translations[[1]]$to,
    trans = x$translations[[1]]$text
    )
}))
from_lang_det from_lang_score to_lang trans
pt 1 en This is a test text …
pt 1 en Cardboard boxes are …