Text Analytics
Welcome to the Text Analytics Forum
Categories
API – Any ideas or feedback pertaining to features or enhancements to Text Analytics API.
Documentation – Any ideas or suggestions for the API Reference or Documentation.
Language Support – Submit a request to have a particular language supported.
Samples & SDK Request – Let us know if you would like to see a Code sample or SDK provided.
Attention!
We have moved our Customer Feedback & Ideas for Azure Cognitive Services portal to the Azure Feedback Forum.
-
Text Analytics identifies incorrect language (identifies as French with 1 for confidence level)
RD-1029.8.16.1-V (2020-09)
2 of 7
2.2 Qualified expenditures
Break down the R&D expenditures by entering them in columns A, B and C. In column A, enter the R&D expenditures made in the taxation year by the taxpayer.
In column B, enter the portion of the consideration attributable to R&D work that the taxpayer paid in the taxation year to a subcontractor not dealing at
arm’s length with the taxpayer. In column C, enter the portion of the consideration attributable to R&D work that the taxpayer paid in the taxation year to a
subcontractor dealing at arm’s length with the taxpayer. …
1 vote
-
Armenian incorrectly detected as English
Armenian (not currently supported for detection) is being labeled as English with a confidence of 1.
Examples:
For input text: 'Տուն'
The detected language is 'en'. Confidence is: 1.
For input text: 'Անձնական Տվյալների Ցուցակ'
The detected language is 'en'. Confidence is: 1.
For input text: 'Ժողովրդագրություն'
The detected language is 'en'. Confidence is: 1.
For input text: 'Օժանդակ'
The detected language is 'en'. Confidence is: 1.
For input text: 'Կապի անհատներ'
The detected language is 'en'. Confidence is: 1.
For input text: 'Տվյալների Հաստատում'
The detected language is 'en'. Confidence is: 1.
Many many more examples. Meanwhile, other popular…
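Until detection handles this case, a client-side sanity check may help catch results like the ones above. The following is only a sketch, not part of the Text Analytics API: `dominant_script` and `is_plausible` are hypothetical helper names, and taking the first word of a character's Unicode name as its script is a rough heuristic assumed here for illustration.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most common script label among the letters of `text`.

    Heuristic (assumption): the first word of a character's Unicode name
    is used as its script label, e.g. 'ARMENIAN' or 'LATIN'.
    """
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"


def is_plausible(text: str, detected_iso: str) -> bool:
    """Return False when the reported language's script does not match the text.

    Hypothetical helper: only 'en' is mapped here; extend the table as needed.
    Unmapped language codes are accepted without checking.
    """
    expected = {"en": "LATIN"}  # illustrative mapping, not exhaustive
    script = expected.get(detected_iso)
    return script is None or dominant_script(text) == script
```

With this check, `is_plausible('Տուն', 'en')` is false, so a caller could discard the 'en' result instead of trusting the confidence of 1.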
1 vote
-
Language detection for random text returns "English" with a score of 1.0 instead of "unknown"
According to the documentation at https://westus.dev.cognitive.microsoft.com/docs/services/TextAnalytics.V2.0/operations/56f30ceeeda5650db055a3c7
submitting random text like ":) :( :D" should return a result of "(Unknown)" with a score of 0. Instead, it returns "English" with a score of 1. This is pretty bad.
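As a stopgap, a caller could override the result locally when the input is mostly punctuation. This is a minimal sketch under stated assumptions: `letter_ratio` and `guard_detection` are hypothetical names, the 0.5 threshold is arbitrary, and the `detected` dictionary is assumed to carry the `name` and `score` fields of a v2.0 `detectedLanguages` entry.

```python
def letter_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch.isalpha() for ch in visible) / len(visible)


def guard_detection(text: str, detected: dict, min_letter_ratio: float = 0.5) -> dict:
    """Replace a detection result with '(Unknown)' for mostly non-letter input.

    `min_letter_ratio` is an assumed, tunable threshold, not a documented value.
    """
    if letter_ratio(text) < min_letter_ratio:
        return {"name": "(Unknown)", "score": 0.0}
    return detected
```

For ":) :( :D" only one of the six non-whitespace characters is a letter, so the guard returns "(Unknown)" with a score of 0 regardless of what the service reported.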
2 votes
-
Incorrect language detection results
String "Congrats !! Congrats !!" is detected as Tagalog instead of English.
Example:
POST https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/languages HTTP/1.1
Host: westus.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: ••••••••••••••••••••••••••••••••

{
  "documents": [{
    "id": "1",
    "text": "Congrats !! Congrats !!"
  }]
}

Transfer-Encoding: chunked
x-ms-transaction-count: 1
x-aml-ta-request-id: ceea3fb2-ad0a-4b14-ae8e-54c584ab86e5
X-Content-Type-Options: nosniff
apim-request-id: c60db805-a194-4cd5-89af-c68d6ef87ed8
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Date: Mon, 11 Jun 2018 20:53:11 GMT
Content-Type: application/json; charset=utf-8

{
  "documents": [{
    "id": "1",
    "detectedLanguages": [{
      "name": "Filipino",
      "iso6391Name": "tl",
      "score": 1.0
    }]
  }],
  "errors": []
}
2 votes
-
Language detection score needs guidance and probably enhancement
The language detection API returns a score. The only guidance given is that 1.0 indicates the highest level of confidence and that a lower score is returned for mixed-language input. Here are a couple of cases that don't seem to agree with that:
The example in the docs, keeping the English and Spanish but removing the French, is still mixed:
Hello, I would like to take a class at your University. ¿Se ofrecen clases en español? Es mi primera lengua y más fácil para escribir.
--> returns Spanish, score 1
Hey, that's life. C'est la vie!
--> returns French, score 1
…
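One way a caller might work around the missing mixed-language signal is to run detection per sentence and compare the results. The sketch below assumes a `detect` callable that maps a string to a language code (for example, a thin wrapper around the languages endpoint); that callable and the helper names are hypothetical, and the regex-based sentence splitter is deliberately naive.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?' (naive heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def languages_per_sentence(text: str, detect: Callable[[str], str]) -> List[str]:
    """Call the supplied detector once per sentence and collect the codes."""
    return [detect(sentence) for sentence in split_sentences(text)]


def looks_mixed(text: str, detect: Callable[[str], str]) -> bool:
    """True when the per-sentence results disagree on the language."""
    return len(set(languages_per_sentence(text, detect))) > 1
```

This costs one detection call per sentence, so it is probably only worth doing for short inputs where a score of 1 looks suspicious.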
1 vote