How can we calculate sentiment for large documents (more than 5120 characters)?
We are trying to analyze a large document (more than 5120 characters) and don't know how to calculate the overall sentiment score for the entire document.
Imagine a document with 5200 characters. We could split it into two chunks (the first with 5120 characters, the second with 80 characters) and calculate the overall score as the average of the two chunk scores. But this approach is flawed: the second, much smaller 80-character chunk can dramatically skew the result.
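To make the skew concrete, here is a toy illustration (hypothetical scores, not real API output) of why a plain average over chunks is misleading, and how weighting each chunk score by its character count is one simple mitigation:

```python
# Hypothetical chunk scores: an 80-character chunk gets the same weight
# as a 5120-character chunk under a plain average.
chunks = [
    {"chars": 5120, "score": 0.9},  # large, strongly positive chunk
    {"chars": 80,   "score": 0.0},  # tiny chunk with score 0.0
]

# Plain average: each chunk counts equally, regardless of size.
plain_avg = sum(c["score"] for c in chunks) / len(chunks)

# Length-weighted average: each chunk counts in proportion to its size.
weighted = sum(c["score"] * c["chars"] for c in chunks) / sum(
    c["chars"] for c in chunks
)

print(plain_avg)  # 0.45  -- the tiny chunk halves the score
print(weighted)   # ~0.886 -- close to the dominant chunk's score
```

The weighted version is only a sketch of the general idea; the answer below suggests a cleaner route via the per-sentence scores.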
What are your suggestions?
How should we split the document and calculate the overall score?
I am currently facing the same problem.
You should be able to do this calculation based on the individual sentence scores.
As far as I understand, the document-level scores are calculated as follows: if the overall label of a document is "positive", "negative", or "mixed" (i.e. "non-neutral"), only those sentences whose label is "non-neutral" are considered, and the overall averages are calculated from those values.
With N = number of "non-neutral" rated sentences in the document:
- Overall_score_pos = Sum[pos_scores of the "non-neutral" rated sentences] / N
- Overall_score_neutral = Sum[neutral_scores of the "non-neutral" rated sentences] / N
- Overall_score_neg = Sum[neg_scores of the "non-neutral" rated sentences] / N
If a document is rated "neutral", all sentences are included in the calculation: Overall_score[category] = Sum(sentence_score[category]) / number of sentences.
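The aggregation rule above can be sketched as follows. This is my reading of the behavior, not an official Azure algorithm, and the input format is a simplified stand-in for the per-sentence scores the API returns:

```python
def overall_scores(sentences):
    """Aggregate per-sentence sentiment scores into document-level scores.

    sentences: list of dicts like
        {"sentiment": "positive", "positive": 0.9, "neutral": 0.05, "negative": 0.05}
    Returns (positive, neutral, negative) document-level scores.
    """
    # If any sentence is non-neutral, average only over those sentences;
    # otherwise (document rated "neutral") average over all sentences.
    non_neutral = [s for s in sentences if s["sentiment"] != "neutral"]
    pool = non_neutral if non_neutral else sentences
    n = len(pool)
    return (
        sum(s["positive"] for s in pool) / n,
        sum(s["neutral"] for s in pool) / n,
        sum(s["negative"] for s in pool) / n,
    )

# Example: two non-neutral sentences and one neutral sentence.
sentences = [
    {"sentiment": "positive", "positive": 0.9, "neutral": 0.05, "negative": 0.05},
    {"sentiment": "neutral",  "positive": 0.1, "neutral": 0.8,  "negative": 0.1},
    {"sentiment": "negative", "positive": 0.1, "neutral": 0.1,  "negative": 0.8},
]
pos, neu, neg = overall_scores(sentences)
print(pos, neu, neg)  # averages over the two non-neutral sentences only
```

Because you can request sentence-level scores for each chunk, you can run this aggregation over the combined sentence list of all chunks, which sidesteps the unequal-chunk-size problem entirely.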
The labeling should then again follow: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis?tabs=version-3#sentiment-labeling
I hope that helps!