Masking Sensitive Information
Dealing with Numbers
After transcribing audio into text, we run a Python script that handles privacy by masking all numbers, both digits (0, 1, 2, 3, ...) and number words (one, two, three, etc.).
Unfortunately, this is not sufficient: we want to remove all personal information, but keep product information (e.g. serial numbers, product numbers).
So my question is: are there any best practices for this problem? I cannot imagine that we are the first ones dealing with this :-)
Many thanks in advance!
Example:
Original: Hi, my phone number is 12345 and I am calling for product X, with serial number 9876.
Our case: Hi, my phone number is !!!!! and I am calling for product X, with serial number !!!!.
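To make the current behaviour concrete, here is a minimal sketch of blanket masking that reproduces the masked line above. It is not our actual script: the function name `mask_all_numbers`, the short `NUMBER_WORDS` list, and the choice to mask number words one character at a time are assumptions for illustration.

```python
import re

# Hypothetical, deliberately short list of spelled-out numbers.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

DIGIT_RE = re.compile(r"\d")
# \b keeps "one" from matching inside words such as "phone".
WORD_RE = re.compile(r"\b(?:" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def mask_all_numbers(text, mask_char="!"):
    """Mask every digit and every listed number word, regardless of context."""
    text = DIGIT_RE.sub(mask_char, text)
    return WORD_RE.sub(lambda m: mask_char * len(m.group()), text)


line = ("Hi, my phone number is 12345 and I am calling for product X, "
        "with serial number 9876.")
print(mask_all_numbers(line))
```

Because the function never looks at the words around a number, the phone number and the serial number are treated identically, which is exactly the problem.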
We would like to keep the serial number. Note that the number does not always appear directly after the phrase that identifies it. E.g.:
>Hi, welcome, how can I help you?
>Hi, I would like to return product X
>Ok. Do you have a serial number for me?
>Is that the same as a product number?
>Yes
>Ok. It is 9876
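
To illustrate the behaviour we are after, here is a rough sketch of context-aware masking that carries state across turns. It is only a sketch under stated assumptions, not a recommended solution: the cue phrases in `KEEP_CUES` and `MASK_CUES`, the function name `mask_transcript`, and the rule "the most recently mentioned cue decides" are all hypothetical, and only digit sequences are handled.

```python
import re

# Hypothetical cue phrases; real transcripts would need a much richer list.
KEEP_CUES = ("serial number", "product number")   # numbers to keep
MASK_CUES = ("phone number",)                     # numbers to mask

# Match either a cue phrase or a run of digits, scanning left to right.
TOKEN_RE = re.compile(
    "|".join(re.escape(cue) for cue in KEEP_CUES + MASK_CUES) + r"|\d+",
    re.IGNORECASE,
)


def mask_transcript(turns, mask_char="!"):
    """Mask digit runs unless the most recent cue phrase was a product cue."""
    keep = False  # privacy-first default: mask until a product cue is seen

    def replace(match):
        nonlocal keep
        token = match.group()
        if token.isdigit():
            return token if keep else mask_char * len(token)
        # The token is a cue phrase: update the state and leave the text as is.
        keep = token.lower() in KEEP_CUES
        return token

    return [TOKEN_RE.sub(replace, turn) for turn in turns]


dialogue = [
    "Hi, welcome, how can I help you?",
    "Hi, I would like to return product X",
    "Ok. Do you have a serial number for me?",
    "Is that the same as a product number?",
    "Yes",
    "Ok. It is 9876",
]
for turn in mask_transcript(dialogue):
    print(turn)
```

With this rule the `9876` in the last turn survives because the latest cue was "product number", while a number following "phone number" is still masked. The obvious weakness is that the state never expires, so a personal number mentioned later without any cue would also be kept; this is why we are asking for established practices rather than hand-written rules.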
