Anonymization, word vectors and O(n) model

Automated anonymization of documents is a popular research subject with important applications in areas such as medical data, where privacy is taken very seriously. It is a difficult task in general, and even more so for data outside the medical domain, where specialized tools are lacking. In this post, I explore the possibility of using the vector embedding approach [Mikolov 13] to anonymize general forms of documents. With some hints from statistical physics, I arrived at some interesting results, which I would like to share.