As interaction over the web has increased, incidents of aggression and related behaviours such as trolling, cyberbullying, flaming, and hate speech have also increased manifold across the globe. While most of these behaviours, like bullying or hate speech, predate the Internet, the reach and extent of the Internet have given them unprecedented power and influence over the lives of billions of people. Incidents of aggression and unratified verbal behaviour have not remained a minor nuisance; they have acquired the form of a major criminal activity that affects a large number of people. Such incidents not only cause mental and psychological agony to users of the web but have, in fact, forced people to deactivate their accounts and, in rare instances, even to take their own lives. It is therefore of utmost importance that preventive measures be taken to safeguard the interests of the people using the web, as well as of the web itself, so that it remains a viable medium of communication and connection.
The aim of the project is to develop a prototype that could automatically distinguish ratified linguistic behaviour (both aggressive and non-aggressive) from unratified (aggressive) behaviour, recognised by varied names such as flaming, aggression, trolling, hate speech, and cyberbullying, on online forums (especially social media, news and opinion websites, and blogs). I propose to develop the system using supervised text classification methods combined with sequence models. The system would be trained on a dataset annotated with aggression labels, consisting of Hindi and Hindi-English code-mixed data collected from different kinds of Facebook pages, including those of news media organisations, support and help groups, and celebrity pages, as well as from certain focused topics and themes on Twitter.
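As a minimal sketch of the supervised text-classification component described above, the following toy multinomial Naive Bayes classifier illustrates the basic train-and-predict workflow. The two-way label scheme, the example sentences, and the tokenizer are all illustrative assumptions, not the project's actual annotation scheme or preprocessing; real Hindi-English code-mixed data would need script- and transliteration-aware tokenization, and the proposed sequence models would replace this bag-of-words baseline:

```python
import math
from collections import Counter

# Hypothetical two-way label scheme for illustration only; the actual
# annotation distinguishes ratified from unratified behaviour more finely.
def tokenize(text):
    """Naive lowercase whitespace tokenization (a placeholder; code-mixed
    Hindi-English text would require a more careful tokenizer)."""
    return text.lower().split()

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Count documents per label and token occurrences per label.
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up toy training sentences, purely for demonstration.
train_texts = [
    "you are wonderful",
    "have a great day",
    "I will hurt you",
    "you are stupid and worthless",
]
train_labels = ["non-aggressive", "non-aggressive", "aggressive", "aggressive"]

clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("you are stupid"))   # → aggressive
print(clf.predict("have a great day")) # → non-aggressive
```

In practice this bag-of-words baseline would be the starting point against which the sequence models (which can capture word order and longer-range context in aggressive utterances) are compared.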