Crowd-Based Profiling: A Framework to Detect Psychological Disorders in Social Media Users

A. Sharmila Agnal, Dheeraj R, Akshay Kannan V, Durga S, Nishanth Kumar S
Department of CSE, SRM Institute of Science and Technology

Abstract- Psychological disorders currently affect a large number of people across cultures, societies, occupations and locations around the world. The main obstacle in dealing with psychological disorders is the difficulty of detecting the people who suffer from them, which results in a worrying number of undetected cases and false detections.
Our methodology aims at constructing detection models to identify psychological disorders among online social media users. These models are built on a basic data collection process, formulated as crowd-based profiling, which helps us collect accurate and efficient data sets of people from different categories. Our experiments suggest that extracting specific English language patterns and social attributes from these data sets paves the way for more advanced experiments on psychological disorders.
Keywords- Psychological disorders detection, Crowd-based profiling, Data sets, Sentiment analysis, Online social media.

I. INTRODUCTION

People who suffer from psychological disorders tend to have minimal contact with the people they know personally. This leads them to express their thoughts and feelings through online social media. Twitter is widely used around the world, as it allows users to communicate their ideas and views to the public. People suffering from psychological disorders find Twitter a suitable platform because it hosts various community groups where they can discuss their problems and the difficulties they are going through, and from which they believe they can get help. By sharing information about the problems they face each day, they unknowingly provide a large amount of content, and their stability can also be measured from their behaviour. Using this information as input, we can construct a model to detect psychological disorders.

The collection of this untapped data is referred to as crowd-based profiling, a practical data collection method used to gather data and to develop an efficient set of linguistic and behavioural patterns. Detection models of this type might help to construct an advanced mechanism for reducing the incidence of suicide, web addiction and major depression among people affected by psychological disorders. Applying online media to extract patterns of mental stress is challenging, because it is difficult for a machine to understand sarcasm, emoticons, abbreviations and similar expressions. The expert accounts retrieved during account gathering are therefore used to consult professionals for advice on online social media crowdsourcing. It is important to devise a convenient data collection model that extracts specific language patterns from user data, so that unique language patterns can be analyzed accurately and methodically.
Utilizing related and previous work, we systematize a group of features as attributes for constructing the detection models we propose.

II. RELATED WORK

The Social Network Mental Disorder Detection (SNMDD) model applied data mining techniques to three types of social network mental disorders (SNMDs). Cyber Relationship (CR) addiction is the obsession with social media use to converse and share private information, to the point where online relationships become more important than friends and family. Web addiction comprises obsessive online social gaming and gambling, which affects one's career. Information Overload (IO) includes obsessive scanning of user statuses, tweets and posts, which leads to lower work productivity and minimal in-person interaction. Two main challenges are said to exist in the design of SNMDD.

Mental Illness Detection and Analysis via Social Media (MIDAS) combines a manual methodology with a keyword matching technique to collect data effectively from patients and regular users. For the collection of patient data, community portals related to mental disorders are collected manually. Self-volunteering users are then selected from the follower lists of these portals. Finally, once the final list of patients is obtained, their tweets are retrieved. The preprocessing step considers only English-language keywords in the tweets and ignores terms in other languages, abbreviations and similar tokens. Users with very few posts or tweets are also ignored. MIDAS concentrates on two important types of features, semantic and behavioural. Term Frequency (TF) is used to capture the frequent and representative words used by the patients. The Pattern of Life Features (PLF) reveal the emotional patterns and behavioural traits of the user by measuring polarity, emotion scores and interaction via social media.
To utilize multi-source learning in SNMDD, one basic method is to directly concatenate the features of each person collected from different social networks into one large vector. This technique frequently misses the correlation of a feature across different online social networks and introduces interference. Tensor techniques have therefore been widely used to model multiple data sources, because a tensor can naturally represent multi-source data. The SNMDD-based Tensor Model (STM) has been presented, which allows the characteristics of SNMDs to be incorporated. Equipped with this new tensor model, semi-supervised learning is used to classify each user with a Transductive Support Vector Machine (TSVM). Screening tests are conducted for people of a certain category who have a greater chance of being affected by psychological disorders. Subjects are matched in age and gender proportion for a less biased analysis.

A few methods exploit both manually labelled data and noisily labelled data for training. Among these, a model called the Emoticon Smoothed Language Model (ESLAM) has been used to integrate the two kinds of data seamlessly. ESLAM is compared with the fully supervised Language Model (LM) to check whether smoothing with emoticons is effective. In all evaluations, ESLAM performs better than the fully supervised LM. This indicates that the noisy emoticon data do carry useful information and that ESLAM can use it effectively to achieve better performance. Detailed emotions provide evidence that further explains a user's behaviour online. The system is used only for studying and analysing emoticons in social media and does not have extended applications.

Some members of a society possess qualities that make them exceptionally effective in spreading ideas to others.
These exceptional individuals drive trends on behalf of the majority of ordinary people. They are loosely described as being informed, respected and well-connected.

With the help of these works, we propose a simple methodology to detect two particular psychological disorders by collecting crowd-based data on the one hand and acquiring the attributes of patients on the other, and then comparing them to produce the needed results.

III. PROPOSED METHODOLOGY

This work aims to build a framework for detecting psychological disorders in social media users. We carry out our methodology through the following stages:

Collection of Data
Cleaning and Preprocessing of Data
Extracting Features
Building Detection Models
Psychological Profiling

To acquire randomly sampled users, a set of user IDs from Twitter was first collected. This was done by using a Twitter streaming package in R and randomly sampling IDs. To collect tweets, we then download the tweets of each selected ID using the 'twitteR' package in R. For the collection of patient and expert data, we use a five-step approach that merges manual effort with a keyword matching technique to build the psychological profiling data.

1) Initially, we manually collect data through a package in R, using one of the community portals where abundant data on mental disorders is available. A community portal is a common account around which a large number of potential patients and related people gather, and it can therefore serve as a resource. This makes data collection easy. Sometimes there are dedicated groups in which people from clinics, support groups or even doctors take part. For example, the account @HealingFromBPD is a viable candidate for a community portal, because it shares psychological and therapeutic information regarding psychological illnesses. It has thousands of followers.
To find community portals efficiently, we can search Twitter manually using the associated disorder as a keyword. There are no additional restrictions on selecting one of these accounts. As a precaution, however, a number of spam accounts with similar profiles were weeded out to keep the collected data of good quality. The remaining accounts were manually reviewed to confirm that they were trustworthy enough to serve as community portals.

2) Once enough data is collected through these portals, we use the 'twitteR' package to acquire the follower lists of the community portals. The collected accounts then become the main crowd from which we select patients and experts into their respective categories.

3) The interest group within the collected data consists of self-volunteering users, who are categorized by the information in their bio description. We also consider self-volunteering a form of data collection.

4) Once these accounts are identified and collected, we label them manually into three categories:
Patient: a known patient who is affected by any form of psychological disorder;
Expert: a professional in the field of psychology, including psychiatrists, analysts and primary care providers (PCPs);
Non-related: a user who is neither of the above.

5) Finally, the tweets and posts of the accounts on the final list are obtained using the 'twitteR' package in R.

Preprocessing

After filtering the information, we apply Sentiment Analysis and Emotion Classification to obtain both the sentiment contrast and the emotion expressed in each of the user's posts. To obtain the sentiment information of tweets, we use an R package that is available for download from CRAN. The sentiment tool sorts the content of tweets into three contrast categories: positive, negative and neutral.

Obtaining the Features

Term Frequency (TF) is the number of times a particular word is referenced.
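The term frequency defined above can be illustrated with a short sketch. The paper works in R; the Python code below is only an illustration under stated assumptions, not the authors' implementation. The tokenizer, the function names (`tokenize`, `term_frequency`, `relative_frequency`) and the sample tweets are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequency(tweets):
    """Count how many times each word occurs across one user's tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


def relative_frequency(counts):
    """Normalize raw counts by the total number of tokens for that user."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


# Hypothetical tweets for a single user, made up for illustration only.
user_tweets = [
    "Feeling so much stress today",
    "stress again, cannot sleep",
]

tf = term_frequency(user_tweets)
rel_tf = relative_frequency(tf)
print(tf.most_common(3))
```

Normalizing by the total token count is one possible design choice; it keeps users with many tweets comparable to users with few.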
In R, term frequency is obtained through the TF-IDF (term frequency-inverse document frequency) feature. This feature captures the frequent and representative words used by the patients. TF-IDF is applied to the data collected from all the patient tweets. Term frequency here is the frequency of the word sequences found in the collection of tweets posted by each Twitter user.

The 'quanteda' feature is intended to capture the psychological terms frequently used by patients. Here 'quanteda' serves as a restricted version of the TF-IDF feature in which only words related to psychological behaviour are considered (e.g., stress, feeling, sensation and dysthymia). The quanteda package calculates the ratio of each category for each user.

Pattern Analysis (PA)

The emotional patterns and behavioural tendencies of a user are predicted by measuring emotional contrast, sentiment and social well-being. To compose the PA, we merge four types of features as follows:

Emotional Tallying: To measure the difference in emotional scores between patients and regular users, we use the 'psych' package in R to categorize each tweet into one of eight identified emotions. We then convert the eight emotions into eight Emotional Tallies.

Age and Gender: As information regarding age and gender is not provided openly, we adopted the metadata feature using an R package. To predict the age and gender of a user, we use lexica. The distribution of age with respect to the number of people affected is shown in Fig 1. This feature is as important and indispensable as the other features.

Fig 1: Distribution of ages among all respondents

Contrast Features: Using the Twitter package and the quanteda package, each tweet is categorized as having a positive, negative or neutral attitude.
To characterize each user, the contrast is converted into five values: Positive Quotient, Negative Quotient, Positive Correspondence, Negative Correspondence and Overturn Quotient. Together these provide information regarding the mental stability of the user.

Social Features: These features are designed to capture a user's interaction with other users on the online social network and how frequently the user posts on Twitter. The four social features are Tweet Recurrence, Mention Quotient, Mention Recurrence and Distinct Mentions.

IV. RESULTS AND DISCUSSION

Community portals for Bipolar Disorder and BPD were collected manually, and thousands of followers were downloaded for each community portal. Accounts matching each psychological disorder were selected manually and grouped into the three categories discussed above, as shown in Table I. Random samples were collected using the Twitter REST API. Expert accounts are used in the selection bias test. The random samples form the negative class in the final datasets.

Table I: The cumulative number of accounts, tweets and tweets per user for the various categories of users

The performance of the two cases, Borderline Personality Disorder (BPD) and Bipolar Disorder, is compared in Fig 2 and Fig 3. Each curve corresponds to a model evaluated on a distinct group of features (LIWC, TF-IDF, Pattern Analysis), as described above. The y-axis represents sensitivity and the x-axis represents the false alarm rate.

Fig 2: Performance of the Bipolar model using each group of features (LIWC, TF-IDF, Pattern Analysis)

Fig 3: Performance of the BPD model using each group of features (LIWC, TF-IDF, Pattern Analysis)

The average for each case is shown in Table II. The TF-IDF model produced the highest average, 94%, for both the Bipolar and BPD cases.
The Pattern Analysis feature has a lower average than the TF-IDF feature but is moderately better than the LIWC feature.

Table II: The average performance measures of the groups of features (LIWC, TF-IDF, Pattern Analysis)

V. CONCLUSION

In summary, a basic data collection mechanism, crowd-based profiling, is proposed to collect datasets of patients and regular users. Our own semantic and behavioural features are then gathered and adopted for the purpose of psychological disorder detection. We conclude that a combination of manual and automatic effort is needed to produce satisfactory results. The mechanism we use makes provision for more advanced research and experiments on psychological disorders using other techniques such as Linear Regression and Support Vector Machines (SVM).

REFERENCES

Hong-Han Shuai, Chih-Ya Shen, De-Nian Yang, Yi-Feng Carol Lan and Wang-Chien Lee, "A Comprehensive Study on Social Network Mental Disorders Detection via Online Social Media Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 30, 2018.
Elvis Saravia, Chun-Hao Chang, Renaud Jollet De Lorenzo and Yi-Shin Chen, "MIDAS: Mental Illness Detection and Analysis via Social Media," International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2016.
Kun-Lin Liu, Wu-Jun Li and Minyi Guo, "Emoticon Smoothed Language Models for Twitter Sentiment Analysis," Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.
Hong-Han Shuai, Chih-Ya Shen, De-Nian Yang, Yi-Feng Lan, Wang-Chien Lee and Philip S. Yu, "Mining Online Social Data for Detecting Social Network Mental Disorders," Proc. Int. Conf. World Wide Web, 2016.
M. Cha, H. Haddadi, F. Benevenuto and K. P. Gummadi, "Measuring User Influence in Twitter: The Million Follower Fallacy," Proc. Int. AAAI Conf. Weblogs Social Media, 2010.
E. Saravia, C. Argueta and Y.-S. Chen, "Emoviz: Mining the world's interest through emotion analysis," IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015.
G. Coppersmith, M. Dredze and C. Harman, "Quantifying mental health signals in Twitter," in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014.
C. Argueta, E. Saravia and Y.-S. Chen, "Unsupervised graph-based patterns extraction for emotion classification," in Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015.
M. Park, C. Cha and M. Cha, "Depressive moods of users portrayed in Twitter," in Proceedings of the ACM SIGKDD Workshop on Healthcare Informatics (HI-KDD), 2012.
G. A. Coppersmith, C. T. Harman and M. H. Dredze, "Measuring post traumatic stress disorder in Twitter," in ICWSM, 2014.
G. Coppersmith, M. Dredze, C. Harman and K. Hollingshead, "From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses," NAACL HLT, 2015.
M. De Choudhury, M. Gamon, S. Counts and E. Horvitz, "Predicting depression via social media," in ICWSM, 2013.
A. Go, R. Bhayani and L. Huang, "Twitter sentiment classification using distant supervision," CS224N Project Report, Stanford, 2009.