The presented method aims to support a therapist in verifying the detailed hypotheses proposed for a particular patient. The method (Fig. 4) was developed on the basis of the literature review and our previous research. It consists of the following steps:
1. Preparing the free statement by a patient on his/her body image.
2. The sentiment analysis of the prepared statement.
3. The analysis of the patient’s emotions.
4. The analysis of particular areas of difficulties (from the authors’ previous paper [22]).
The first component consists of a Recurrent Neural Network (RNN), pre-trained on the Stanford Amazon reviews dataset [23], that performs sentiment analysis of the patient’s body self-description (Fig. 4a). The outcome gives the polarity of the delivered description, which can be positive, neutral, or negative. The second component is a statistical classifier [24] that performs emotion expression analysis of the body description (Fig. 4b). As a result, it gives the amount of happiness, anger, sadness, fear, and disgust expressed in the body description delivered by the patient. The presented approach supports the existing psychological analysis, which, unfortunately, is time-consuming and prone to human error.
The remainder of this section describes the proposed method for computer-aided therapeutic diagnosis of anorexia. First, the patients are asked by psychologists to write a subjective description of their body. Next, the written portrayal is converted, using Natural Language Processing, into a form of data that the computer can interpret for text analysis and machine learning tasks. Punctuation marks are removed. In the sentiment analysis pathway, the words of the text data are then transformed into 100-dimensional numerical vectors using a Word2Vec transformation based on The National Corpus of Polish. The vectors are then fed to the input of the Recurrent Neural Network, and the polarity of the description is presented. The emotion analysis pathway proceeds as follows. Words are stemmed using the SAS Visual Text Analytics dictionary, which yields the most frequent terms present in the body description. This step is important because some words have the same meaning but different lexical forms. These terms are fed to the statistical classifier, and as a result, the amount of certain emotions in the patient’s body description is shown.
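The first steps of the sentiment pathway (punctuation removal, splitting into words, and mapping each word to a 100-dimensional vector) can be illustrated with a minimal sketch. The function `toy_embedding` is a hypothetical stand-in for a trained Word2Vec lookup: it only produces reproducible pseudo-random vectors, and the English sentence is an invented example, not a statement from the corpus.

```python
import random

EMBEDDING_DIM = 100  # matches the 100-dimensional word vectors described in the text


def tokenize(text):
    """Lowercase the text, replace punctuation marks with spaces, split into words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def toy_embedding(word):
    """Hypothetical stand-in for a Word2Vec lookup (NOT a trained embedding)."""
    rng = random.Random(word)  # seeding with the word makes the vector reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]


def vectorize(text):
    """Turn a free statement into a sequence of word vectors for the network input."""
    return [toy_embedding(token) for token in tokenize(text)]


sequence = vectorize("My body is too big, I think.")
print(len(sequence), len(sequence[0]))
```

The result is a variable-length sequence of fixed-size vectors, which is the kind of input a recurrent network can process word by word.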
The free statement of patients’ body image
Having considered all the obstacles the anorexia patients encounter during the medical review, the psychologist asks the patient to write a short statement about his/her body. The following criteria for including patients into the research group were adopted:
The age of the patient corresponds to adolescence (12–18 years old),
An initial diagnosis of anorexia made by a psychiatrist,
No other coexisting mental disorders,
The disorder has lasted no longer than 3 years.
The following criteria for excluding the patients from the research group are:
The patient belongs to another age group,
No grounds for an initial diagnosis of anorexia made by a psychiatrist,
The presence of other coexisting mental disorders, for instance, depression,
The duration of the disorder is longer than 3 years.
The following criteria for including patients into the control group were adopted:
The age of the participant corresponds to adolescence (12–18 years old),
No mental disorders.
The research material contains the collection of free statements on “The image of my body”:
The collection of 44 statements of people with the diagnosed anorexia,
The collection of 52 statements of healthy people.
The research and control group
Within the developed criteria, 44 girls with anorexia (restrictive form) were included in the research group. The participants were aged 12–18, an average of 15.1 ± 2. The anorexia was diagnosed according to ICD-10 and DSM IV criteria. The average weight of the girls was 37.2 ± 6.1 kg, and BMI was between 11.1 and 20.6, average 15.6 ± 2.4 (p < 0.001 vs. control group), and BMI SDS from − 5.2 to 0.9, average − 2.63 ± 1.28 (p < 0.001 vs. control group).
The control group consisted of 52 healthy girls, aged 12–18, average 14.9 ± 1.6, the average weight was 55.3 ± 9.8 kg and BMI from 16.1 to 24.8, average 20.1 ± 2.1, and BMI SDS from − 2.6 to 2.6, average 0.09 ± 1.16. The participants in the control group were female students of primary, middle, and secondary schools in the city of Gliwice.
The girls in the research group, with diagnosed anorexia, had statistically significantly lower BMI values compared to those in the control group (p < 0.001). The BMI values indicate the severe underweight of girls in the research group and the normal weight of girls in the control group. The BMI value depends on sex and changes with a person’s age. Therefore, to assess the nutritional status of the girls, normalized values were also presented using the Standard Deviation Score (SDS). According to the normalized standards for age and sex, an SDS value below −1 stands for underweight, values from −1 to 1 stand for average weight, values from 1 to 2 indicate overweight, and values above 2 indicate severe obesity [25, 26]. The BMI SDS values of girls in the research group were likewise statistically significantly lower than those of the healthy girls in the control group (p < 0.001), which confirms the severe underweight of girls suffering from anorexia and the average weight of the girls in the control group.
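The interpretation of the SDS thresholds quoted above can be written down as a short sketch. The handling of the exact boundary values (−1, 1, and 2) is our assumption, since the text does not say to which interval they belong, and the height used in the BMI example is hypothetical. Computing the SDS itself requires age- and sex-specific reference data and is not shown.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by squared height in meters."""
    return weight_kg / height_m ** 2


def sds_category(sds):
    """Map a BMI SDS value to the nutritional-status category described in the text.

    Boundary handling (values exactly equal to -1, 1, or 2) is an assumption.
    """
    if sds < -1:
        return "underweight"
    if sds <= 1:
        return "average weight"
    if sds <= 2:
        return "overweight"
    return "severe obesity"


# Group means reported in the text: -2.63 (research group), 0.09 (control group).
print(sds_category(-2.63), "|", sds_category(0.09))
# Hypothetical height of 1.55 m combined with the mean weight of the research group.
print(round(bmi(37.2, 1.55), 1))
```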
Sentiment analysis of the statement
Methodologically, sentiment analysis is treated as supervised or unsupervised classification, depending on whether documents pre-classified as positive and negative are available. In the unsupervised approach, dictionary methods are applied [27,28,29,30]. In the supervised approach, machine learning algorithms, such as artificial neural networks [31,32,33,34] or the Support Vector Machine [35], are used to find the dependencies between the features of the text excerpt and the opinion expressed in a document. An expert in the particular area must prepare the required training set.
Supervised classification is usually conducted at the level of the entire document. Numerous studies indicate the high effectiveness of Bayes classifiers [36,37,38] and the Support Vector Machine [39, 40]. However, the appropriate choice of classifier input variables remains an open issue. Commonly, the input variables are the selected terms, their weights and normalized frequencies, part-of-speech tags, opinion-forming words, and the occurrence of negative words.
When sentiment analysis has to classify word sequences of varying length, an RNN can be used. Training such a network requires avoiding exploding and vanishing gradients. Exploding gradients cause the values of the network weights to fluctuate during the learning phase; vanishing gradients lead to an overextended training time with minimal effectiveness achieved. Long Short-Term Memory (LSTM) cells, the most commonly used solution, are applied to avoid these problems [41].
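The origin of exploding and vanishing gradients can be illustrated with a deliberately simplified numerical sketch: during backpropagation through time the gradient is multiplied by a recurrent factor at every time step, so its magnitude changes geometrically with the sequence length. Reducing the recurrent Jacobian to a single scalar factor is our simplifying assumption.

```python
def gradient_scale(recurrent_factor, steps):
    """Magnitude of a unit gradient after `steps` multiplications by the same factor."""
    return recurrent_factor ** steps


# A 50-word statement: factors slightly below or above 1 behave very differently.
for factor in (0.9, 1.0, 1.1):
    print(factor, gradient_scale(factor, 50))
```

A factor below 1 shrinks the gradient towards zero (slow, ineffective training), while a factor above 1 inflates it (unstable weight updates). Gated cells are designed to keep this factor close to 1 for the information that has to be preserved.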
In the proposed approach, we decided to develop a method of sentiment analysis for the entire document using an RNN with 2 layers of Gated Recurrent Units [42]. The outline of the applied network is presented in Fig. 5. The architecture is taken from the SAS Visual Text Analytics example “Sentiment Analysis using DeepRNN” [43].
We chose this approach because it allows knowledge transfer in machine learning. The collection of free statements on body image was small compared to the number of parameters of a model based on an Artificial Neural Network. Hence, the model was first trained on the collection of reviews from the Stanford Amazon dataset [13] and then trained further on the specific text corpus.
The Stanford Amazon review dataset contains 34,686,770 reviews, which were used for pre-training the RNN. Each review includes product and user information, a rating, and a plaintext review. The polarity of a review was derived from the user’s rating. The reviews fed to the neural network were translated into Polish and transformed with the Word2Vec approach so that they match the inputs of the RNN.
The deep learning model used in the presented approach is an RNN. The input layer of the network contains 100 neurons corresponding to the 100-dimensional vector representation of words.
The presented network contains two hidden layers. The first layer is made up of 5 bidirectional Gated Recurrent Units, each containing two subunits, one for forward states and another for backward states, so that the network can use the context on both sides of a word. The second hidden layer contains 5 unidirectional Gated Recurrent Units [42], which are only able to store representations of recent input events in the form of activations. The output layer provides the sentiment value and its probability.
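To make the bidirectional recurrence concrete, the following is a minimal pure-Python sketch of a gated recurrent layer with 5 units run forward and backward over a sequence of 100-dimensional vectors. It is an illustration under stated assumptions, not the SAS implementation: the weights are random, the state update uses one of the two common sign conventions, and only the final states are returned, whereas a stacked network would pass the whole output sequence to the next layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Compute W.x + U.h + b for matrices stored as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def init_gru(input_dim, units, rng):
    """Random illustrative weights; a trained model would load learned values."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(units, input_dim), mat(units, units), [0.0] * units)
            for gate in ("update", "reset", "candidate")}


def gru_step(params, x, h):
    """One GRU update: the gates decide how much of the previous state is kept."""
    z = [sigmoid(v) for v in affine(*params["update"][:1], x, *params["update"][1:2], h, params["update"][2])]
    r = [sigmoid(v) for v in affine(*params["reset"][:1], x, *params["reset"][1:2], h, params["reset"][2])]
    W, U, b = params["candidate"]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(v) for v in affine(W, x, U, gated_h, b)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def run_gru(params, sequence, units):
    """Process the sequence step by step and return the final hidden state."""
    h = [0.0] * units
    for x in sequence:
        h = gru_step(params, x, h)
    return h


def bidirectional_gru(fwd, bwd, sequence, units):
    """Concatenate the final states of a forward and a backward pass."""
    return run_gru(fwd, sequence, units) + run_gru(bwd, sequence[::-1], units)


rng = random.Random(0)
INPUT_DIM, UNITS = 100, 5
forward, backward = init_gru(INPUT_DIM, UNITS, rng), init_gru(INPUT_DIM, UNITS, rng)
words = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(7)]  # 7 toy word vectors
state = bidirectional_gru(forward, backward, words, UNITS)
print(len(state))
```

The concatenated state has 10 components, five from each direction, which is why a bidirectional layer sees the words both before and after a given position.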
The optimization is performed using the Adam optimizer, also called Adaptive Moment Estimation. This method allows for efficient stochastic optimization and requires only first-order gradients with little memory overhead [44]. The optimizer computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. The optimal parameters of the Adam optimizer were beta1 = 0.9, beta2 = 0.999, and a learning rate of 0.000005. We use Categorical Cross Entropy (CCE), a common loss function in neural network classification tasks, which punishes all misclassifications equally. CCE assumes that only one class is correct and is defined as:
$$ \text{CCE} = - \frac{1}{N} \sum_{i = 1}^{N} \log \left( p_{i} \left[ y_{i} \right] \right), \tag{1} $$
where N is the size of the training dataset and \( p_{i} \left[ {y_{i} } \right] \) is the probability vector output of the network at the target index \( y_{i} \) for the \( i{\text{th}} \) example. Figure 6 presents the learning curves for training and validation data set.
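Equation (1) can be checked numerically with a short sketch. The probability vectors and target indices below are invented for illustration, and the order of the three classes (negative, neutral, positive) is our assumption.

```python
import math


def categorical_cross_entropy(probabilities, targets):
    """Eq. (1): mean negative log-probability assigned to the correct class.

    probabilities: one probability vector per example (network output).
    targets: index of the correct class for each example.
    """
    if len(probabilities) != len(targets):
        raise ValueError("one target index is required per probability vector")
    n = len(targets)
    return -sum(math.log(p[y]) for p, y in zip(probabilities, targets)) / n


# Two hypothetical examples over the classes (negative, neutral, positive).
outputs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
labels = [0, 2]
print(categorical_cross_entropy(outputs, labels))
```

Only the probability at the target index enters the sum, so the loss approaches zero as the network assigns probability close to 1 to the correct class of every example.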
Assessment of patient’s emotions
The presented approach applies initial processing techniques taken from Natural Language Processing. In further analysis, the document is represented as a vector whose elements indicate which words occur most frequently. Bringing the initial corpus into the desired form requires (Fig. 7):
first removing punctuation marks and compiling a list of word occurrences, individually for each document,
removing words negligible for further analysis (those words form the so-called stop-list),
transforming the words in the list into their basic grammatical form (stemming).
Depending on the specifics of the language, words can be transformed into their basic form using rules or dictionaries. The words that remain after the initial document transformation are called terms. Here, the dictionary approach was used, based on the Polish language dictionary included in SAS Visual Text Analytics [45].
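The three preprocessing steps listed above can be sketched as follows. The stop-list and the dictionary of basic forms are tiny hypothetical English examples chosen for readability; the study itself relies on the Polish dictionary of SAS Visual Text Analytics, which is not reproduced here.

```python
from collections import Counter

# Hypothetical toy resources (NOT the dictionaries used in the study).
STOP_LIST = {"the", "is", "are", "my", "and", "i", "a"}
BASIC_FORMS = {"legs": "leg", "thighs": "thigh", "hated": "hate", "hates": "hate"}


def to_terms(document):
    """Return the term frequencies of one document after the three steps."""
    # Step 1: remove punctuation marks and compile the list of words.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in document.lower())
    words = cleaned.split()
    # Step 2: drop the words on the stop-list.
    words = [word for word in words if word not in STOP_LIST]
    # Step 3: dictionary lookup of the basic form; unknown words are kept unchanged.
    return Counter(BASIC_FORMS.get(word, word) for word in words)


terms = to_terms("I hated my legs. My legs are the worst, and my thighs!")
print(terms.most_common())
```

Mapping different lexical forms to one term is what allows the later dictionary lookups to match a word regardless of its inflection.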
In the next phase of the analysis, Nencki Affective Word List (NAWL) [46] was used. It consists of 2902 Polish words and their ratings connected to different aspects of expressing emotions. The database is a Polish adaptation of the Berlin Affective Word List-Reloaded (BAWL-R), commonly used to investigate the affective properties of German words. Affective, normative ratings were collected from 266 Polish participants (136 women and 130 men) [46].
We developed a one-level taxonomy, supported by the mentioned dictionary, to analyze the emotional intensity of particular statements (Fig. 8). It includes 5 groups, each referring to a particular basic emotion: happiness, anger, sadness, fear, and disgust. The rules that determine the affiliation to a particular group of emotions are the measures of emotional sentiment taken from the NAWL dictionary. The developed taxonomy uses a dictionary approach and the statistical model presented and discussed by Ningham et al. [47].
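A dictionary-based emotion profile of this kind can be sketched as below. The lexicon entries and rating values are invented placeholders rather than NAWL data, and averaging the ratings of the matched terms is our assumption about how the amounts could be aggregated; the statistical model of [47] is not reproduced.

```python
EMOTIONS = ("happiness", "anger", "sadness", "fear", "disgust")

# Hypothetical ratings for illustration only (NOT taken from NAWL).
TOY_LEXICON = {
    "hate": {"happiness": 1.0, "anger": 6.0, "sadness": 4.0, "fear": 3.0, "disgust": 5.0},
    "smile": {"happiness": 6.0, "anger": 1.0, "sadness": 1.0, "fear": 1.0, "disgust": 1.0},
}


def emotion_profile(terms):
    """Average the emotion ratings of all terms found in the lexicon."""
    matched = [TOY_LEXICON[term] for term in terms if term in TOY_LEXICON]
    if not matched:
        # No term carries an emotional rating: report a neutral, all-zero profile.
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {emotion: sum(r[emotion] for r in matched) / len(matched) for emotion in EMOTIONS}


print(emotion_profile(["hate", "leg", "smile"]))
```

Terms absent from the lexicon, such as "leg" here, simply do not contribute to the profile.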
The assessment of particular areas of difficulties
The previous research showed the relationship between the vocabulary used in the patient’s statement and the morbidity. The proposed method intends to provide additional support for the psychologist by developing 6 detailed categories corresponding to the patient’s areas of difficulties [22]:
Self-esteem: an esthetic way of perceiving the body,
Acceptance of the social assessment: how the person perceives being received by society,
Emotions: experienced emotions,
Auto-aggression: descriptions of aggressive and self-aggressive behaviors,
The functioning of the body: description of the functioning of the body,
Body image: the image of individual parts of the body.
The presented approach represents each document as a list of words in their basic form. The words were reduced to their basic grammatical form in the stage devoted to the assessment of the patient’s emotions.
The detailed dictionaries were developed on the basis of the sentiment dictionary created by Wilson, Wiebe, and Hoffmann [48]; the particular categories, however, were created by psychologists, experts specializing in anorexia diagnostics and therapy [22].
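The dictionary lookup for the six areas of difficulties can be sketched as a simple membership count. The category vocabularies below are hypothetical English placeholders, not the expert dictionaries from [22], and the short category keys are our own labels.

```python
# Hypothetical vocabularies for illustration only (NOT the expert dictionaries).
CATEGORY_DICTIONARIES = {
    "self-esteem": {"ugly", "pretty"},
    "social assessment": {"people", "laugh"},
    "emotions": {"sad", "afraid"},
    "aggressive behaviors": {"hurt", "punish"},
    "body functioning": {"tired", "weak"},
    "body image": {"leg", "thigh", "stomach"},
}


def category_counts(terms):
    """Count how many terms of a statement belong to each category dictionary."""
    return {
        name: sum(1 for term in terms if term in vocabulary)
        for name, vocabulary in CATEGORY_DICTIONARIES.items()
    }


counts = category_counts(["ugly", "leg", "thigh", "tired", "walk"])
print(counts)
```

The input is the list of terms in basic form, so the counts show the psychologist which areas a statement touches on and how often.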
The conducted experiments
The following experiments were conducted during the research:
Experiment 1 (E1)
1. The sentiment labels for the statements included in the corpus (presented in section “The free statement of patients’ body image”) were developed together with the psychologists.
2. The selected RNN architecture was trained on the collections described in section “Sentiment analysis of the statement.”
3. The sentiment analysis of the statements was scored by comparing the results from the RNN with the labels developed by the experts.
4. The obtained results were also compared with those of the dictionary-based sentiment analysis presented in [11].
Experiment 2 (E2)
1. The statements from the corpus (see section “The free statement of patients’ body image”) were processed into their basic grammatical form according to the method described in section “Assessment of patient’s emotions”.
2. The processed statements were analyzed for the occurrence of five basic emotions: happiness, anger, sadness, fear, and disgust.
Experiment 3 (E3)
1. The statements from the corpus (see section “The free statement of patients’ body image”) were processed into their basic forms according to the method described in section “Assessment of patient’s emotions”.
2. The dictionary analysis was performed for the occurrence of particular areas of difficulties according to the six detailed dictionaries developed (see “The assessment of particular areas of difficulties” section).
The quality evaluation
To evaluate the classification quality, the following measures were introduced:
True-positive rate (TPR): \( \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}} \),
True-negative rate (TNR): \( \text{TNR} = \frac{\text{TN}}{\text{TN} + \text{FP}} \),
Positive predictive value (PPV): \( \text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}} \),
Negative predictive value (NPV): \( \text{NPV} = \frac{\text{TN}}{\text{TN} + \text{FN}} \),
F1 measure: \( \text{F1} = 2\,\frac{\text{PPV} \cdot \text{TPR}}{\text{PPV} + \text{TPR}} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}}, \)
where
TP—true positive, the number of true-positive predictions,
TN—true negative, the number of true-negative predictions,
FP—false positive, the number of false-positive predictions,
FN—false negative, the number of false-negative predictions.
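The measures defined above follow directly from the four confusion-matrix counts, as the sketch below shows. The counts in the example are hypothetical and are not results of the study; the function assumes that no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute TPR, TNR, PPV, NPV, and F1 from confusion-matrix counts.

    Assumes every denominator is non-zero (each class occurs and is predicted).
    """
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, tn=6, fp=2, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Both forms of the F1 measure give the same value; computing it from the counts avoids an intermediate division.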
The results of the sentiment analysis were subjected to statistical evaluation. Statistical analysis of results provided the information whether the median value of expert assessment coincides with the median of the results obtained by the dictionary and RNN methods and whether the medians of both methods overlap (H0 hypothesis: equality of medians). For this purpose, the Wilcoxon matched-pairs signed-ranks test was used.
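The statistic behind the Wilcoxon matched-pairs signed-ranks test can be sketched in a few lines. The sketch computes only the rank sums, with zero differences discarded and tied absolute differences given their average rank; it does not compute the p-value, for which a statistical package (for example, `scipy.stats.wilcoxon`) would be used in practice. The paired scores in the example are hypothetical.

```python
def wilcoxon_signed_rank(first, second):
    """Return (W+, W-, min(W+, W-)) for two paired samples."""
    # Signed differences; pairs with no difference are discarded.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(rank for diff, rank in zip(ordered, ranks) if diff > 0)
    w_minus = sum(rank for diff, rank in zip(ordered, ranks) if diff < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical paired scores: expert assessment vs. one automatic method.
expert = [1, 2, 3, 4, 5]
method = [2, 2, 5, 3, 9]
print(wilcoxon_signed_rank(expert, method))
```

A small value of the smaller rank sum relative to the number of pairs speaks against the H0 hypothesis of equal medians.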