Methodology for the analysis of the Ukrainian segment of social media and messengers
Where do we get the data from?
Detector Media respects the principles of privacy and security of personal data on social media. Therefore, we analyze only public data, i.e., data that users have allowed to be collected and processed. Each social network we analyze (Facebook, Twitter, Telegram, YouTube) has its own policies on obtaining, processing, and storing data. Detector Media takes each network's policies into account, as well as the European Union's personal data protection legislation, the GDPR. The data providers are the social networks themselves or companies certified by them.
By the Ukrainian segment of social media such as Facebook, Twitter, and Telegram, we mean posts by profiles, pages, groups, and channels that are located in Ukraine or that have indicated Ukraine as their location.
Twitter
Types of data processed by Detector Media:
- texts of public posts and replies to them;
- information on the time of the publication of posts and replies to them;
- number of likes and shares of the posts and replies to them;
- names of the pages that authored posts and replies;
- number and list of each page's subscribers and subscriptions.
Facebook
Types of data processed by Detector Media:
- texts of public posts and comments to them;
- information on the time of publication of posts and replies;
- the number and type of interactions with the post (likes, shares, link clicks);
- names of the groups and pages that authored posts and comments;
- information about open groups (date of creation, whether the page from which it is administered has been changed);
- number and list of subscribers.
Telegram
Types of data processed by Detector Media:
- texts of Telegram channels posts and comments to them;
- information on the time of publication of posts and comments;
- information about Telegram channels (date of creation, number of subscribers, and country affiliation);
- information about the distribution and mention of the message by another Telegram channel.
YouTube
Types of data processed by Detector Media:
- auto-generated video subtitles;
- information about the video (creation date, title, description, number of subscribers, number of views, number of likes);
- information about YouTube channels (creation date, number of subscribers, number of uploaded videos, number of views).
How do we process data?
Detector Media analyses textual and quantitative data using Python libraries for statistical analysis, natural language processing, and machine learning. The types of analysis we use are:
- n-gram analysis: automated identification and collection of the most popular words and phrases in the texts;
- sentiment (text tone) analysis - automatic determination of the positive, negative, or neutral tone of messages;
- topic modeling - automatic identification of the topics mentioned in posts. Topic modeling gives a general picture of the content of a body of documents. It works on the assumption that documents consist of a number of topics, and topics consist of words and phrases that often occur near each other. Since the algorithm does not name the topics, analysts do so manually after the automated generation;
- named entity recognition - extracting proper names (people, organizations, and locations) from texts. In the first stage, the algorithm automatically finds mentions of proper names, categorizes them, and, if possible, determines the tone (the attitude of the post’s author toward the named entity). In the second stage, analysts manually supplement the dictionary of proper names so that in the future the algorithm can automatically “normalize” them (i.e., determine that, for example, “SBU” and “Security Service of Ukraine” refer to the same entity);
- relationship and network analysis - building a network of relationships between users and posts on social media. This allows us to identify user groups, possible bot networks, and more.
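Two of the steps listed above, normalizing proper names with a hand-maintained dictionary and counting the most frequent n-grams, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not Detector Media's actual pipeline: the `ALIASES` dictionary, the sample posts, and the function names `normalize_entities` and `top_ngrams` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical alias dictionary of the kind analysts might maintain by hand.
ALIASES = {
    "security service of ukraine": "SBU",
    "sbu": "SBU",
}


def normalize_entities(text, aliases=ALIASES):
    """Replace every known alias with its canonical name (case-insensitive)."""
    # Try longer aliases first so multi-word names are matched before short ones.
    for alias in sorted(aliases, key=len, reverse=True):
        pattern = r"\b" + re.escape(alias) + r"\b"
        text = re.sub(pattern, aliases[alias], text, flags=re.IGNORECASE)
    return text


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def top_ngrams(posts, n=2, k=3):
    """Return the k most frequent n-grams, counted within each post."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        # zip over n shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Made-up sample posts, for illustration only.
SAMPLE_POSTS = [
    "The SBU published a list of channels",
    "Security Service of Ukraine published a list",
    "The list of channels grew",
]

normalized_posts = [normalize_entities(p) for p in SAMPLE_POSTS]
top_unigram = top_ngrams(normalized_posts, n=1, k=1)
bigram_counts = dict(top_ngrams(normalized_posts, n=2, k=10))
```

Normalizing before counting means that “SBU” and “Security Service of Ukraine” contribute to the same n-gram counts instead of being split across two spellings.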
The general analysis algorithm is as follows: first, the data set is processed via computerised methods that help generalise large data sets. This helps to identify trends, patterns, and correlations so that the analyst can further purposefully explore specific aspects relevant to the subject of the study.
How do we identify the influences of hostile information in social media?
Approach #1. Detecting coordinated inauthentic behavior, e.g., bots that promote similar messages.
Approach #2. Analyzing the relationships and connections between users, groups, and channels.
Approach #3. Labeling sources. For example, the SBU has published a list of Telegram channels administered by the General Directorate of the General Staff of the Armed Forces of the Russian Federation.
Approach #4. Verifying whether the claims made are true.
Approach #5. Comparing messages with the Kremlin’s propaganda and disinformation narratives to find those that echo or resemble them.
These approaches are not mutually exclusive but complementary. Combining them helps us identify the influences of hostile information in the Ukrainian segment of social media more effectively and substantiate our findings.
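To make Approaches #1 and #2 more concrete, here is a minimal sketch of one possible heuristic: flag a message when several distinct accounts post near-identical text within a short time window, then link those accounts in a co-posting network. It is an assumption for illustration, not the method Detector Media uses. The thresholds `window_seconds` and `min_accounts`, the sample data, and all function names are hypothetical, and a match of this kind is only a signal for an analyst to investigate, not proof of bot activity.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def normalize(text):
    """Lowercase the text and drop punctuation so near-identical posts match."""
    return " ".join(re.findall(r"\w+", text.lower()))


def coordinated_groups(posts, window_seconds=600, min_accounts=3):
    """Map each suspicious text to the accounts that posted it close together.

    posts is an iterable of (account, unix_timestamp, text) tuples.
    """
    by_text = defaultdict(list)
    for account, timestamp, text in posts:
        by_text[normalize(text)].append((timestamp, account))

    flagged = {}
    for text, items in by_text.items():
        items.sort()
        start = 0
        best = set()
        # Slide a time window over the sorted posts and keep the largest
        # set of distinct accounts seen inside any single window.
        for end in range(len(items)):
            while items[end][0] - items[start][0] > window_seconds:
                start += 1
            accounts = {account for _, account in items[start:end + 1]}
            if len(accounts) > len(best):
                best = accounts
        if len(best) >= min_accounts:
            flagged[text] = best
    return flagged


def co_posting_edges(flagged):
    """Count, for each pair of accounts, how many flagged texts they share."""
    edges = Counter()
    for accounts in flagged.values():
        for a, b in combinations(sorted(accounts), 2):
            edges[(a, b)] += 1
    return edges


# Made-up sample data, for illustration only.
SAMPLE_POSTS = [
    ("acc_a", 1000, "Breaking: the same claim!"),
    ("acc_b", 1060, "breaking the same claim"),
    ("acc_c", 1200, "Breaking - the same claim"),
    ("acc_d", 9000, "Breaking: the same claim!"),
    ("acc_e", 1100, "An unrelated post"),
]

flagged = coordinated_groups(SAMPLE_POSTS)
edges = co_posting_edges(flagged)
```

In this sample, `acc_d` repeats the same text hours later and therefore falls outside the window, so it is not linked to the other three accounts. Edge weights in `edges` grow as pairs of accounts share more flagged texts, which is one way to surface tightly connected groups for manual review.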