Instituto Mises Brazil - An Editorial Analysis Using Natural Language Processing
2019 Dec 10
This post is the first in a short series in which I analyze the editorial shift of the Instituto Mises Brazil using Natural Language Processing (NLP) techniques, including Latent Dirichlet Allocation (LDA). This initial text is just a description of the repository where the data and scripts are stored. Everything is free and can be copied and contested without any kind of restriction.
The repository address is: https://github.com/fclesio/mises-brasil-nlp/
…
Have you ever sensed that something changed in the editorial line of a newspaper, magazine, or other media outlet, without being able to say exactly what? I had the same feeling, and decided to use some tools to check whether there really was a change.
Background
For some time now, I have been following Brazilian politics through essayists associated with libertarianism, anarcho-capitalism, secession, self-ownership, and related subjects. One thing that caught my attention was the editorial shift slowly taking place at one of the main liberal think tanks in Brazil, the Instituto Mises Brazil (IMB).
For those who don’t know, in mid-2015 the core of the IMB split: on one side was the President of the IMB (Hélio Beltrão), and on the other were the Chiocca brothers (Fernando, Cristiano, Roberto), who subsequently created the Rothbard Institute. The split was caused by disagreements over articles about secession.
I think this split led the IMB toward a lighter editorial line on subjects linked to freedom, which contradicts the ideas of Ludwig von Mises himself.
What is the reason for this repository?
The main reason is to run a simple data analysis using Natural Language Processing (NLP) on all texts from Mises Brazil in order to test the following hypothesis:
- Hypothesis [0]: There was an editorial shift at Instituto Mises Brazil in which subjects linked to austro-libertarianism, freedom, ethics, secession, and related issues gave way to ephemeral themes such as financialism, bureaucracy, and especially politics.
If Hypothesis [0] (H0) holds, I will try to answer the following questions:
- Question [1]: If H0 is true, are subjects linked to austro-libertarianism such as praxeology, the end of statism, argumentative ethics, and secession being sidelined in editorial terms?
- Question [2]: Is Instituto Mises Brazil becoming, in editorial terms, more mainstream-liberal than libertarian?
- Question [3]: Did the set of subjects covered change over time, and did the range of subjects covered by the current columnists change as well?
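One rough way to put Hypothesis [0] into numbers is to track, year by year, the share of articles that mention at least one term from a theme list. The sketch below is only an illustration and not the analysis from the repository: the field names (`year`, `text`), the term list, and the sample records are all invented here.

```python
from collections import defaultdict

# Hypothetical theme vocabulary; a real analysis would need a curated list.
LIBERTARIAN_TERMS = {"praxeologia", "secessão", "ética", "estatismo"}


def theme_share_by_year(articles, terms):
    """Return {year: fraction of that year's articles mentioning any term}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for article in articles:
        # Crude whitespace tokenization, enough for a sketch.
        words = set(article["text"].lower().split())
        totals[article["year"]] += 1
        if words & terms:
            hits[article["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented sample records, only to show the shape of the input.
sample = [
    {"year": 2012, "text": "A praxeologia e a ética da liberdade"},
    {"year": 2012, "text": "Secessão como direito individual"},
    {"year": 2018, "text": "A reforma da previdência e a burocracia"},
    {"year": 2018, "text": "O estatismo e a política fiscal"},
]

shares = theme_share_by_year(sample, LIBERTARIAN_TERMS)
print(shares)
```

A share that falls over the years would be consistent with H0, although keyword matching is a blunt instrument: it ignores inflected forms and context, which is where lemmatization and topic models such as LDA come in.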
Preparation
All of this was generated on a Mac Mini with Python 3.6, but it can also be run on Linux once the following libraries are installed:
$ pip install numpy==1.17.2
$ pip install pandas==0.25.1
$ pip install requests==2.22.0
$ pip install spacy==2.2.1
$ pip install beautifulsoup4==4.8.1
$ pip install bs4==0.0.1
$ python -m spacy download pt_core_news_sm
Honestly: use R to generate your own charts. I love Seaborn and Matplotlib, but for this purpose R is much more flexible and needs much less “hacking” to make things look good.
Data Extraction
The database in the repository was generated on October 16, 2019 (10.16.2019), to freeze the analysis and make it easier to replicate.
The extraction fetches all texts, whether they are blog articles or main page posts. The site's URLs do not distinguish between the two, and blog articles sometimes become posts on the main page.
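To give an idea of what turning a fetched page into a record can look like, here is a minimal parsing sketch. It is not the repository's `data-extraction.py`: the dependency list suggests that script relies on requests and BeautifulSoup, whereas this example swaps in the standard library's `html.parser` so it runs without extra packages. The `ArticleParser` class and the sample markup are invented, since the real site's HTML is not described in this post.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <h1> title and the text of every <p> in a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. a closing </em> nested inside a paragraph
        text = " ".join("".join(self._buffer).split())
        if tag == "h1":
            self.title = text
        elif text:
            self.paragraphs.append(text)
        self._current = None


# Invented HTML; the real markup of the site is an unknown here.
page = """
<html><body>
<h1>Example title</h1>
<p>First <em>paragraph</em> of the article.</p>
<p>Second paragraph.</p>
</body></html>
"""

parser = ArticleParser()
parser.feed(page)
record = {"title": parser.title, "text": "\n".join(parser.paragraphs)}
print(record)
```

Because the site does not separate blog articles from main page posts by URL, a record like this would carry no such label; every page is treated the same way.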
Another point worth mentioning is that Leandro Roque is the site's main translator/essayist, and some translated posts are signed by him (which is correct). This has two effects: 1) he accounts for a very large share of the articles on the site, which distorts his individual statistics as an essayist; and 2) because of the translations, he covers a much broader spectrum of subjects than other authors, which must be considered when analyzing the subjects he writes about most. Personally, I would exclude him from all analyses given these two points, but that is up to each person.
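For anyone who does want to leave an author out, the filter is a one-liner in spirit. The sketch below assumes each article is a dictionary with an `author` field; that field name, the `exclude_authors` helper, and the sample rows are hypothetical and may not match the repository's actual data layout.

```python
def exclude_authors(articles, excluded):
    """Return the articles whose author is not in `excluded` (case-insensitive)."""
    blocked = {name.strip().lower() for name in excluded}
    return [a for a in articles if a["author"].strip().lower() not in blocked]


# Invented rows with placeholder titles, only to show the shape of the data.
sample = [
    {"author": "Leandro Roque", "title": "Article 1"},
    {"author": "Hélio Beltrão", "title": "Article 2"},
    {"author": "leandro roque ", "title": "Article 3"},
]

filtered = exclude_authors(sample, ["Leandro Roque"])
print([a["title"] for a in filtered])
```

Normalizing case and surrounding whitespace before comparing guards against the same author being spelled slightly differently across posts, which is a common problem in scraped data.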
For those who want to generate a new database with data up to the current date, just run the command below in the terminal:
$ python3 data-extraction.py
At the end of the execution, the following information will appear:
Fetching Time: 00:16:52
Articles fetched: 2855
General Notices
This analysis is for educational purposes only. An editorial analysis involving linguistic and semantic questions is obviously very complex even for us humans, and suggesting that a machine can do it does not make much sense given the complexity and nuance of language.
This repository, as well as the analysis, has no pretension of being “scientific”. This means there will be no elements of cognitive linguistics, computational linguistics, discourse analysis, or similar sciences. The repository contains many personal views and observations, supported here and there by some data and some scripts.
Distribution and Uses
All data, scripts, and charts can be used freely without any kind of restriction. If you would like to help, link to my site/blog or cite it academically; that helps a lot.
I do not own the rights to the texts of the Instituto Mises Brazil. What is here is just a compilation of data extracted from the site, which is public and can be extracted by anyone.
Warranties, Errors, and the Like
There is no warranty for these analyses, charts, data, and scripts, and use is at the user's own risk. There will be many errors (mainly grammatical, syntactic, and semantic); as you find them, you can open a Pull Request or send me an email and I will fix them. However, as I write at the speed of my thoughts, the system responsible for syntactic correction will not always work well.