Hello friends, after some rest and some reading and thinking I am ready to share my takehome points from the Skunkworks.
My goal, as you may recall, was to spot a low-hanging-fruit methodological advance in digital ethnography. I hoped to get there by starting from an empirical problem, that of European populism, and moving on from there. I came out of our two days with two promising candidates: Wirtz’s approach to finding out when informants are “sure about what they are saying”, and Beckert’s idea of “thinking of the present as dependent on the future, rather than on the past”.
1. Wirtz’s interview entropy
Wirtz’s paper (@sander, did he consent to us sharing it? Otherwise I cannot link it here) has a promising idea: reusing the coding of interviews (or posts, in our case) for some algorithmic analysis. Its purpose, in the way that Sander presented it, is to determine which statements are delivered with confidence as opposed to only tentatively. I like this idea, and I have myself been thinking about “reliability scores” that would be attributed to associations (i.e. edges in the semantic network), rather than to statements. To compute them, I was thinking of using social network metrics.
His main tool is to compare two measures of Shannon entropy. One is the observed value, computed from the frequency with which each specific combination of codes actually occurs. The other is the expected value: the entropy those combinations would have if the codes were independently distributed.
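To make this concrete, here is a minimal sketch of the comparison as I understand it. The codes, counts, and the independence model for the expected value are my own illustration, not Wirtz’s formula (the paper gives none):

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Each coded statement is represented as the set of codes applied to it.
# These example codes are invented for illustration.
statements = [
    {"family", "work"},
    {"family"},
    {"work", "money"},
    {"family", "work"},
    {"money"},
]
n = len(statements)

# Observed value: entropy of the specific code combinations that occur.
combo_counts = Counter(frozenset(s) for s in statements)
observed = shannon_entropy([c / n for c in combo_counts.values()])

# Expected value: entropy the combinations would have if each code were
# applied independently. Under independence, the joint entropy is the sum
# of the per-code (Bernoulli) entropies of the marginal frequencies.
code_counts = Counter(code for s in statements for code in s)
expected = sum(
    shannon_entropy([c / n, 1 - c / n]) for c in code_counts.values()
)

print(f"observed: {observed:.3f} bits, expected: {expected:.3f} bits")
```

When codes tend to co-occur in recurring combinations, the observed entropy drops below the expected one; the gap is what the method interprets.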
However, after going through the paper itself, my enthusiasm has cooled considerably. Here is why, in decreasing order of importance:
- Wirtz does not actually claim, outside of the paper’s title, to have constructed a “coherence detector”. Instead, he makes considerably weaker claims:
> The frequency of a class in a given set of consecutive statements is the number of occurrences of that class in the set. We superimpose the […] graph of this frequency on the three previous curves (observed information, expected information, and expected minus observed information). By looking at this graph we can see whether there is some degree of resemblance between the frequency time series and those of the three information statistics over part or all of the interview. If there is such a correspondence for some classes, we will claim that this class carries the information during this part of the interview. [emphasis Wirtz’s]
In the final part of the paper, the author discusses his six interviews one by one, and even there he makes only cautious claims. “If we assume that the observed information betrays a temporary difficulty with expression, we deduce that the present seems to pose the greatest difficulty for this man”. With a working mathematical coherence detector, we should not need to assume.
- I am not convinced by the author’s introduction of a time dimension in the interview. His approach is to first “slice” each interview into statements (sentences, I presume). Next, he takes a “sliding window” of ten statements and assigns it a time code. Time 1 corresponds to statements 1 to 10, time 2 to statements 2 to 11, and so on. At this point, he computes the information (via entropy) of the class that seems most meaningful for each window, so that both expected and observed information vary across the interview. This is not how @amelia describes coding: she treats each post as a whole, reading the whole thing first and only later starting to code statement by statement. As so often in data analysis in the social sciences, Wirtz’s methodological move (just like our own) hides a non-neutral assumption about the social conditions on the ground. He assumes that interviewees are “winging it”; we, instead, take people at face value, assuming that what they are actually saying is what they meant to say, and do not try to second-guess them. Our assumption is perhaps more realistic for texts that were delivered in writing, with the possibility to edit and so on; his is more realistic for texts that were delivered orally. Our assumption is also more in line with our ethical code of treating people on the platform as thinking adults, admittedly a leap of faith.
- The method’s scalability is neither demonstrated nor exemplified. Wirtz goes through each of his six (six! We are already looking at 1,000 posts in POPREBEL) interviews in painstaking detail. He does get some patterns, but his method is no more efficient than having a live ethnographer code for non-semantic codes like “assertiveness” and “hesitation”, and it is much less precise.
- It is not clear to me how the expected information of a class is computed. There is no formula; if I wanted to write computer code implementing the method, I would have to ask the author, or invoke the help of someone like @melancon to interpret the paper.
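For the sake of discussion, here is a sketch of how I read the sliding-window procedure. The window size of ten follows the paper; the way expected information is computed per window (a per-code independence model) is my own guess, precisely because the paper gives no formula:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy, in bits, from raw occurrence counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def windowed_information(codes_per_statement, window=10):
    """Slide a window of consecutive statements over the interview and
    return a list of (observed, expected) information values per window.

    codes_per_statement: one set of codes per statement, in order.
    Observed: entropy of the code combinations actually seen in the window.
    Expected: entropy the window would have if each code occurred
    independently (sum of per-code Bernoulli entropies) -- my assumption,
    not Wirtz's formula.
    """
    results = []
    n = len(codes_per_statement)
    for t in range(n - window + 1):
        win = codes_per_statement[t:t + window]
        combos = Counter(frozenset(s) for s in win)
        observed = entropy(list(combos.values()))
        marginals = Counter(code for s in win for code in s)
        expected = sum(
            entropy([k, window - k]) for k in marginals.values()
        )
        results.append((observed, expected))
    return results
```

Because consecutive windows share all but one statement, the two curves vary smoothly across the interview, which is what makes the superimposed-graph reading in the quoted passage possible at all.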
Overall, this method is cool but it does not seem worth the investment to me. Happy to reconsider if you disagree with me.
2. Beckert’s imagined futures
I read the first (theoretical) part of Beckert’s book. It argues that
- People need some idea of future states of the world in order to make decisions in the present.
- However, we have no way to “compute” such states with any degree of certainty. Rational expectations theory is rejected (not a hard case to make). Instead, Beckert proposes the concept of fictional expectations: expectations formed as causally credible stories about how the future follows from the present.
- Once adopted, fictional expectations inform present behavior. Therefore, who controls the narrative about the future controls the present, as speculators and central bankers know well.
This is great. But I do not see how it can contribute to expanding SSNA. It can play a role in POPREBEL, if we decide to pay attention to how informants see the future (a Europe of nations? A liberal globalist conspiracy ruling the continent? And so on). It can even help us make predictions, because fictional expectations held today will influence individual behavior (and therefore social phenomena) in the near future. Maybe we could discuss this issue with the U Tartu folks. But I do not see it becoming a core module of SSNA.
So, as far as I’m concerned, it’s back to the drawing board.
What does everyone think?