POPREBEL Coding Thread (LEGACY)

amelia · July 2, 2020, 11:59am

It seems everyone is running a bit behind, so let’s say that coding needs to be finished by the end of the day on the 6th. That gives us two full days to go through each other’s codebooks before we meet.

There is no purpose in us meeting if we have not reviewed each other’s codebooks, so please give an update on the 6th if this is not looking feasible for anyone and we will go from there. If this ends up being the case, let us know sooner rather than later so we can push back.

@Jirka_Kocian @Jan @Wojt

Jan · June 22, 2020, 9:39pm

@amelia @Jirka_Kocian @matthias @SZdenek @Wojt @Richard Dear All, as I wrote to Amelia earlier, Wojtek and I had a really great “workshop” on Friday (June 19) to develop our method of work and code a bit. We have come up with three questions:

The mother-child code relationship. Say, we code “homosexuality” and it intersects in the coded fragment with the issue of “various approaches”. Wojtek codes separately assuming that if we want to get the intersection the software will catch it and display. I code: mother: “homosexuality” and child: “homosexuality: various approaches.” What is your advice?

Wojtek observed that if, say, he codes the whole entry with CODE1 and then codes a part of it with CODE2 and then wants to add CODE3, say, to a word, he cannot do this. The system allows only two codes “on one bit of text”, it seems.

The most urgent issue: is it possible to duplicate a thread of a conversation, so we could code it on the platform separately (also inter-coder reliability) and only then compare our codes and reconcile them? It is - we believe - a very useful exercise, at least at the beginning, to compare and try to synchronize our coding “habits.”

Yours, Jan

amelia · June 23, 2020, 8:51am

If this is just an exercise, you can copy/paste the content and create a new thread of your own as a “test”. I’d recommend @Jan being the test subject, since he assigns fewer codes, so it’ll be easier to transpose them onto the original thread if you want to merge them. Does this work for you?

Does @Wojt create a hierarchy in the backend, or just co-code? If the latter, that’s a good approach that would ideally be combined with yours as well. Yours works, but only if you assign parent-child codes that make sense from an SSNA point of view. First, the top-level code will not auto-assign to the ones below it. @hugi and our team on a different project tried having this happen automatically, but it led to a giant overproliferation of codes that rendered the SSNA meaningless. Second, “various approaches” is too vague on its own (to be honest, I don’t think the code “homosexuality:various approaches” is very descriptive either), so ideally the “lower level” code would be a more specific thing (e.g. “lesbian relationships”).

The plan (as articulated in this post, which you should both read carefully) with hierarchies is as follows:

So, in short – apply both top-level code and lower-level code if it applies, but be discerning. Use hierarchies in your own codebook, and we will then apply them together in the backend and use them to refine codes and apply top-level codes more widely if we need to – it’s very easy using the “copy” function in the backend to apply a top-level code to all instances of a lower-level code, but doing so automatically leads to too much vagueness/heavy co-occurrences on too high-level codes. If we do so more deliberately, in tandem with the SSNA creation, we get a more refined and accurate picture of the data. (as I describe this, @alberto, this might be a methodological point for us to note).
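To make the "apply deliberately, not automatically" point concrete, here is a minimal Python sketch. It assumes annotations are simple (fragment, code) pairs; the data, the function name `copy_parent_to_children`, and the parent-child pairing are made up for illustration and are not the backend's actual "copy" function.

```python
# Hypothetical annotations: (fragment id, code) pairs. Not real POPREBEL data.
annotations = [
    ("frag_1", "lesbian relationships"),
    ("frag_2", "lesbian relationships"),
    ("frag_3", "church"),
]

def copy_parent_to_children(annotations, child, parent):
    """Add `parent` to every fragment already coded with `child`.

    This is a deliberate, one-pair-at-a-time step chosen by the coder,
    not an automatic propagation over the whole hierarchy.
    """
    existing = set(annotations)
    added = [
        (fragment, parent)
        for fragment, code in annotations
        if code == child and (fragment, parent) not in existing
    ]
    return annotations + added

# Apply the top-level code only where the coder has decided it belongs.
updated = copy_parent_to_children(
    annotations, child="lesbian relationships", parent="homosexuality"
)
```

Only the two fragments carrying the lower-level code gain the top-level code; the unrelated fragment is untouched, which is what keeps the high-level codes from co-occurring with everything.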

Just right click outside of the selection and you’ll be able to add more codes than 2 (we’ve discussed this on multiple occasions in the POPREBEL team and I go over it in training, but perhaps we need a more centralised documentation to refresh memories as it’s a lot of information to keep track of).

@matthias, feel free to check me on these answers

amelia · June 23, 2020, 8:53am

I have a call with @Wojt on Wednesday or Thursday (time pending). If you can also join, it would be useful to have you.

@wojt, so we pick a time more amenable to @Jan, should we say 9am EST, 2pm UK, 3PM Poland? I prefer Thursday but if Wednesday works better for @jan, I can make that work.

Wojt · June 23, 2020, 9:03pm

@amelia I’m fine with either Wednesday or Thursday, time of your choosing.
As to “various approaches” and “homosexuality: various approaches” we were referring to our codebooks only. Neither I nor @Jan builds hierarchies in the backend.
The question, as I saw it, was more about how to highlight the relationships between various themes/codes/labels so that the system (which is at least partly automated as far as I understand, I mean the generation of the network and the graph) can see not only co-occurrences, but also meaningful relationships between various codes.
This particular example pertained to one respondent noting that people exhibit various attitudes towards homosexuality, and now I’m wondering if it’s worth coding at all…
But let’s take a different, perhaps better example: which is better in the codebook and in the codes on the platform: the code “discrimination” (with “homosexuality” as its parent code in the codebook - a choice that, to my mind, would look better on the graph) or “homosexuality: discrimination”, provided that they pertain to the same fragments within the same single post? In our codebooks it probably doesn’t matter that much, since if we take care to somehow show the hierarchical structure it should be fine to later reflect that in the backend. Am I correct?

Great news!

As per the layers of coding, that is great news again! I probably won’t use it much myself, but may come in handy.

Thank you for all your help!

alberto · June 24, 2020, 9:00am

I am no ethnographer, but would argue that neither solution makes the most of our data model.

1. Beware false hierarchies. Discrimination is definitely not a child of homosexuality. Hierarchies are meant in a proper ontological sense: cat is a child of mammal, and France is a child of Europe.
2. Double codes are bad practice. homosexuality:discrimination destroys the information that two separate experiences are mentioned in the same post. In SSNA, you are supposed to enter two separate codes, homosexuality and discrimination. The connection between the two is meant to be emergent: if (and only if) many posts mention both experiences, the semantic graph will show a strong connection. Formally, connections between experiences or concepts are represented by edges, not collapsed into nodes.

The advantage of coding thus is large. Somewhere else in the conversation you could have discrimination associated with something else, say age. The ego network of discrimination will then show you the various aspects of discrimination in the corpus. With double coding, you have to fall back on human memory to rebuild it. “Wait, did I not see discrimination elsewhere?” With large corpora and several people coding, this problem becomes more severe. SSNA’s main advantage is scalability, so it is essential to code for that. Your codes are no longer a memory aid for you, but a memory aid for the collective effort of several researchers plus a computer system.
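A minimal sketch of the idea in Python, assuming each post is reduced to the set of codes applied to it. The posts, codes, and function names below are invented for illustration; this is not the actual SSNA software.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations: post id -> set of separate codes applied to it.
annotations = {
    "post_1": {"homosexuality", "discrimination"},
    "post_2": {"age", "discrimination"},
    "post_3": {"homosexuality", "discrimination", "workplace"},
}

def cooccurrence_edges(annotations):
    """Count, for each unordered pair of codes, the posts where both occur."""
    edges = Counter()
    for codes in annotations.values():
        for a, b in combinations(sorted(codes), 2):
            edges[(a, b)] += 1
    return edges

def ego_network(edges, code):
    """Return {neighbour: edge weight} for every code co-occurring with `code`."""
    ego = {}
    for (a, b), weight in edges.items():
        if a == code:
            ego[b] = weight
        elif b == code:
            ego[a] = weight
    return ego

edges = cooccurrence_edges(annotations)
ego = ego_network(edges, "discrimination")
```

In this toy corpus the ego network of discrimination contains homosexuality, age, and workplace, with homosexuality carrying the heaviest edge. Had the posts been coded with a compound code such as homosexuality:discrimination, the link between discrimination and age would not be recoverable from the graph.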

@amelia, what’s your take here?

Wojt · June 24, 2020, 9:58am

Thank you, @alberto!

alberto · June 24, 2020, 10:21am

My pleasure, @Wojt .

There is an additional point that I did not mention: turns out that rich posts are… rich, with 10-20 annotations. Picture them as points in the highly multidimensional space of human experience, for example a reported episode of workplace discrimination based on sexual orientation in Warsaw. You have two choices:

either collapse them into an n-multiple code, like homosexuality:discrimination:workplace:large_cities:poland
or attribute each of the n codes separately.

The latter solution will show up as an n-clique in the semantic graph, with all codes connected to each other. This is how the SSN knows that they all occurred together. You do not lose that information. At the same time you are protected from having to carry the logic of multiple codes to a rather silly conclusion. And you can reach the post from any of the codes: “let’s see what happens in the part of the convo that describes Poland”, or “hmm, what about the LGBT+ community?”
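The two choices can be compared in a short Python sketch. It assumes one rich post with n = 5 codes; the helper `reachable_from` is a made-up name for illustration.

```python
from itertools import combinations

# Hypothetical rich post, annotated with n = 5 separate codes.
codes = ["homosexuality", "discrimination", "workplace", "large_cities", "poland"]

# Separate coding: every pair of codes in the post becomes an edge (an n-clique).
clique_edges = set(combinations(sorted(codes), 2))

# Collapsed coding: one compound node, which contributes no edges at all.
collapsed_node = ":".join(codes)

def reachable_from(code, edges):
    """Codes connected to `code` by at least one edge."""
    return {b if a == code else a for a, b in edges if code in (a, b)}

n = len(codes)
num_edges = len(clique_edges)  # equals n * (n - 1) / 2 for a clique
poland_neighbours = reachable_from("poland", clique_edges)
```

With separate codes, the post yields 10 edges and can be reached from any of its five codes. With the collapsed code, it can only be found by someone who remembers the exact compound string.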

SSNs are quite elegant objects, you see. As you get the knack of them, you will find yourself picturing drawing edges as you code: “wow, I have seen discrimination a few posts back, but I don’t think it ever co-occurred with homosexuality yet!”. That sort of stuff.

alberto · June 24, 2020, 12:21pm

Heads up, everyone: moving this out to the public workspace. There seems to be no reason for keeping it secret; in fact, it is a super-interesting methodological debate!

amelia · June 25, 2020, 11:16am

@alberto is right (you have learned well, young padawan.) @Wojt and I just had a call to clarify.

I’m making a wiki now so that we have a centralised list of coding conventions in POPREBEL (and in general, an updated centralised list of best practices for coding in SSNA). Will link when it’s finished.

alberto · June 25, 2020, 12:10pm

I was going to suggest that!

Wojt · June 30, 2020, 8:41am

Hi! @amelia
Yesterday Jan and I had a meeting and we came up with a few questions.

Say, we have a code stigmatisation, an example of which may be labelling, and we encounter a text where a set of labels is listed. So a person mentions that people call her a feminazi, a dyke, etc. Is it ok to code every label? I know we should rather avoid coding word-for-word, but some of them, like Polish “pisior” (a hardcore Law and Justice supporter), will keep popping up and we think they are worth coding.
Jan came up with the idea to introduce the code constructing_common_identity with children such as: nation, religious community, Europeans, etc. I have already noticed the issues of common identity being raised here and there, people speaking of the need for finding, (re) building it, complaining of it being dismantled, so I think it might be worthwhile to have it.
Matters related to the inclusion/exclusion dichotomy seem to play a central role in our investigations. Have you guys come across any mentions of these two? How do you code them?
We have a story about living libraries, and we’d like to code them under countering_exclusion → learning/teaching pluralism → living libraries. Does that sound ok?
Jan and I think that having two codes, positive_sentiment and negative_sentiment, to code for positive and negative feelings/attitudes towards something respectively might be of use. Do you code for emotions? Does it make sense at all to put these into the backend? (I have my doubts, since their proliferation may distort the SSNA graphical output, yet they seem to be ethnographically valuable.) Please advise.
Polishness is described using various labels (provincial, religious, tolerant, etc.). How can we code a phenomenon that is being defined by means of listing its characteristics?
One last thing. When I put my codes in the backend they sometimes merge automatically with pre-existing codes by other annotators, and sometimes they don’t (for example my homophobia, or country_comparison are now under Amelia’s, while my code church stays separate). It’s not a problem now, but may somehow make adding translations more difficult. Will see. Just saying.

Ok, that’s it for now. I’m getting back to coding, I have another meeting with Jan scheduled for today. I keep working on refining my codebook and writing definitions for the codes I decided to keep.

@jan did I put it right?

Have a great day, everybody!

Jan · June 30, 2020, 4:49pm

@amelia and @Wojt Can we, please, meet on Thursday? I am totally flexible, but to start planning: how about 15:00 in England (10:00 am here and 16:00 for Wojtek)? We are making progress, but have questions (see also above → Wojtek’s last entry). Jan

amelia · June 30, 2020, 5:01pm

I would code this, yes, and code it with double quotations to indicate that it is an in vivo code (as is the convention we have agreed upon, see the wiki). I would assume this occurs in a sentence, so attach the code to the sentence and not just the word.

Use the code “constructing common identity” – that is an excellent code. But it is not a parent code. Just code it along with the other codes, and it will appear as a co-occurrence in the graph.

If you look at my codebook, I have quite a few codes around this concept. Take a look in the “hierarchies” tab and see if any work for you. And please leave a note if they are closely connected to any of yours.

Your codes shouldn’t have these underscores – just use a space. Also, are the arrows supposed to be designating hierarchies? Don’t use hierarchies, just code the content with each of those codes. Then you will get a co-occurrence emerging in the graph. And feel free to then use them as categories in your own codebook, which we can then discuss later (see the wiki on hierarchies).

If you look at my codebook, you can see that I have an entire category of codes with different emotions in them. If you are coding particular emotions that informants are expressing, feel free to use that emotion as a code. But “positive” and “negative” are too vague / non-semantic and don’t belong in the SSNA. @hugi has a way of designating “non-SSNA” codes, but I’m not sure yet if we can implement it in POPREBEL.

That’s an analytical/interpretive question that you and Jan need to try to approach using the codes! I personally would create a code 'Polishness' (using the single quotes to denote an informant concept, see the wiki) and then co-code with the other concept labels that appear (so, provincial or religious). Then you’ll end up with a nice co-occurrence network of all the concepts that your informants associate with 'Polishness'.
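As a rough illustration of what that co-coding gives you, here is a small Python sketch. The coded fragments are invented, and `associated_concepts` is a hypothetical helper, not a function of the platform.

```python
from collections import Counter

# Hypothetical coded fragments: each is the set of codes attached to one passage.
fragments = [
    {"'Polishness'", "provincial", "religious"},
    {"'Polishness'", "religious"},
    {"'Polishness'", "tolerant"},
    {"religious", "church"},  # not coded 'Polishness', so it is skipped below
]

def associated_concepts(fragments, focus):
    """Count how often each other code is applied together with `focus`."""
    counts = Counter()
    for codes in fragments:
        if focus in codes:
            counts.update(codes - {focus})
    return counts

polishness = associated_concepts(fragments, "'Polishness'")
```

The characteristics stay independent codes (religious can still co-occur with church elsewhere), while their association with 'Polishness' emerges from the counts rather than from a hierarchy.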

I’m happy to meet with you and @Jan, but you need to first engage properly with my codes and @Jirka_Kocian’s codes (by using the comment function, as described in the wiki), and read the wiki in detail, because this will answer a lot of your questions and give us more specifics to talk about. So if you can commit to doing this before Thursday, we can talk then. Otherwise, we can speak next week.

Jan · June 30, 2020, 5:21pm

@amelia and @Wojt We got it. Wojtek and I just finished talking. Here is our plan: the two of us will meet again on Thursday to further discuss and coordinate Polish codes (after studying your wiki). On Saturday, we will meet to further discuss your codebook, the wiki, etc., and start coordinating the codes. Next week, we will be ready to meet with you and the rest of the team. How about Thursday, July 9? (Please accept our apology for some slowdown and delay: Wojtek is now on a short vacation with his family, and tomorrow (Wednesday) is the wedding of our son, and on Sunday the whole family is going to the beach until Wednesday evening.) J

amelia · June 30, 2020, 6:48pm

That sounds great. @Jirka_Kocian and @SZdenek, can you make a call on the 9th? I’m flexible time-wise on that date at the moment.

Wojt · June 30, 2020, 7:44pm

Hi, the thing is that a number of them occur in one sentence. I did not specify that, I’m sorry.

We were planning on using it in our codebooks. Is it ok?

Will do, already started looking through it.

Yes, the arrows were supposed to present categories and again, just for the codebook. We remember not to create any in the backend.

Ok, will look for those!

In the backend, yes. But how would you present it in the codebook? These codes are not really members or examples of Polishness (these words have much broader meaning, obviously). Should they remain independent in our codebooks?
I apologize for my lack of precision on these matters. I should’ve specified at each instance whether I meant our codebook or backend.

amelia · July 2, 2020, 12:02pm

So code the same sentence with multiple codes.

I don’t see why not. Check if others have used similar codes, then either choose to use theirs, or use yours and mark in their codebook that you think yours might be a better fit (or just mark the connection).

Hierarchies in the backend and categories in the codebook should be thought of in the same way. Think of your codebook as a “staging” for the hierarchies you’d like to implement in the backend. So the same logic applies to both.

amelia · July 2, 2020, 12:05pm

@SZdenek @Richard

Repeating this so everyone sees it. We will meet on the 9th if everyone is ready. If not, we will find another time the week after. Please update ASAP when you know if you’ll complete the work on time (with quality – if it’s going to be low quality because you’ve rushed it, just say so and we will push back. Better to work with good data.)

Wojt · July 2, 2020, 3:49pm

Roger that. Thank you for your help and patience