A first look at the Czech data

After data cleanup, this is a first round of network visualization and analysis.

The Czech corpus on 2022-09-07 is coded with 586 codes, giving rise to 14,455 co-occurrences. The stacked CCN has 6,397 edges.

Highest core values

47 codes have the highest core value, k = 46. They are:

codes in the highest *k*-core (*k* = 46)
*healthcare system
*misunderstanding (culture or value based)
Adolf Hitler
Andrej Babiš
anti-COVID measures
anti-vaxxers
caring about future generations
conspiracy theories
crisis
DDeliteestrangement
DDpassel
Donald Trump
dystopia
Ecorrupt
Ehightax
elections
employer
Epoverty
family structures
fear
financial interests
hidden agenda
impact of COVID-19
information overflow fatigue
Joe Biden
LAinadequate
LAinsecurity
LAretirement
lockdown
media
migration
Miloš Zeman
Politicians
public interest
respirators/face masks
retired people
SAinciv
satisfaction
schools
sense of threat
SIincompetence
socialisation
uncertainty
voting
Václav Havel
young generation
Z COVID-19 - Category
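
For readers who want to reproduce a list like this one, here is a minimal sketch of how the highest k-core can be found with networkx. The toy graph is made up for illustration and stands in for the stacked CCN; it is not the Czech data, and I am assuming the network is available as a simple undirected graph.

```python
import networkx as nx

# Toy graph standing in for the stacked CCN (hypothetical data):
# a clique of four codes plus one loosely attached code.
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"),
])

# core_number maps each node to the largest k such that the node
# belongs to the k-core of the graph.
core = nx.core_number(G)
k_max = max(core.values())

# Codes with the highest core value, sorted for display.
top_core = sorted(n for n, k in core.items() if k == k_max)
print(k_max, top_core)
```

On the real network, `top_core` would hold the codes listed above and `k_max` would be 46.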

Simmelian backbone extraction

A network’s Simmelian backbone is a subset of its most redundant edges. An edge is redundant when it connects two nodes that have many neighbors in common. The backbone is used to extract community structure from dense networks, such as CCNs.
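
To make the idea concrete, here is a simplified sketch that scores each edge by the number of neighbors its two endpoints share, and keeps only the edges above a threshold. This is an assumption-laden illustration of the redundancy idea, not necessarily the exact procedure used for the figures in this post (published Simmelian backbone methods work on ranked neighborhoods). The function name `redundancy_backbone` and the toy graph are hypothetical.

```python
import itertools
import networkx as nx

def redundancy_backbone(G, r_min):
    """Keep the edges whose endpoints share more than r_min neighbors."""
    H = nx.Graph()
    for u, v in G.edges():
        # Redundancy here is simply the count of common neighbors.
        r = len(list(nx.common_neighbors(G, u, v)))
        if r > r_min:
            H.add_edge(u, v, r=r)
    return H

# Toy graph (hypothetical): two 4-cliques joined by a single bridge.
G = nx.Graph()
G.add_edges_from(itertools.combinations("abcd", 2))
G.add_edges_from(itertools.combinations("efgh", 2))
G.add_edge("d", "e")

H = redundancy_backbone(G, r_min=1)
print(H.number_of_edges(), nx.number_connected_components(H))
```

In the toy graph the bridge has no shared neighbors, so it drops out and the two cliques separate into distinct communities. Raising `r_min` plays the same role as raising the threshold r.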

In this particular CCN, quite soon the “hairball” structure resolves into three distinct communities: two very dense smaller hairballs and one less dense. Above I show a visualization for r > 30, with 108 codes and 1,581 stacked edges. The first of the two dense communities I will call the “southwest community”, because of where it appears in the visualization above. It has 40 nodes and 799 edges, and includes many of the codes in the highest k-core. Of course, if I choose a threshold value of r higher than 30, all communities will have fewer codes and edges; conversely, if I lower the threshold they will have more. But this structure is quite stable across a good range of threshold values.

codes in the southwest community with r = 30
*healthcare system
*misunderstanding (culture or value based)
Adolf Hitler
anti-COVID measures
caring about future generations
conspiracy theories
crisis
DDeliteestrangement
Donald Trump
dystopia
Ecorrupt
Ehightax
elections
employer
Epoverty
fear
financial interests
information overflow fatigue
Joe Biden
LAinadequate
LAinsecurity
LAretirement
media
migration
Miloš Zeman
Politicians
public interest
respirators/face masks
retired people
SAinciv
satisfaction
schools
sense of threat
SIincompetence
uncertainty
voting
Václav Havel
young generation
Z COVID-19 - Category

The community of codes to the “northwest” has 30 codes and 398 edges.

codes in the northwest community with r = 30
*psychological well-being
anti-COVID measures
bad strategy
Clubhouse
dating (romantic)
DDpassel
Dominik Feri
EDnescience
family structures
gender role models
GENinequality
high school graduation
home office
impact of COVID-19
information flow
institutional failure
labour
meeting new people
moral integrity
online sphere
online teaching
Politicians
protests
public pressure
Roman Prymula
SAsupdef
social isolation
social media
socialisation
Work experience abroad

The two are connected by anti-COVID measures, DDpassel, family structures, impact of COVID-19, Politicians and socialisation.

Finally, there is a much less dense “eastern” community of 30 codes, connected by 103 edges. These are:

codes in the eastern community with r = 30
children
Civic democratic party
civil disobedience
communism
confusing measures
covid conflict
CULdisorient
CULman
DDpolcor
democratic elections
drastic measures
EDprogramin
Einequality
grassroots movement
IHphysical
infection rate
Jana Bobošíková
LAunemployment
Lubomír Volný
MIGinmigra
Pirate party
populism
post-socialist transformation
sanitary-epidemiological station
SPD
staying at home
vaccinations
vaccine injury
virus testing
Volný blok
working patterns
ČSSD

The eastern community is connected to the other two via respirators/face masks, social media, online platforms, lockdown, anti-vaxxers and Andrej Babiš.

Structure: association depth

In this CCN, the association depth d and the association breadth b of edges are positively but not tightly correlated, with a correlation coefficient of 0.53. The very deepest edge connects `impact of COVID-19` and `online teaching` (d = 68). This network connects 22 codes with the 34 deepest edges in the whole corpus. Each of them has d >= 21.

Below, a broader visualization, with 53 nodes and 99 edges with d >= 14.

Structure: association breadth

The broadest edge of all connects CULman to anti-vaxxers (b = 20). Three more edges have b > 15, and they form a chain, connecting DDeliteestrangement to DDpolcor to Andrej Babiš to SIincompetence. Recall that b is the number of individual informants that have associated these codes at least once. The network below shows the 44 broadest edges, each supported by at least 7 informants. They connect 32 codes.

This is what happens when we keep only edges with b >= 4. There are 59 codes, with 110 edges.
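
For anyone who wants to try this kind of filtering themselves, here is a minimal sketch of how d and b can be computed from raw co-occurrence records and then thresholded. The records are invented for illustration. I am assuming that d counts every co-occurrence of a pair of codes, while b, as recalled above, counts the distinct informants behind the pair. The helper `filter_edges` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical co-occurrence records: (informant, code, code).
records = [
    ("inf1", "lockdown", "schools"),
    ("inf1", "lockdown", "schools"),
    ("inf1", "schools", "lockdown"),
    ("inf2", "lockdown", "schools"),
    ("inf2", "fear", "media"),
    ("inf3", "fear", "media"),
    ("inf3", "fear", "lockdown"),
]

depth = defaultdict(int)        # d: number of co-occurrences (assumed definition)
informants = defaultdict(set)   # who made each association
for informant, code_a, code_b in records:
    edge = tuple(sorted((code_a, code_b)))  # edges are undirected
    depth[edge] += 1
    informants[edge].add(informant)

# b: number of distinct informants supporting each edge.
breadth = {edge: len(people) for edge, people in informants.items()}

def filter_edges(measure, threshold):
    """Return the edges whose measure is at least the threshold."""
    return sorted(e for e, value in measure.items() if value >= threshold)

print(filter_edges(depth, 3))
print(filter_edges(breadth, 2))
```

An edge can be deep without being broad, as when one informant repeats the same association many times, which is one reason the two measures are only loosely correlated.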
