As with the German corpus, the Czech one suffers from similar issues. @SantosCardonaPR and @Jirka_Kocian are aware of them. This post provides materials intended to speed up the cleanup.
First of all, this spreadsheet contains all codes used in the Czech corpus, arranged in ascending alphabetical order of the English label. The rightmost column is the live link to the code’s page on Edgeryders’s back end, so you can quickly go from the spreadsheet to the back end.
Duplicate codes are a common error. They happen when an ethnographer creates a new code, unaware that an identical one already exists in the same corpus. If you scroll through the spreadsheet, you will find them easily, as in this example:
Here are the 33 codes, used for annotating #ethno-rebelpop-czech-interviews, that are synonymous with other codes (synonymous = same English label, ignoring capitalization).
anti-vaxxers, caring about future generations, community, culdisorient, culman, czechs, donald trump, economy, eeconomicviol, envenergy, envgenprob, eu, europe, facebook, gender role models, gensexism, homophobia, impact of covid-19, morals, new routine, online platforms, polpol, populism, russia, social activism, social distancing, social isolation, social media, staying at home, viktor orban, volunteering, xenophobia, young generation
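If you would rather check for duplicates programmatically than by scrolling, the rule above (same English label, ignoring capitalization) can be sketched roughly as follows. This is only an illustration: it assumes the spreadsheet rows have been loaded as dictionaries, that the English label lives in a `name_en` field, and that each row has an `id` field; the `id` field and the sample rows are made up for the example.

```python
from collections import defaultdict


def find_duplicate_labels(rows, label_key="name_en"):
    """Group rows whose English label matches when capitalization is ignored.

    Returns a dict mapping the case-folded label to the list of rows that
    share it, keeping only labels used by more than one row.
    """
    groups = defaultdict(list)
    for row in rows:
        label = (row.get(label_key) or "").strip()
        if label:  # rows without an English label are a separate problem
            groups[label.casefold()].append(row)
    return {label: members for label, members in groups.items() if len(members) > 1}


# Made-up sample rows; the "id" field is a hypothetical column name.
rows = [
    {"id": 1, "name_en": "Populism"},
    {"id": 2, "name_en": "populism"},
    {"id": 3, "name_en": "Russia"},
]

duplicates = find_duplicate_labels(rows)
for label, members in duplicates.items():
    print(label, [member["id"] for member in members])
```

Each group in the result is a candidate for merging; deciding which code to keep is still an editorial decision.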
Some codes have no English label. In most of these cases the English label was recorded in the database field meant for the label in another language, normally (in this corpus) Czech, but occasionally others. In the spreadsheet linked above, you can find them quickly by sorting on the name_en column. There are only 33 such codes.
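As an alternative to sorting, the codes with an empty English label can be filtered out with a few lines of Python. Again, this is a sketch under assumptions: only the `name_en` column is taken from the spreadsheet; the `id` and `name_cs` fields and the sample rows are hypothetical.

```python
def missing_english_label(rows, label_key="name_en"):
    """Return the rows whose English label is absent or blank."""
    return [row for row in rows if not (row.get(label_key) or "").strip()]


# Made-up sample rows; "id" and "name_cs" are hypothetical column names.
rows = [
    {"id": 10, "name_en": "", "name_cs": "populism"},
    {"id": 11, "name_en": "Russia", "name_cs": ""},
    {"id": 12, "name_en": None, "name_cs": "xenophobia"},
]

unlabeled = missing_english_label(rows)
for row in unlabeled:
    # Show the other-language field, where the English label may have ended up.
    print(row["id"], row.get("name_cs"))
```

Printing the other-language field next to each result makes it easy to spot labels that only need to be moved into the right field.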