Using the edgeryders.eu APIs

community_wiki · December 25, 2017, 7:29pm

This topic is a linked part of a larger work: “Discourse Admin Manual”

Content

1. Overview

1.1. API overview
1.2. Python library
1.3. Tips and tricks

2. API access

3. Custom API endpoints

3.1. Ethnographic projects
3.2. Ethnographic codes
3.3. Ethnographic annotations
3.4. Ethical consent data
3.5. Multisite account creation
3.6. Obtaining an API key by API

4. API client registry

1. Overview

1.1. API Overview

All relevant information in the edgeryders.eu database is accessible via APIs. We support the following APIs:

Public Discourse API. This API gives access to everything that can be viewed as a non-logged-in visitor on edgeryders.eu. No API key or user account is required. For the API documentation, see docs.discourse.org.
Protected Discourse API. All Discourse content that can only be viewed with a user account requires an API key to access it. See below on how to get an API key for your user account. The key gives you access to whatever your user can access, for example some protected categories. Some data, such as user e-mail addresses, requires the API key of a moderator user. For the API documentation, again see docs.discourse.org.
Protected custom API. This API is custom-made for edgeryders.eu and gives access to “secondary content” (the codes and annotations of Open Ethnographer), and to the data collected with our “ethical consent funnel” form. Access requires a user with access to this data, and a Discourse Admin API key for that user. For details, see section 2. API access.

1.2. Python library

We wrote a small Python module containing a collection of functions to fetch Edgeryders content, consent data and Open Ethnographer codes and annotations via API. See:

Github: repository albertocottica/discourse-social-networks, file code/python scripts/z_discourse_API_functions.py

1.3. Tips and tricks

API rate limiting. API access is throttled to avoid server overload. The limits are set in config/discourse.conf and are currently, for both our edgeryders.eu and multisite Discourse installations:

max_admin_api_reqs_per_key_per_minute = 300 (default was 60)
This is the only relevant limit for scripts accessing the API (as they use an Admin API key). Note that this limit is per API key per minute, summing together the requests to different endpoints that your script may make in that minute.
max_user_api_reqs_per_minute = 20 (default; only relevant for connected apps)
max_user_api_reqs_per_day = 2880 (default; only relevant for connected apps)
max_reqs_per_ip_mode = none (means, no additional global rate limit restrictions; default was block)

The best strategy for long-running scripts is to be proactive: calculate the earliest allowable time for the next request so that your script never hits the rate limits. That spaces out requests equally over time, avoiding spikes in the server load. For information on the meaning of the different rate limits, see here.
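The proactive strategy can be sketched as follows. This is a minimal illustration, not code from our Python module: the class name RequestPacer and its parameters are hypothetical, and the default of 300 requests per minute is simply the Admin API key limit quoted above.

```python
import time


class RequestPacer:
    """Spaces out requests evenly so a script stays under a per-minute limit.

    Hypothetical helper for illustration only.
    """

    def __init__(self, max_requests_per_minute=300,
                 clock=time.monotonic, sleep=time.sleep):
        # Minimum number of seconds between two consecutive requests.
        self.interval = 60.0 / max_requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call pacer.wait() immediately before every API request. Because the limit is counted per API key across all endpoints, one pacer instance should be shared by all requests a script makes with the same key.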

After adapting any of the limits above, you have to restart the relevant Discourse process with, for example, sudo monit restart puma_discourse_production.

Filtering by category with the API. In Edgeryders, we often want to look at content in a specific category. Discourse supports two levels of categories: top-level and sub-level categories. When retrieving the content of a category, the API will by default also include all topics of all its subcategories. If you want to exclude some subcategories (for example, you might want “everything in the category except what is in the Workspace subcategory”), you need to do as follows:

In a browser, visit the subcategory you want to exclude. For example, category “Documentation & Support → Collaboration” by visiting this link, as seen on the homepage: https://edgeryders.eu/c/docs/collaboration/269
Locate the last number in the category URL, here 269. This integer is the category ID.
Adapt your code accordingly to exclude content of the subcategory. For example, if you are trying to count the topics in a certain category, except those in the subcategory you want to exclude, you can do something like this:
```
# 269 is the category ID of "Documentation & Support → Collaboration"
if topic['category_id'] != 269:
  number_of_topics += 1
```

As an alternative to excluding subcategories you don’t want, you can also add up topics from all subcategories you want and then add those from the top-level categories you want (while ignoring topics in sub-categories altogether). The latter is possible by appending /none to the URL of a top-level category, for example https://edgeryders.eu/c/docs/12/none.
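Both approaches (excluding unwanted subcategories, or adding up only the wanted ones) can be sketched with one small helper. This is an illustration under assumptions: the function name count_topics is hypothetical, and topics is assumed to be a list of topic dictionaries that each carry a category_id key, as fetched from the API beforehand.

```python
def count_topics(topics, include=None, exclude=()):
    """Count topics by category ID.

    include: if given, only topics in these category IDs are counted.
    exclude: topics in these category IDs are never counted.
    """
    excluded = set(exclude)
    included = None if include is None else set(include)
    count = 0
    for topic in topics:
        category_id = topic["category_id"]
        if category_id in excluded:
            continue
        if included is not None and category_id not in included:
            continue
        count += 1
    return count
```

For example, count_topics(topics, exclude=[269]) counts everything except the “Collaboration” subcategory, while count_topics(topics, include=[12, 269]) counts only the listed categories.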

2. API access

Required credentials. For access to the edgeryders.eu APIs, you need the following:

Public Discourse API. You do not need an API key for the public Discourse API, see above.
Protected Discourse API. You need either a Discourse User API key (which you can generate yourself) or a Discourse Admin API key, and then can access with that API key everything that the associated user can access.
Protected Custom API: ethical consent data. You need a Discourse Admin API key of a Discourse moderator or admin user.
Protected Custom API: Open Ethnographer data. You need a Discourse Admin API key of any user who has access to Open Ethnographer.

If you do need a Discourse Admin API key, ask @alberto or @matthias (or another edgeryders.eu Discourse admin) to create one for you. You cannot create it yourself, and you should save it somewhere because you cannot look it up in your Discourse account later.

Admin API key creation. For Discourse admins, this is how you create Discourse Admin API keys:

Go to “Admin → Users” and select the target user.
Scroll down to find “Permissions” and under it “API key”.
Click the Generate Key button.
Tell the user their API key via a secure channel (for example, an encrypted Matrix chat).

Admin API key usage. Once you have your Discourse Admin API key, you have to supply it in an HTTP header Api-Key:. This mechanism works for both the standard Discourse API and our custom endpoints. Notably, supplying the API key as a GET parameter has not worked since 2020-04 (source)! Using the Api-Username: header is not necessary, as it is only used to identify the user to impersonate when using master API keys. See also the official Discourse API access documentation. A working minimal command line example is as follows, to be used with your own Admin API key:

curl -X GET "https://edgeryders.eu/annotator/projects/{project_id}/codes.json" -H "Api-Key: 1fe3…"
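The same request can be prepared in Python. This is a minimal sketch using only the standard library; the function names build_codes_request and fetch_json are hypothetical, and the URL path follows the codes endpoint as described in section 3.2.

```python
import json
import urllib.request


def build_codes_request(project_id, api_key, base_url="https://edgeryders.eu"):
    """Build (but do not send) a GET request for the codes of one project."""
    url = f"{base_url}/annotator/projects/{project_id}/codes.json"
    # The API key goes into the Api-Key header, not into the query string.
    return urllib.request.Request(url, headers={"Api-Key": api_key})


def fetch_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Typical use would be fetch_json(build_codes_request(project_id, api_key)), with your own project ID and Admin API key.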

3. Custom API endpoints

3.1. Ethnographic projects

The Open Ethnographer projects API endpoint is accessible at https://edgeryders.eu/annotator/projects.json.

A standard response looks like this:

{
    "id": 3,
    "name": "ethno-ngi-forward",
    "created_at": "2022-11-29T11:31:22.784Z",
    "updated_at": "2022-11-29T11:31:22.784Z",
    "codes_count": 1131
}

3.2. Ethnographic codes

The Open Ethnographer codes API endpoint is accessible at https://edgeryders.eu/annotator/projects/{project_id}/codes.json.

A standard response looks like this:

{
    "id": 13,
    "description": null,
    "creator_id": 3323,
    "created_at": "2017-09-05T16:09:52.870Z",
    "updated_at": "2017-09-05T16:09:52.870Z",
    "ancestry": null,
    "annotations_count": 1,
    "names": [
      { "name": "accessible laboratories", "locale": "en" },
      { "name": "zugängliche Laboratorien", "locale": "de" }
    ]
}

Open Ethnographer supports code hierarchies. The ancestry field returns the parent code of the code at hand.

Codes can be filtered by creator like this:

https://edgeryders.eu/annotator/projects/{project_id}/codes.json?creator_id=3323

The output is paginated. By default, one page contains at most 100 codes. This can be changed with the per_page GET parameter:

https://edgeryders.eu/annotator/projects/{project_id}/codes.json?per_page=200

If there are more codes available than the given per_page limit, they can be accessed on subsequent pages by using the page GET parameter:

https://edgeryders.eu/annotator/projects/{project_id}/codes.json?page=2

If no further codes are available, an empty array is returned for that page.

You can also request info about a single code with its ID like this:

https://edgeryders.eu/annotator/projects/{project_id}/codes/13.json
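Walking through the pages until an empty array comes back can be sketched like this. The helper name fetch_all_pages is hypothetical, and fetch_page stands for any callable you supply that requests one page with the page and per_page parameters and returns the decoded JSON array. The sketch assumes that page numbering starts at 1.

```python
def fetch_all_pages(fetch_page, per_page=100, max_pages=1000):
    """Collect items from a paginated endpoint until an empty page is returned.

    fetch_page(page, per_page) must return the decoded JSON array of that page.
    max_pages is a safety cap against endless loops.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        if not batch:
            # An empty array signals that no further items are available.
            break
        items.extend(batch)
    return items
```

The same loop works for the annotations endpoint, since it is paginated in the same way.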

3.3. Ethnographic annotations

The annotations endpoint is accessible at https://edgeryders.eu/annotator/projects/{project_id}/annotations.json

A standard response looks like this:

[
  {
    "id": 5579,
    "version": "v1.0",  (Annotator schema version.)
    "text": null,  (A comment field that the ethnographer can use to explain their thinking behind the annotation.)
    "quote": "comfortable for the patient.",
    "uri": "/post/33751",  (Used by Discourse Annotator to load annotation data.)
    "created_at": "2017-05-22T19:13:10.000Z",
    "updated_at": "2017-05-22T19:13:10.000Z",
    "code_id": 342,
    "post_id": 33751,
    "creator_id": 3323,
    "shape": null,  (Shape of the annotation like 'rect'. Image or video annotations only.)
    "units": "pixel",  (Image or video annotations only.)
    "geometry": null,  (Geometry of the annotation. Image or video annotations only.)
    "src": null,  (Path to the annotated file. Image or video annotations only.)
    "ext": null,  (File extension. Image or video annotations only.)
    "container": null,  (Video annotations only.)
    "start": null,  (Point in time when the annotation starts. Video annotations only.)
    "end": null,  (Point in time when the annotation ends. Video annotations only.)
    "topic_id": 15866,
    "revision_number": null,  (Discourse post revision number. All annotations that belong to the same post reference the same post revision.)
    "post_creator_id": 7427
  },
  ...
]

The following GET parameters can be used to filter annotations in the response:

topic_id: Return annotations which belong to the given topic.
post_id: Return annotations which belong to the given post.
creator_id: Return annotations which were created by the given user.
tag: Return annotations which are tagged with the given Discourse tag. The tag’s name, rather than the tag’s ID, needs to be passed in. A list of available discourse tags can be found here: Edgeryders
code_id: Return annotations which are tagged with the given Open Ethnographer code.

Filter parameters can be combined as needed. For example, to return all annotations that belong to a certain topic and were created by a specific user:

https://edgeryders.eu/annotator/projects/{project_id}/annotations.json?topic_id=111&creator_id=222

The output is paginated. By default, one page contains at most 100 annotations. This can be changed with the per_page GET parameter:

https://edgeryders.eu/annotator/projects/{project_id}/annotations.json?per_page=200

If there are more annotations available than the given per_page limit they can be accessed on subsequent pages by using the page GET parameter:

https://edgeryders.eu/annotator/projects/{project_id}/annotations.json?page=2

If no further annotations are available, an empty array is returned for that page.
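Combining filters and pagination parameters into one URL can be sketched as follows. The function name annotations_url and the ALLOWED_PARAMS set are hypothetical; the set only lists the GET parameters named in this section.

```python
from urllib.parse import urlencode

# GET parameters described in this section.
ALLOWED_PARAMS = {"topic_id", "post_id", "creator_id", "tag", "code_id",
                  "per_page", "page"}


def annotations_url(project_id, base_url="https://edgeryders.eu", **params):
    """Build the annotations URL for a project with optional filters."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    url = f"{base_url}/annotator/projects/{project_id}/annotations.json"
    query = urlencode(params)  # also escapes special characters in tag names
    return f"{url}?{query}" if query else url
```

For example, annotations_url(3, topic_id=111, creator_id=222) reproduces the combined filter shown above for project 3.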

3.4. Ethical consent data

The Edgeryders platform has a feature called the ethical consent funnel. It is accessible by API as described below, and (after fixing #194) on the user admin pages as field edgeryders_consent.

When a user tries to post for the first time, the ethical consent funnel is served as a popup form: it asks users to answer some questions before they are able to post in certain categories. They can only proceed past the form when they have answered its questions correctly. When a user answers the questions correctly, the platform updates the value of a field called edgeryders_consent. We interpret this sequence of events as the user having given informed consent to participating in a research project with Edgeryders, and having understood the nature of their part in the exercise.

Question definitions. The wording of the consent funnel questions and answers is contained in consent.hbs.

Data access by API. The field edgeryders_consent is accessible by JSON API at:

https://edgeryders.eu/admin/consent.json in conjunction with a suitable API key, which you supply in an Api-Key HTTP header (see section 2. API access; supplying it as an api_key GET parameter has not worked since 2020-04). In addition, currently you have to supply a request header Accept: application/json as a workaround for #212. Also note that this API endpoint does not support pagination: it provides the consent data for all users in one response.
https://edgeryders.eu/u/{username}.json in conjunction with a suitable API key, supplied in the same way. Without an API key, the hash “user → custom_fields” will be empty because that data is access protected. With an API key, you will find a hash key “user → custom_fields → edgeryders_consent” with a value. If you want to obtain the consent information of a larger (>20) number of users, use the consent.json endpoint instead (see above) because it reduces the server load and script runtime considerably compared to making one request per user.

Basic values. The interpretation of the edgeryders_consent field values as seen by JSON API access is as follows:

"edgeryders_consent": "1": User has given consent. Includes valid consent given on the Drupal platform that was later imported (with the consent timestamp reflecting the import time).
"edgeryders_consent": null: User has not gone through the consent funnel yet. This is true for many of the earlier users of Edgeryders, as the consent funnel was only fully implemented in July 2017.

(In the Discourse database, the value is not actually NULL, but a logical equivalent: the record in table user_custom_field will be missing for this user and name = edgeryders_consent.)

Additional values. In the fall of 2017 we attempted to elicit consent from 191 users that had contributed to OpenCare before 2017 (but never after). For this, we created three new values for the edgeryders_consent field:

"edgeryders_consent": "0": User was re-contacted after contributing to Edgeryders before July 2017, and has denied consent.
"edgeryders_consent": "no answer": User was re-contacted after contributing to Edgeryders before July 2017, but failed to answer after repeated attempts.
"edgeryders_consent": "unreachable": User was re-contacted after contributing to Edgeryders before July 2017, but the e-mail they used to create the Edgeryders account is no longer active.

Known issue. For some users, the value of "edgeryders_consent" is not a simple string but a list:

{
   "user": {
      [...]
      "custom_fields": {
         "edgeryders_consent": [
            "1",
            "1"
         ]
      }
   }
}

We attribute this to a simple mistake:

It probably means that Discourse treats everything as a multi-value field by default. So you’d keep the last (most recent) value, and ignore the rest.
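A script reading this field can guard against the list form with a small normalization step. This is a minimal sketch: the names normalize_consent and has_consented are hypothetical, and it follows the suggestion above that the last list entry is the most recent one.

```python
def normalize_consent(value):
    """Reduce an edgeryders_consent value to a single string or None.

    Lists are reduced to their last entry, assumed to be the most recent.
    """
    if isinstance(value, list):
        return value[-1] if value else None
    return value


def has_consented(value):
    """Return True only for the value "1" (user has given consent)."""
    return normalize_consent(value) == "1"
```

With this, None, "0", "no answer" and "unreachable" all count as no consent on record.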

3.5. Multisite account creation

This allows authorized external applications, including JavaScript web applications, to create active Discourse accounts by API, which can then be used by these applications to post content to Discourse in the name of the new user. This is useful for various onboarding, survey and data collection purposes.

Endpoint URL. Since we use a single-sign-on (SSO) system where one Discourse account works on all of our Communities sites (see the top right menu!), the endpoint is provided on our login site because that is the SSO provider where the “master” record of each account is created:
https://communities.edgeryders.eu/multisite_account.json

Only an HTTPS endpoint is provided, no HTTP version.

Request type. GET (Will later be changed to POST once we have figured out how to set up the right CORS policy in Discourse. GET is not ideal as it should not be used for requests that cause state changes.)

Parameters. The possible request parameters are:

email: Required. The e-mail address that the user wants associated with the new account. If an account with that e-mail address already exists, account creation will fail.
username: Required. The username to use for the new account. If an account with that username already exists, account creation will fail. For that reason, it is advisable to check username availability beforehand using the Discourse public API (https://communities.edgeryders.eu/u/username.json), or to auto-generate a username that will most likely not exist yet. The user may change it later inside Discourse.
password: Required. The password to set for the new account that will be created.
accepted_gtc: Optional. true or false, referring to the GTCs of the Edgeryders Communities platforms. Assumed as false when not provided, which will result in an error message.
accepted_privacy_policy: Optional. true or false, referring to the Privacy Policy of the Edgeryders Communities platforms. Assumed as false when not provided, which will result in an error message.
edgeryders_research_consent: Optional. true if the user passed the Edgeryders Consent Funnel or equivalent questions, giving informed consent to the use of their content for research; false otherwise. Assumed as false when not provided. true is required only when requesting an API key for edgeryders.eu.
requested_api_keys: Required. Non-empty list of the domains of Edgeryders Communities sites for which the caller requests a Discourse Admin API key. Separate multiple values with whitespace.
auth_key: Required. A shared secret without which access to this API endpoint will be prohibited. The currently active auth_key is available in a protected page.

Background info

Since this system should result in published content before the user had to confirm their e-mail address, it is a good target for spam submissions and needs some form of authentication. auth_key is a shared secret to prevent automated spam submissions. The idea is to distribute it in a limited way and only to trustable parties, and to disable it once it has reached untrusted parties who start using it for spam submissions. This seems to be the best approach, as an external web application cannot be trusted when it says “I have let the user go through a good captcha” and as we don’t want to build a captcha-via-API system (and captchas are annoying anyway).

Example request. https://communities.edgeryders.eu/multisite_account.json?email=testuser2@example.com&username=testuser2&password=verysecretpassword123&accepted_gtc=true&accepted_privacy_policy=true&edgeryders_research_consent=true&requested_api_keys=edgeryders.eu&auth_key=8342……3274
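Building such a request URL by hand is error-prone because of the characters that need escaping (for example the @ in the e-mail address). The following is a minimal sketch using the standard library; the function name multisite_account_url is hypothetical, while the parameter names are the ones listed above.

```python
from urllib.parse import urlencode

ENDPOINT = "https://communities.edgeryders.eu/multisite_account.json"


def multisite_account_url(*, email, username, password, auth_key,
                          requested_api_keys, accepted_gtc,
                          accepted_privacy_policy,
                          edgeryders_research_consent=False):
    """Build the GET URL for an account creation request.

    requested_api_keys is a list of site domains; it is sent as one
    whitespace-separated value.
    """
    params = {
        "email": email,
        "username": username,
        "password": password,
        # Booleans are sent as the strings "true" / "false".
        "accepted_gtc": str(bool(accepted_gtc)).lower(),
        "accepted_privacy_policy": str(bool(accepted_privacy_policy)).lower(),
        "edgeryders_research_consent":
            str(bool(edgeryders_research_consent)).lower(),
        "requested_api_keys": " ".join(requested_api_keys),
        "auth_key": auth_key,
    }
    return f"{ENDPOINT}?{urlencode(params)}"
```

The consent flags are deliberately keyword arguments that the caller must pass for the terms and the privacy policy, so that acceptance is never assumed silently.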

Response. A typical response for successfully creating an account would look like this, basically an abridged version of the user records that are normally returned by Discourse:

{
  "id": 5,
  "username": "new-username",
  "email": "username@example.com",
  "active": false,
  "created_at": "2019-09-05T08:35:01.000Z",
  "username_lower": "new-username",
  "trust_level": 0,
  "api_keys": [
    { "site": "edgeryders.eu", "key": "sgev47…fdffd0" }
  ]
}

As seen from the "active": false field, the account on communities.edgeryders.eu is not yet active. When the user tries to log in for the first time, she will be asked to activate the account by requesting a link sent to her e-mail address and clicking on it. However, the accounts created on Communities sites alongside this master account are already set to active. This makes it possible to use the returned Admin API keys without further steps and also makes sure the user is notified by e-mail about replies to the content she posted. The fact that the Communities account(s) are already active carries a slight risk that e-mails are sent to other people’s e-mail addresses in case the user did not enter her own e-mail address. If that turns out to be a problem at least for some applications, we can provide a parameter to switch this behavior on or off, and creating active accounts would then only be possible with some but not all auth_keys.
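A client will usually want to pull the API keys out of this response. A minimal sketch, assuming the response has the shape shown above; the function name api_keys_by_site is hypothetical:

```python
import json


def api_keys_by_site(response_text):
    """Map each site in the response's api_keys list to its API key."""
    account = json.loads(response_text)
    return {entry["site"]: entry["key"]
            for entry in account.get("api_keys", [])}
```

The result is a dictionary such as {"edgeryders.eu": "<key>"}, so the key for a given Communities site can be looked up by its domain.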

In the case of an error, the error status and message as provided by Discourse for an account creation error will be returned. This happens when:

An API endpoint parameter is missing or wrong. See the source code for details on the possible error messages that can happen.
An account with the given e-mail address already exists.
Other Discourse account creation errors.

If you receive a “500 Server Error” response, it will be due to this open issue.

Typical usage. A typical usage of this API endpoint would look like this:

Content preparation. A client application of this API would first collect the content or data it wants to post to Discourse under a user’s name, and compile it into a Discourse post format.
E-mail address input and consent. On the last screen of the content collection form, the user is asked to enter an e-mail address and to confirm to the Edgeryders ethical consent funnel, terms and conditions and privacy policy.
Get the auth_key. The application needs a valid auth_key value to be granted access to the multisite_account.json API endpoint. This can come from several sources. In the simplest case, it would be written next to the screen on a survey computer and the user would enter it into the application. Or it can be provided as a GET parameter in a link (from social media etc.). Or it can be served from a configuration file that is publicly accessible to the JavaScript application from a server but not included in a source code repository.
Request a new Discourse account. Now the web application will make a request to multisite_account.json as specified above.
Get the new Discourse account. The Discourse serverside code would do the following: (1) confirm that auth_key is a valid token, (2) make sure that email and (if given) username are not yet associated with an account, (3) make sure the user gave the required consents, (4) create a new account on the SSO provider site, (5) sync that new account to the Discourse Communities sites for which API keys are requested, (6) create and collect the requested API keys on the Communities sites, (7) send a reply with all necessary information to the API client, as specified above.
Show an account summary page. The web application would show a page saying that the content was successfully published and also providing information about the user’s new Discourse account (site URL, username, password, e-mail address). The user would be asked to take a photo or otherwise take note of this information but also be told that she can always reset the password after entering her e-mail address.
Error handling. In case that the account could not be created, for example because an account using that e-mail address already exists, the web application should show the user’s text input and ask the user to (1) provide a new e-mail address to post under or otherwise (2) to log in to Discourse and copy&paste it into a new topic there. The web application could also provide the option to log in to Discourse, to post the text under that identity (after getting a Discourse User API key for that).
Use the API key to post to Discourse. At this point, the API client application has a Discourse Admin API key of a new Discourse user and can use that to do actions on behalf of that user, via the normal Discourse API. It can then use it to create a new topic authored by that user.

Note that the new user starts as a TL0 user, while all users who sign up on edgeryders.eu manually and confirm their e-mail address start as TL1 users. This is a spam protection measure and may lead to some issues if the user tries to post many links or images.
Log in and forward to Discourse (later). This is for later, not for the use case as a survey system. At the end of this process, there would be a link to bring users directly to Discourse, to the topic they just created, allowing them to interact with other users as a normal Discourse user. The special part here would be that they end up on Discourse in logged-in state, without having to go through the communities.edgeryders.eu login site. To make that work, the web application would simply submit the username and password to communities.edgeryders.eu and get a login cookie in return (assuming proper CORS policy handling). Then it would forward the user to edgeryders.eu/login, which means that the user will end up there in logged-in state. We use that trick at the end of the communities.edgeryders.eu login process already. To bring the user to their own topic automatically, we could extend Discourse with an edgeryders.eu/login?redirect=/t/… mechanism.
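The step “Use the API key to post to Discourse” can be sketched as follows, using the standard Discourse API call for creating a topic (a POST to /posts.json with title and raw, see docs.discourse.org). The function name build_create_topic_request is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_topic_request(api_key, title, raw, category_id=None,
                               base_url="https://edgeryders.eu"):
    """Build a POST request that creates a new topic as the key's user."""
    payload = {"title": title, "raw": raw}
    if category_id is not None:
        payload["category"] = category_id
    return urllib.request.Request(
        f"{base_url}/posts.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(request) creates the topic under the account that the API key belongs to. Keep in mind that the new account starts at TL0, so posts with many links or images may be rejected.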

Source. We provide the source code of this API endpoint under an open source licence.

3.6. Obtaining an API key by API

In order for external websites to interact with a user’s Discourse account (such as by posting chatlogs as Discourse topics, creating additional notifications etc.), that external website has to send properly authenticated API requests to Discourse. This endpoint makes that possible by providing the user’s Discourse admin API key after SSO authentication at our SSO provider site, communities.edgeryders.eu.

Endpoint. This API endpoint is accessible at:

https://communities.edgeryders.eu/multisite_account_api_key.json

Only an HTTPS endpoint is provided, no HTTP version.

Request type. GET (Will later be changed to POST once we have figured out how to set up the right CORS policy in Discourse. GET is not ideal as it should not be used for requests that cause state changes.)

Parameters

Authentication cookie. To access this API endpoint, you first have to log in to your web application using communities.edgeryders.eu as SSO provider (instructions). That will, at the same time, log you in to communities.edgeryders.eu itself, which in turn authenticates you via a cookie so that you can access this API endpoint. This only works because we allow cross-domain use of the session cookie via the permit-api-cors plugin.

The Discourse authentication cookie is the _t cookie and looks like this: _t:"d1afe7345cd1f6389a0d2ab7792569". For testing, you can obtain it from an active communities.edgeryders.eu login under “Storage → Cookies” in the browser’s web developer tools.
hostname: The hostname of the Edgeryders Communities site for which you want to obtain the Admin API key.

Response. The JSON response contains the user’s admin API key on the requested site. If you receive a “500 Server Error” response, it will be due to this open issue.

Example usage. An example GET request would be:

https://communities.edgeryders.eu/multisite_account_api_key.json?hostname=edgeryders.eu

In addition, the session cookie has to be sent along with this GET request. For testing and debugging purposes, using curl is a good way to create these requests. Together with the session cookie, the full request as a curl command would look like this:

curl 'https://communities.edgeryders.eu/multisite_account_api_key.json?hostname=edgeryders.eu' -H 'Cookie: _t=bbeb……4a86'

And the response would look like:

{
    "site":"edgeryders.eu",
    "key":"4c0b6…309da0"
}
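For scripted testing, the curl command above can be mirrored in Python. This is a minimal sketch: the function name build_api_key_request is hypothetical, and session_cookie stands for the value of your own _t cookie.

```python
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://communities.edgeryders.eu/multisite_account_api_key.json"


def build_api_key_request(hostname, session_cookie):
    """Build (but do not send) the GET request for a site's Admin API key."""
    query = urlencode({"hostname": hostname})
    # The _t session cookie authenticates the request.
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        headers={"Cookie": f"_t={session_cookie}"},
    )
```

In a browser-based web application this step is not needed, since the browser attaches the cookie automatically.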

Typical usage. Here is a full description of the process a web application would use to access this API endpoint and utilize its result:

Get your web application’s domain added to “CORS allowed origins” on communities.edgeryders.eu.
Initiate a SSO authentication in your web application, using communities.edgeryders.eu as the SSO provider.
During SSO login, the user has to enter their username and password on communities.edgeryders.eu. As a result, they get the authentication cookie served from that site. Their browser stores it for domain communities.edgeryders.eu, giving them an active session on communities.edgeryders.eu (in addition to the SSO session on other sites via SSO login).
From your web application, send a request to https://communities.edgeryders.eu/multisite_account_api_key.json.
Since there exists a cookie for communities.edgeryders.eu, the browser sends it together with the request automatically. That behavior is allowed by the CORS settings made initially.
That cookie authenticates your web application’s request to multisite_account_api_key.json and the API call should be successful.
The API call returns the user’s Admin API key for the requested Discourse forum(s). Your JS application can then use that key to access them under the user’s account.

Source. We provide the source code of this API endpoint under an open source licence.

Ideas for future improvements (not implemented so far)

Future alternatives: User API. The Discourse User API is meant to be used for this scenario, but requires the user to grant access rights to client software in their Discourse account. Also, it would require serverless HTML+JS web applications to somehow store the User API key in permanent browser storage so that the granting of access rights does not have to be done every time. For the future, a good option would be to modify Discourse so that it auto-confirms the User API key requests of certain applications.

Future alternative: cookie authentication. The permit-api-cors plugin already makes it possible to access the multisite_account_api_key.json API via the communities.edgeryders.eu session cookie obtained from the SSO login. By installing that plugin also on the various communities sites, it should be possible to do a login on these sites by API, and then to use these sites with the appropriate session cookie for authentication, just like the browser does. The drawback is the rather complicated login process.

4. API client registry

Our custom APIs frequently change due to improvements and refactorings. To avoid the complexity of legacy APIs or API endpoint versioning, our method of change management is this API client registry. When you register your client application here, you will be notified when an API endpoint it uses changes. Registration is optional, but you’ll have to track API changes here in the manual if you don’t register. To register, simply edit this wiki. API client means an instance of a software; one application could be run in several instances.

API endpoint	API client	Contact
codes.json	graphryder1api.edgeryders.eu	@matthias
	graphryder2api.edgeryders.eu	@hugi
annotations.json	graphryder1api.edgeryders.eu	@matthias
	graphryder2api.edgeryders.eu	@hugi
consent.json	graphryder1api.edgeryders.eu	@matthias
	graphryder2api.edgeryders.eu	@hugi
multisite_account.json	bio26testing.edgeryders.eu	@hugi, @gdpelican

hugi · September 23, 2019, 8:55pm

Thanks for your work @matthias and @daniel. I have a few questions.

If I understand this correctly, this could cause some complications.

Our proposed architecture was that a JavaScript front end application would post to the platform by talking to the API. However, if the front end application needs to pass an auth key to do so, that key can’t be stored in a client-side web app without leaving it exposed in the browser. Having a second micro backend service which keeps the key doesn’t sound like it will solve anything either since that could just as easily be abused as the first one. And if the key is kept on a publicly accessible server, that is not any different than just keeping the key in the code. Am I misunderstanding something?

In this request, there is a password parameter which is not in the documentation. Which password is this?

An admin API key? Since the suggested architecture is a client-side web app, that key would then be accessible by the end-user. Can that key be used to make changes on behalf of other users too, or just the user in question?

matthias · September 24, 2019, 12:10am

No, it’s fine, you can store it there. The sole purpose of auth_key is spam protection, not any strong authentication. (Should we name it antispam_token maybe?)

multisite_account.json is a public API, just as signup is a public function in Discourse. To protect against 98% of spam, we just don’t want the API to be completely unprotected, and we want a way to revoke access selectively. That’s why we have this shared secret mechanism. It will be more ephemeral in the future: a key would be used in one campaign and then replaced by another etc. … see a more detailed proposal. Just as it takes spammers some time to pick up a new e-mail address, it will take them some time to pick up a link with a new key. Won’t be perfect but I think, pretty good.

Not in terms of being public, no. But a key in a config file is simpler to modify on the server, as no code deployment is needed. That helps when the keys will be more ephemeral.

Just the user in question. Discourse has two types of API access: User API and Admin API. Bit of a misnomer. Anything that uses an API key in Discourse uses the “Admin API” type of access, and anything that uses interactive authentication of an API client (“connecting an app to my account” style) uses the User API. Admin API access does not give a user admin access (unless the user actually is an admin).
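To make the “Admin API” style of access concrete, here is a minimal sketch of an authenticated request, assuming the key is passed in the `Api-Key` and `Api-Username` headers described in the Discourse API documentation. The helper names, the placeholder key and the username are made up for illustration; the request is only built here, not sent.

```python
import urllib.request


def discourse_api_headers(api_key: str, api_username: str) -> dict:
    """Headers for API-key ("Admin API" style) access.

    The request acts with the permissions of api_username,
    not automatically with admin rights.
    """
    return {"Api-Key": api_key, "Api-Username": api_username}


def build_request(url: str, api_key: str, api_username: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(url, headers=discourse_api_headers(api_key, api_username))


# Placeholder credentials, for illustration only.
req = build_request("https://edgeryders.eu/latest.json", "PLACEHOLDER_KEY", "some_user")
```

Passing `req` to `urllib.request.urlopen` would then perform the call with whatever access that user has.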

The password of the new account to create. Fixed this in the documentation now.

alberto · February 26, 2020, 2:17pm

Heads up @matthias: this is no longer true, and the documentation of the annotation format is now obsolete. You have added many new fields, some of which are not entirely obvious. Why do we have both uri and post_id, for example? And what is the function of units, geometry, shape and container, which all seem to refer to some kind of 2D space when you annotate images? Do start and end refer to video segments?

alberto · July 25, 2023, 7:47am

Two questions for @matthias and @daniel in the wake of the new Projects functionality in OpenEthnographer. From what I understood, a project consists of annotations and codes. On the other hand, the actual primary data (the topics and the posts therein) are not assigned to a project.

The SSNA of an ethnographic corpus requires both the primary data (the topics and the authors of each post) and the secondary data (annotations and codes). So, if my reading is correct, we need to keep assigning the ethno-PROJECTNAME codes to individual topics, and retrieve the corpus from there. Next, we use the endpoints of the Projects functionality to retrieve the secondary data. My first question is: is the above correct?

My second question is about parent codes. Think generic categories like we used in POPREBEL; you would use a code like Catholic Church in the actual annotation, but then Catholic Church would have a coarse-grained parent like social actors, that would itself not be used to annotate the corpus. We would want the parent codes of children codes used in the project to be also included in the project, even if they themselves are not used in annotations. It seems that this is the case. Second question: is the above correct?

However, from the OpenEthno interface I am unable to assign just any parent code to my codes. Before I do that, I need to move (is it move or copy?) that parent code to the project. So here is a third question: how did you deal with the ancestry of legacy codes?

matthias · July 25, 2023, 10:17am

First question

This is not necessary or intended. Instead, topics and posts are transitively assigned to projects via their annotations. To gather the corpus data, you would first gather the topic_id and post_id values in the responses of the annotations.json API endpoint, and use the usual Discourse API to then retrieve the associated post contents.
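The two steps above could be sketched as follows. This is only an illustration under assumptions: the annotations are taken to be an already-parsed list of dicts carrying the `topic_id` and `post_id` fields mentioned above, the sample records are made up, and the helper names are hypothetical. Posts are addressed through the standard Discourse `/posts/{id}.json` endpoint.

```python
from typing import Iterable, List

BASE_URL = "https://edgeryders.eu"


def corpus_post_ids(annotations: Iterable[dict]) -> List[int]:
    """Distinct post_id values referenced by a project's annotations, sorted."""
    ids = {a["post_id"] for a in annotations if a.get("post_id") is not None}
    return sorted(ids)


def post_url(post_id: int, base_url: str = BASE_URL) -> str:
    """URL of the usual Discourse endpoint for a single post."""
    return f"{base_url}/posts/{post_id}.json"


# Made-up sample standing in for a parsed annotations.json response.
sample = [
    {"id": 1, "topic_id": 10, "post_id": 101},
    {"id": 2, "topic_id": 10, "post_id": 102},
    {"id": 3, "topic_id": 11, "post_id": 101},  # same post annotated twice
    {"id": 4, "topic_id": 12, "post_id": None},  # annotation without a post
]

urls = [post_url(p) for p in corpus_post_ids(sample)]
```

Each URL in `urls` can then be fetched with the usual Discourse API (with an API key where the content is protected) to obtain the post text and author.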

Second question

That’s how things are right now, at least for new codes created in projects. Because any code created in a project is assigned to that project, even if not yet used for any annotation.

Third question

This is intended behavior. You can assign all codes (parent codes or not, with annotations or not) that are part of your current project, but no others. You can move or copy (both options are available) codes to / from other projects. Typically you would copy, as “move” will remove it from the other project and is only useful when reorganizing projects.

We created copies of the whole ancestry of a code in any project that used a code. Where “used” refers to the legacy system with Discourse tags: a corpus (now project) uses a code when the code appears in a topic tagged with the corresponding ethno-* Discourse tag.

So you should be able to assign all parent codes that are listed as part of a project’s code right now, including those with zero annotations. Just not parent codes from other projects, as projects are isolated from each other.

manuelpueyo · May 16, 2025, 7:29am

Hello @matthias, thank you for this page. Could you please take a look at the links here? They don't seem to work. I am trying to understand the difference between top level and sub level categories with an example. Thank you!

alberto · May 16, 2025, 9:06pm

Hi @manuelpueyo, jumping in as Matthias has his hands full with family matters.

The information you want is in the Discourse documentation, which is very clear and well made. Bookmark this:

https://docs.discourse.org

And specifically here: https://docs.discourse.org/#tag/Categories/operation/getCategory

In terms of the database a top-level category is simply a category where

"category": {
    [...],
    "has_children": true,
    [...],
}

Conversely, a sub-category is a category where that same parameter is false.
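As a rough sketch of that test in Python, assuming the category objects have already been fetched and parsed into dicts, and that the field is spelled `has_children` as in Discourse's JSON. The sample records and the function name are made up for illustration.

```python
from typing import Iterable, List, Tuple


def split_by_has_children(categories: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Split categories on the has_children flag described above."""
    with_children, without_children = [], []
    for cat in categories:
        # A missing flag is treated as False (an assumption of this sketch).
        if cat.get("has_children", False):
            with_children.append(cat)
        else:
            without_children.append(cat)
    return with_children, without_children


# Made-up sample records, not real category IDs.
sample = [
    {"id": 1, "slug": "parent-forum", "has_children": True},
    {"id": 2, "slug": "child-forum", "has_children": False},
]

parents, others = split_by_has_children(sample)
```

Discourse also reports a `parent_category_id` on sub-categories, which may be a more direct check where it is available.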

I should probably add that some time ago I wrote a library of read-only functions to help me do data analysis on edgeryders: network-viz-for-ssna/code/python scripts/z_discourse_API_functions.py at master · edgeryders/network-viz-for-ssna · GitHub. Maybe you can use some of them, they are like “give me all the posts for a category”, or stuff like that.

In line 23 you will see this instruction:

import discourse_API_config as cng # your API key goes in this file to access non-public data

The discourse_API_config file contains your own parameters: your username, API key and working directory. An example of that file is also found in the same repository. You basically copy discourse_API_config.py and z_discourse_API_functions.py into your Python directory; then edit the former with your parameters; then you can open a shell, type

import z_discourse_API_functions as api

and then call functions like this.

>>> api.fetch_public_topics_from_cat('the-reef')

matthias · May 16, 2025, 11:02pm

The difference between top level and sub level categories in API terms is just what @alberto says above.

If you were asking about the difference in conceptual terms: Discourse has a simple structure of forums or boards, here called categories. It is a tree with only two levels: top level categories (“main forums”) on the first level and sublevel categories (“subforums”) on the second level. You can see all top level and sublevel categories listed on the start page with the two-level hierarchy I just mentioned.

I edited the section by updating the example links, and the whole process of determining category IDs. It became much simpler in Discourse versions that were released (and installed on this forum) after we last updated this wiki. Now the category ID is part of the category’s URL.