Pre-generating the JSON.
Glad you found a way that works for now. For later (when you want the content as well, and the usernames, so requiring restricted access), I propose the following: I’ll set up a local script on the server that downloads the JSON view output into files (probably using drush, which should work even while making the view admin-only on the web). This will use the PHP-CLI interface, for which I can set much longer runtimes safely. And then we access-protect these periodically generated files with HTTP Basic Auth. Or something. An additional benefit is low server load, as it only needs to generate this stuff once a day (or once an hour if you want).
Works? If so, tell me when you want me to work on that. Else I’m proceeding with other ER-related stuff for now.
User names?
And it works! WOW!
Hmm, I think there is a misunderstanding here. Do we really see usernames as sensitive data? The visualization as it is (only user IDs) lends itself to some math consideration, but is unwieldy as a management tool. In fact, Nadia (see her comment above) asked for usernames to be live links to their respective profiles.
I think we should lock the JSONs, but not what the EdgeSense dashboard does with it. @Matthias, can we agree? Bring back user names and add links?
The other problem is that the JSONs seem to have lost the memory of merged accounts. As a result, it looks like all of the content in 2011 and 2012 was provided by the community. Matt will probably have some clue why this is happening.
@Luca, what have you called the views? I would like to check one thing.
Not dumping usernames in the open.
I guess there’s a misunderstanding about what the misunderstanding is: I am ok with Edgeryders (and their associated researchers) including usernames in their research work, and making a tool that includes usernames in its analytics accessible to the public. But I don’t want the data source we use for that dumped into the public as a non-anonymized research data package. We encourage people to use their real names as usernames, and many do, so this is indeed sensitive if combined with a computer-understandable version of what they say (RDFa is getting near it …), in a big old data dump… That’s why I said, if you include usernames etc., make the views access restricted. It’s admittedly a matter of principle at this early stage of this fast-growing platform, but remember the discussion about semantic data privacy we had near the end of the Edgeryders 1 site.
Re. early content appearing as being written by the community, here’s a rough guess about what happened: when migrating the Drupal 6 content over, user IDs changed. So in case you stored IDs to identify which content was created by staff, you gotta update these to the new ones. (There’s still a legacy_id field containing the old one, but I guess it’s simpler to find out by searching for users.)
Restricting access
@Matthias I agree on locking the data source (the json views), I’m just not sure how to do it and still be able to access them from outside (and not from a browser). Two ideas:
- you lock down the /json_* URL at the web server level with HTTP Basic Auth (we give it a user / password)
- since I pull them from a fixed IP, you can restrict access to the views by source IP
We could even do both.
On the issue of the data … The script doesn’t take into account the legacy_id, and it considers as part of the team the persons that have a role defined (usually that is admin or something like that). I could in principle add a configuration to the script to make sure some users are considered part of the team no matter what their role is currently. To let me do so, you’d need to give me the list of their current uids.
It’s not there yet, but ideally the tool would cache the “team status” of each person from the previous runs, and it would know when someone had become part of the team and when they were not part of the team anymore. Currently the functionality is not there, and it’s not possible to find out from a single data dump alone whether someone has been on the team in the past or not.
Ok – means exposing the view via a REST API.
HTTP Basic Auth seems fine to me (of course you’ll have to remember to always use HTTPS for it …). I’ll just want to implement it in a generic way so that we can re-use the solution for any data syncing purposes later on, not requiring changes to .htaccess every single time. I looked around for Drupal modules, and it seems that a REST API that exposes your views is the only meaningful way to get this done (services + services_views + services_basic_auth).
I’ll take care of getting this configured today. Sorry for the delay … I was configuring something similar with securesite, until I discovered that it would add an HTTP Basic Auth option to every single page on this website (and the option to restrict this to just some pages was only implemented for Drupal 5).
Re. who is considered part of the team: in our current setup, having an added role says nothing anymore about team status. We even have well-regarded admins like @Auli who have not been on an Edgeryders payroll yet. So it seems cleanest to me to just drop detecting team membership from roles completely and let your script read a simple config file that will list team members by UID (and if you want, adding multiple date ranges where the user was part of the team). I guess it does not make sense to keep that information on the platform (since team status only relates to one project such as Spot the Future, which officially is just one of many projects with equal rights on this site; plus, legacy_id is up for deletion). So it will be the cleanest solution to maintain this config file with UIDs. For getting the UIDs, @Alberto is the right person for the job – I just don’t know everyone to include.
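To make the config file idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the JSON layout, the function names `load_team` and `is_team_member`, and the UIDs and dates in the example are made up, and the actual script may handle this differently.

```python
import json
from datetime import date

# Hypothetical config format: UID -> list of [start, end] ISO dates.
# An end of null means "still on the team". UIDs and dates are made up.
EXAMPLE_CONFIG = """
{
  "101": [["2011-10-01", null]],
  "102": [["2012-01-01", "2012-12-31"], ["2014-03-01", null]]
}
"""


def load_team(config_text):
    """Parse the config text into {uid: [(start_date, end_date_or_None), ...]}."""
    raw = json.loads(config_text)
    team = {}
    for uid, ranges in raw.items():
        team[int(uid)] = [
            (date.fromisoformat(start), date.fromisoformat(end) if end else None)
            for start, end in ranges
        ]
    return team


def is_team_member(team, uid, on_day):
    """Return True if uid falls inside any of its team date ranges on on_day."""
    for start, end in team.get(uid, []):
        if start <= on_day and (end is None or on_day <= end):
            return True
    return False
```

With a layout like this, a user who left and later rejoined the team simply gets two date ranges, and users not listed are never counted as team.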
I’d leave it to @Alberto, but if I remember correctly, “being on the team” didn’t necessarily mean being on the payroll of Edgeryders, but rather having some “community management” role within the community.
I’d envision the script to have a primary way to determine the team membership as some property coming through the data export (that could be the roles or something else), but I like your proposal of a secondary / out-of-band method for the cases where that isn’t feasible: using an attribute from the site would be less complicated to setup and maintain for the average community manager while having a way to hard code the team in a configuration could help with more exotic cases.
Agree with Luca
Yes, Luca, you are correct. Edgeryders is a (hopefully special) case of a website with a long history spanning across two Drupal distributions (one based on Drupal 6, another one on Drupal 7). So, not only do we have a team that shifts, but we also have the same people that changed their user IDs across time as the old data were migrated onto the new platform. I was user 4 on the old Drupal 6, and am user 34 on the present website. I would definitely keep the idea of exporting the roles as they are on the Drupal DB, and then leaving it to the community managers to interpret the data. In the case in which dropping administrators does not make sense, you would simply not uncheck the box. A more sophisticated approach consists of printing one box per user role (there are two roles in Edgeryders, besides registered user) and letting the analyst include or exclude them.
An alternative way of doing the same would be to let the user specify a list of people to count as “team”.
But here is a question for @Matthias. My old account was merged with the new one. As a result, old posts and comments (of which there are a lot) are attributed to my current user, which is 34. User 34 is an administrator in Edgeryders. So why does it look like current administrators did nothing in 2011-2012? Are old posts still attributed to the legacy ID instead of the ID the legacy users were merged with?
Re:
My concern with roles is that this is just not accurate data – it matches often, not always, since roles were not made for this. Adding a hidden field “Community manager: yes / no” to user profiles could work, but can’t capture date ranges. Anyway, if using the roles mechanism is accurate enough for you, I’m ok with it. Just please don’t show the original role names in the tool, or admin accounts might attract some black-hat hackers … rather use “facilitator” or something.
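Hiding the original role names could look roughly like the Python sketch below. The role names in `ROLE_LABELS` are hypothetical (the thread does not list the real ones), and the function name is made up. Roles that are not in the mapping are dropped instead of being shown as-is.

```python
# Hypothetical mapping from Drupal role names to neutral public labels.
# The real role names on the site are not given in this thread.
ROLE_LABELS = {
    "administrator": "facilitator",
    "editor": "facilitator",
}


def public_role_labels(roles, labels=ROLE_LABELS):
    """Translate raw role names into public labels; unmapped roles are dropped."""
    return sorted({labels[role] for role in roles if role in labels})
```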
Regarding old content not appearing: This is not supposed to happen. All old content should appear under the new (migrated) UIDs. I’ll have to investigate this … . Where can I see this issue for myself? The demo uses Drupal 6 data it seems, not the stuff as migrated to Drupal 7.
Why D6 team activity doesn’t show (probably)
It seems that nodes, comments and users from D6 are all contained correctly in our database and the JSON exports. Here’s what I think happens: Edgesense probably has a sanity condition built in, filtering out posts that are supposed to come from a user before that user’s account creation date. Edgeryders content is everything but sane though. All account creation dates for admins (and some other early adopters of our Drupal 7 site) are those of opening their shiny new Drupal 7 account around January 2013. Their Drupal 6 content got merged into that account later on by me, not touching their account creation date though. For everyone where merges were not needed, their Drupal 6 accounts were simply ported, including their original account creation date.
Luca, can you say if this assumption is correct? If so, maybe just disable your sanity checks for now. That’s more applicable when dealing with Edgeryders anyway. (Later, I’ll build a little script to correct account creation dates. Have it on my list now …)
Restricting access: done
Ok, I got the solution with services + services_views + services_basic_auth set up successfully and changed your views to be accessible by admin users only now. As a side effect, the URLs of your views that allow HTTP Basic authentication are these (the standard URLs only allow session authentication):
- https://edgeryders.eu/api/views/edgesense_users.json
- https://edgeryders.eu/api/views/edgesense_nodes.json
- https://edgeryders.eu/api/views/edgesense_comments.json
So in your script, use the HTTP Basic’s “Authorization:” header to access these views, together with the username and password of a Drupal admin user (and because it’s about admin user credentials, make sure to always use HTTPS or the password would travel in plain text …). I tested the above URLs successfully with my account credentials and a little PHP script taken from StackOverflow.
This setup’s nice side effect for the future is that we can now define a complete edgeryders.eu API by just filling in some forms in Structure -> Services. In this case, I defined a REST API (only with the Retrieve operation of course) that can serve all existing views as JSON and XML. Just use a URL like above with the view name and either .json or .xml. Some parameters for paging etc. can be added [documentation, at “Executing view via views resource”].
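As an illustration of the access pattern described in this comment, here is a minimal Python sketch using only the standard library. The function names are made up, the credentials are placeholders, and the only things taken from the thread are the base URL, the view names and the .json / .xml suffixes.

```python
import base64
import json
import urllib.request

# Base path taken from the URLs listed in this comment.
API_BASE = "https://edgeryders.eu/api/views"


def view_url(view_name, fmt="json"):
    """Build the API URL for a view, e.g. .../edgesense_users.json."""
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    return f"{API_BASE}/{view_name}.{fmt}"


def build_request(view_name, username, password, fmt="json"):
    """Prepare a request carrying an HTTP Basic 'Authorization:' header."""
    url = view_url(view_name, fmt)
    if not url.startswith("https://"):
        # Basic auth only base64-encodes the password, so refuse plain HTTP.
        raise ValueError("refusing to send credentials over a non-HTTPS URL")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_view(view_name, username, password):
    """Download and decode one view as JSON (performs a network call)."""
    request = build_request(view_name, username, password)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `fetch_view("edgesense_users", ...)` with real admin credentials would then return the decoded JSON. Paging parameters are left out here because their names are not given in this thread.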
Update: Adapted URLs because of renaming the views.
@Alberto the views are all called json_* (e.g. json_users)
Rename views please
Can you call them EdgeSense instead?
edgesense_json_*
We are using the JSON export functionality for things other than EdgeSense.
While we’re at this
While we’re into renaming things, can you call them edgesense_users etc. please? Data format should rather be a matter of naming displays, if at all. (For JSON, you don’t even have to create specific JSON output in views anymore, since every view can be rendered into JSON using services_views and the URL format I explained above.)
Renaming done.
Just did the renaming myself, the views now have machine names and paths edgesense_users, edgesense_nodes, edgesense_comments. Sorry for breaking your API, Luca
no problem! I’ll modify the script to use the user/password and change the URLs
Restricting the views to public content.
I noticed that the views so far also export information about non-public nodes (that is, those from private Organic Groups). And I propose to not include these – they’re not intended as part of our research data package, but contain internal, not to be published coordination (mostly team members interacting, anyway).
If there are no objections – Luca, could you add a filter to restrict the views to content that is accessible to the anonymous user?
No objections from me, @Alberto ?
Data inconsistencies + problems with services_views
@Matthias I think I have identified the source of the problems with the data which make the visualization misbehave:
There are many users (all of those that were imported?) which have a creation date that is in the future wrt some of the content they have created. E.g. @Alberto has a create date (timestamp) of 1359233500, which is in Jan 2013, BUT there are posts older than that from him. The dashboard (especially the time-slider) uses the create date of the user to know if the node exists, but it ends up in an inconsistent state if the nodes don’t exist while some edge should at a given time…
Moreover: the services_views kind of work, but they don’t seem to respect what is set for the fields, e.g. the roles field is always empty when using the API, so the script cannot know if someone is from the team. Notice that the same view, if opened directly from the browser, has all the content there. I’ve tried various things this morning but wasn’t able to make this part work: opening this after the login shows the correct roles field: https://edgeryders.eu/edgesense_users while downloading this: https://edgeryders.eu/api/views/edgesense_users.json doesn’t.
Last but not least, I wasn’t able to add a link to the user’s page in the view (n.b. I just need the URL to the user’s profile page, NOT a complete HTML link tag, as I need to reuse it from the dashboard) …
Re:
Re. the data inconsistency, that’s exactly what I supposed the reason to be; for details see this comment from yesterday. If you want / have to keep the Edgesense part unchanged, maybe you can filter out the users affected by this for me, and give me a list of them together with the account creation date they should get (at or a bit before their first post / comment). I can then fix that in the database, but don’t have time right now to create that list myself.
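The list asked for here could be produced roughly as in the Python sketch below. The field names `uid` and `created` (Unix timestamps) are assumptions about the JSON exports, and the function name is made up. The proposed creation date is simply the timestamp of the user’s earliest node or comment.

```python
def find_inconsistent_users(users, nodes, comments):
    """List users whose account creation date is later than their first content.

    Each item is assumed to be a dict with 'uid' and 'created' (Unix timestamp);
    these field names are an assumption about the export format.
    """
    # Earliest content timestamp per user, across nodes and comments.
    earliest = {}
    for item in list(nodes) + list(comments):
        uid, created = int(item["uid"]), int(item["created"])
        if uid not in earliest or created < earliest[uid]:
            earliest[uid] = created

    fixes = []
    for user in users:
        uid, created = int(user["uid"]), int(user["created"])
        first = earliest.get(uid)
        if first is not None and first < created:
            fixes.append({"uid": uid, "created": created, "proposed_created": first})
    return sorted(fixes, key=lambda fix: fix["uid"])
```

Users without any content, or whose first content is newer than their account, are left out of the list.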
For creating a link to the user profile, I have encountered that problem earlier, too. I have adapted the view to generate just Drupal’s canonical URL from the user ID. For me for example, it’s https://edgeryders.eu/user/36, pointing to the same page as https://edgeryders.eu/users/matthias.
About the problem with output of roles, I’m looking into it.