Broken chronological order in some old posts

I was looking through some old posts, and noticed that some of them display in a non-chronological order. Example:

It starts in early December 2011 and initially develops as normal. Then Nadia’s comment shows as “1 month later” (4 Jan 2012), and immediately afterwards the thread jumps back to 4 December 2011. Towards the end, it shows a “22 days later” and some more posts from late January 2012. In fairness, that was Drupal 6 content ported to Drupal 7 and then ported to Discourse. Still, I am curious… and we should be careful when processing data on a large scale.

Breaking the chronological order is not considered a problem in itself, as it was a necessary side effect of migrating threaded Drupal nodes to semi-flat Discourse topics. We tried ordering all comments by date in Discourse at first, but the result was unreadable because the reader had to jump back and forth a lot. That problem does not occur in threaded topics, where the flow of reading is preserved. It also does not occur in topics created natively in Discourse, as people take care to establish context because they know they are posting at the very bottom.

So when importing, we ordered comments in the visual order in which they appeared in Drupal, but without the visual indentation for threading, while of course preserving the “what is a reply to what” relationships.
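To illustrate why this ordering breaks chronology, here is a minimal sketch of flattening a threaded comment tree in visual (depth-first) order while keeping the reply-to link. It is not the actual import script; the dict keys `cid`, `pid` (with 0 meaning "top-level") and `created` are assumptions made for the example.

```python
from collections import defaultdict


def flatten_thread(comments):
    """Return comments in depth-first (visual) order, each keeping its parent id.

    Assumed input: dicts with 'cid', 'pid' (0 = top-level) and 'created'.
    """
    # Group comments by parent and sort siblings by creation time.
    children = defaultdict(list)
    for c in comments:
        children[c["pid"]].append(c)
    for siblings in children.values():
        siblings.sort(key=lambda c: c["created"])

    ordered = []
    stack = list(reversed(children[0]))  # start with top-level comments
    while stack:
        c = stack.pop()
        ordered.append({
            "cid": c["cid"],
            "reply_to": c["pid"] or None,  # None for top-level comments
            "created": c["created"],
        })
        # Replies are emitted directly after their parent, oldest first.
        stack.extend(reversed(children[c["cid"]]))
    return ordered
```

With two top-level comments created at times 10 and 20, and a reply to the first one created at time 30, the flattened order is 1, 3, 2. The late reply sits above the earlier second comment, which is exactly the kind of "jump back in time" seen in imported topics.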

That last step, however, is what did not work in the topic you found. For whatever reason, the “which comment replies to which comment” relations are completely messed up compared to the original node in Drupal 7.

I looked into this issue for a bit, and it seems to affect only the earliest posts from the Drupal 6 platform. The effect is most often a completely flat thread in Discourse (all reply-to information omitted), and sometimes, in addition, parts of the thread are messed up (wrong reply-to information given). It looks like an offset issue (reply-to information from the wrong topic is used).
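The two symptoms could be told apart with a check along these lines. This is only a sketch, assuming we can build two maps per topic from comment id to parent comment id (or `None`): one from the reference data and one from what ended up in Discourse. The function name and the map layout are hypothetical.

```python
def classify_topic(expected_parent, imported_parent):
    """Compare reply-to maps keyed by comment id (value: parent id or None).

    expected_parent: reference relations (assumed to come from the source node)
    imported_parent: relations as found after the import
    """
    if all(imported_parent.get(cid) == parent
           for cid, parent in expected_parent.items()):
        return "not affected"
    # Relations differ and nothing at all was imported: completely flat thread.
    if all(imported_parent.get(cid) is None for cid in expected_parent):
        return "flat"
    # Some reply-to information exists but does not match the reference.
    return "wrong reply-to"
```

A topic where every imported reply-to is empty comes out as "flat", while one with some reply-to links that disagree with the reference comes out as "wrong reply-to".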

Initial findings regarding the number of topics affected:

  • :x: node 817 / topic 1245: affected (the one you found)
  • :x: node 818 / topic 1246: affected
  • :x: node 820 / topic 1248: affected
  • :x: node 1111 / topic 1529: affected
  • :x: node 1150 / topic 1750: affected
  • :x: node 1245 / topic 1750: affected
  • :heavy_check_mark: node 1246 / topic 1651: not affected
  • :x: node 1249 / topic 1654: affected
  • :x: node 1300 / topic 1704: affected
  • :heavy_check_mark: node 1340 / topic 1740: not affected
    (this is the post with the highest Drupal 6 legacy_nid, namely 1330, i.e. the last automatically imported Drupal 6 node; if nothing after this point is affected, the issue originated in the Drupal 6 → Drupal 7 import; some other content, such as node 1341, was imported manually and would then not be affected)
  • :heavy_check_mark: node 1346 / topic 1764: not affected
    (this “not affected” one indicates that the import from Drupal 6 is indeed probably at fault)
  • :heavy_check_mark: node 1400 / topic 1805: not affected
  • :heavy_check_mark: node 1500 / topic 1884: not affected
  • :heavy_check_mark: later nodes: probably not affected

Assuming the problem is indeed with the old Drupal 6 → Drupal 7 import, then to fix this we have to go through all Drupal 7 nodes with nid < 1340, check if the problem is happening, and fix it by deriving and adding the proper reply-to information.

All content has been imported, and the relative order shown in Discourse seems fine too; however, the “who replied to whom” information may be wrong. The problem does not appear in all Drupal 7 nodes with nid < 1340, as we started using our Drupal 7 platform before all the Drupal 6 content was imported into it. A better condition would be “nid < 1340 AND comment creation date before the import date”.
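As a rough sketch of that condition: the snippet below picks the nodes worth checking from a list of comment records. The keys `nid` and `created` and the function name are assumptions for illustration, and the import date has to be supplied by whoever runs it, since it is not stated here.

```python
from datetime import datetime, timezone


def candidate_nodes(comments, import_date, max_nid=1340):
    """Return sorted node ids with at least one comment that matches
    'nid < max_nid AND comment created before import_date'.

    Assumed input: dicts with 'nid' (int) and 'created' (aware datetime).
    """
    return sorted({
        c["nid"]
        for c in comments
        if c["nid"] < max_nid and c["created"] < import_date
    })


# Usage with made-up data; the import date below is a placeholder, not the real one.
placeholder_import_date = datetime(2013, 1, 1, tzinfo=timezone.utc)
sample = [
    {"nid": 817, "created": datetime(2011, 12, 4, tzinfo=timezone.utc)},
    {"nid": 1246, "created": datetime(2013, 6, 1, tzinfo=timezone.utc)},
    {"nid": 1400, "created": datetime(2012, 1, 4, tzinfo=timezone.utc)},
]
to_check = candidate_nodes(sample, placeholder_import_date)
```

Filtering on the comment date as well as the nid avoids flagging nodes below 1340 that were created natively on Drupal 7.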

We’ll have to look into this in more detail later. For now, I only added a task for platform cleanup to Dynalist, and @anu may get her chance to learn Ruby programming while fixing this mess. :wink:
