Writing in Arabic and other right-to-left scripts (RTL)

This topic is a linked part of a larger work: “Discourse User Manual for edgeryders.eu”.

Content

1. How to write a post in Arabic (and other right-to-left scripts)?

2. How to write mixed LTR / RTL content (say, Latin and Arabic scripts)?

3. How to use Markdown formatting in Arabic (or other RTL) text?


1. How to write a post in Arabic (and other right-to-left scripts)?

This section only covers how to write a whole post in a right-to-left script. Mixed text direction input, which is quite confusing, is dealt with in a separate question.

  1. Open the Discourse editor to create or edit a topic or comment.

  2. For input: Configure the writing direction for the editor textbox. You have two options:

    • If you write right-to-left by default whenever you use a browser, you will already have configured right-to-left as the default writing direction (“embedding” direction), usually by choosing your browser language.
    • If you want to set the direction only for this one time, right-click into the editor’s left textbox and choose “Writing Direction → Right to Left” (tested in Chrome; it will be similar in other browsers).
  3. For output: Use the text direction tag. The previous step only shows proper right-to-left text in the editor, not in the preview on its right side or in the final rendering after saving. For that, you again have two options:

    • Before you start writing, click the “right-aligned lines” button in the editor’s toolbar to create a [text-direction=rtl] … [/text-direction] tag, and write your content inside. (To apply this tag, you could also select all your text when you’re done writing and then use this editor button, but that would needlessly create one such tag around each paragraph.) When using this technique, you currently have to work around two issues (until we can resolve them):
      • If there is a Markdown list item right before the opening [text-direction=rtl] tag, add a blank line before the tag, or it will have no effect on right-aligning your content. (issue #92)
      • Add a blank line right after the opening [text-direction=rtl] tag, or Markdown formatting inside the tag will not work. (issue #91)
    • You can put the character sequence &#x202b; at the beginning of a paragraph to place the Unicode “Right-to-Left Embedding” character there. It controls the direction of this paragraph until it is automatically reset at the paragraph’s end (<br/> or an empty line). This solution works, but it prevents you from using Markdown formatting for lists, as that needs the line to start with a number or asterisk character. You could still use HTML tags for lists, though. It is better to stick with the first option; we document this second one only for completeness.
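
To see what the character sequence from the second option actually stands for, here is a minimal Python sketch. It uses only the standard library; the variable names are made up for illustration, and nothing in it is specific to Discourse.

```python
import html
import unicodedata

# The character sequence mentioned above, as you would type it into the editor.
entity = "&#x202b;"

# Decoding the HTML character reference yields one invisible control character.
char = html.unescape(entity)

# Print its code point, its official Unicode name and its bidi class.
print(f"U+{ord(char):04X}", unicodedata.name(char))
print("bidi class:", unicodedata.bidirectional(char))
```

Because the character is invisible, inspecting it like this is an easy way to check whether a pasted paragraph already starts with it.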

2. How to write mixed LTR / RTL content (say, Latin and Arabic scripts)?

Here are four concepts that help to understand what “text direction” is. (@matthias made up these concepts; there are probably better / correct official names for them.)

  1. Character order is the order in which you consume characters when reading them in the right sequence. Character order is exactly the same as how characters are internally stored by the computer, namely, as a Unicode string. Jumps can occur only when the characters are written to the screen and read there, due to changes of text direction within the string.

  2. Character writing direction determines whether, after entering one character, the caret that marks where the next character will be inserted will be to the right or to the left of the inserted character. This depends only on the script (LTR or RTL) of the character you just entered, and cannot be configured. (There are some edge cases, for example characters like “/”, “<” etc. that are agnostic about writing direction.) This character writing direction is also used by the computer when writing characters to the screen, one by one. So it can never happen that an LTR word is displayed RTL, for example “house” as “esuoh”.

  3. Paragraph writing direction or (in official Unicode terms) “embedding” direction determines how, in a paragraph of mixed RTL and LTR sequences, the sequences are written into a paragraph. This is a display property of the paragraph, and can be changed around at will. It can even be different for input (editor box) and output (browser text), and normally will be chosen based on the main language of the whole text, or in some cases, of the individual paragraph. Here’s an illustration of the two display styles, which can be switched by changing the writing direction of a textbox (for input, in the context menu) or HTML element (for output, using a tag / attribute):

     as LTR paragraph: | LTR1 RTL1 LTR2 RTL2            |
     as RTL paragraph: |            RTL2 LTR2 RTL1 LTR1 |
    
  4. Paragraph alignment simply means whether lines are aligned to the left or right edge of the page. It is a separate concept, but it is configured automatically when you choose your paragraph writing direction. (This works both for input and output, independently.)
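
The concepts above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the real Unicode Bidirectional Algorithm that browsers implement: `script_direction` and `layout_runs` are hypothetical helper names, and the run labels are the placeholders from the illustration rather than real text.

```python
import unicodedata

def script_direction(ch):
    """Return 'LTR' or 'RTL' for characters with a direction of their own,
    or None for direction-agnostic characters such as "/" or "<"."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "LTR"
    if bidi in ("R", "AL"):  # Hebrew-like and Arabic-like letters
        return "RTL"
    return None

def layout_runs(runs, paragraph_direction, width=32):
    """Place whole runs on one line. The paragraph direction decides the
    order of the runs and the alignment; each run stays intact."""
    if paragraph_direction == "ltr":
        return " ".join(runs).ljust(width)
    return " ".join(reversed(runs)).rjust(width)

# Character writing direction depends only on the character itself.
for ch in ["h", "\u0645", "/", "<"]:  # "\u0645" is the Arabic letter meem
    print(f"U+{ord(ch):04X}", script_direction(ch))

# Paragraph writing direction rearranges runs, as in the illustration above.
runs = ["LTR1", "RTL1", "LTR2", "RTL2"]
print("as LTR paragraph: |", layout_runs(runs, "ltr"), "|")
print("as RTL paragraph: |", layout_runs(runs, "rtl"), "|")
```

Note that the character order (the `runs` list) is never changed in storage; only the on-screen placement differs between the two paragraph directions.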

Now to write mixed LTR / RTL content, the rules are simple to understand:

  • Configure paragraph writing direction for input depending on your preferred writing direction, or if you like that better, on the main script of each paragraph. See “How to write a post in Arabic” above for how to configure this. If different paragraphs use different main scripts, you may have to switch writing directions when moving to edit a new paragraph. You can do that simply from the right-click menu in most browsers, as often as you like.

  • Configure paragraph writing direction for output depending on the main script in your whole text. See “How to write a post in Arabic” above for how to configure this.

But it will be confusing sometimes, so you will need some special tricks:

  • Use another editor for difficult cases of mixed LTR / RTL strings. When one string has both left-to-right (“Latin”) and right-to-left (“Arabic”) characters, especially with frequent switches, the browser’s default implementation for editing it is quite a nightmare (confusing caret jumps when selecting, entering characters etc.). Solution: Copy and paste the full title into this online editor, edit it there, then copy and paste it back.

  • Prefer Markdown over HTML. The Discourse editor accepts both, but Markdown seems preferable, as the special characters it needs do not introduce a “change of text direction”, unlike HTML tag names with their Latin characters. So there is less LTR / RTL mixing, and less confusion when selecting text etc.

  • Select first when deleting content. When editing mixed LTR / RTL content and not being very used to it, it will be confusing that the “Backspace” and “Delete” keys change the direction they operate in based on the direction of text you are in. So you will often hit the wrong one. To prevent this, select text to delete first, then press “Delete” to remove the whole selection, and only that.

3. How to use Markdown formatting in Arabic (or other RTL) text?

You can use character-based formatting (bold, italic, links etc.) as normal. The special characters used for that do not introduce a change of text direction, so no confusing jumps will happen when selecting such text. (Exception: Latin characters for link URLs.)
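
A quick way to check this claim is to look up the Unicode bidi class of the characters involved. The sketch below is an illustration only; `has_strong_direction` and the sample character list are made up for this example, and the list covers only a few common inline formatting characters.

```python
import unicodedata

# Bidi classes of characters that carry a writing direction of their own.
STRONG = {"L", "R", "AL"}

def has_strong_direction(ch):
    """True if the character itself forces left-to-right or right-to-left."""
    return unicodedata.bidirectional(ch) in STRONG

# Characters used for bold, italic and link syntax in Markdown.
inline_markup = "*_[]()"
for ch in inline_markup:
    print(repr(ch), unicodedata.bidirectional(ch), has_strong_direction(ch))

# Latin letters, as found in a link URL, do carry a direction.
print(repr("h"), unicodedata.bidirectional("h"), has_strong_direction("h"))
```

None of the formatting characters has a strong direction, while the Latin letter does, which matches the exception for link URLs noted above.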

However, when you want to use paragraph-based formatting (lists, blockquote, headers), Markdown needs two things:

  • The special formatting characters (like “*” for a list item) have to be at the start of the line. So, you can’t use the Unicode right-to-left embedding character &#x202b; to create an RTL paragraph writing direction in the output. Instead, use the [text-direction=rtl] … [/text-direction] tag from the “right-aligned lines” editor button.

  • Lists and pre-formatted paragraphs (those indented with four spaces) need an empty line above them; for a list, that means above the first list item. Together with the previous point, this means the text you enter into your editor to get a two-item list should look exactly like this (assuming RTL writing direction in your editor):

[text-direction=rtl]

* تحدثنا مروة حسن
* تحدثنا مروة حسن
[/text-direction]
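
If you prepare posts outside the editor, the blank-line rules can be applied mechanically. The following Python sketch is one possible way to do that; `wrap_rtl` is a hypothetical helper, not part of Discourse, and it assumes the text before the block already ends with a line break.

```python
def wrap_rtl(markdown_body):
    """Wrap Markdown in the text-direction tag described above,
    adding the blank lines that the two workarounds call for."""
    return (
        "\n"                      # blank line before the tag (issue #92)
        "[text-direction=rtl]\n"
        "\n"                      # blank line after the opening tag (issue #91)
        f"{markdown_body.strip()}\n"
        "[/text-direction]\n"
    )

# The two-item list from the example above.
item = "تحدثنا مروة حسن"
body = f"* {item}\n* {item}"

print(wrap_rtl(body))
```

The leading blank line is harmless when no list item precedes the tag, so the helper always adds it rather than inspecting the preceding text.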
