Mercurial repository mlmmj, changeset 852:5471407e104d
Add wrapping modes to facilitate wrapping non-English texts.
- Add %wordwrap%, %charwrap% and %userwrap% line-breaking modes.
- \<space> now means a non-breakable space, not a break opportunity.
- Introduce \/ to mark a break opportunity.
- Introduce \= to inhibit a break.
field | value |
---|---|
author | Ben Schmidt |
date | Wed, 29 Feb 2012 00:46:35 +1100 |
parents | 31ac95b2d625 |
children | 90637da7fe2c |
files | ChangeLog README.listtexts src/prepstdreply.c |
diffstat | 3 files changed, 165 insertions(+), 48 deletions(-) |
--- a/ChangeLog Wed Feb 29 00:26:35 2012 +1100
+++ b/ChangeLog Wed Feb 29 00:46:35 2012 +1100
@@ -1,3 +1,4 @@
+ o Add different wrapping modes to facilitate wrapping many languages
  o Fix backslash escaping mechanism so double backslash can't effectively
    recurse and form part of another escape sequence, other non-unicode escapes
    aren't ignored, and first lines of included files don't 'escape' escaping.
@@ -9,10 +10,9 @@
  o Make mlmmj-sub and +subscribe[-digest|-nomail] switch existing
    subscriptions.
  o Add a switch to bypass notifying the owner on subscribe/unsubscribe.
- o Introduce \<space> to indicate line-break positions to enable sensible
-   wrapping of Chinese and similar text.
- o Allow lines to be longer than the wrapping width if there are no spaces,
-   as generated email addresses (e.g. for moderation) won't work if split.
+ o Introduce \<space> to indicate non-breakable space, \= to mark other
+   locations where breaks should not occur, and \/ to mark locations where
+   breaks can occur
  o Add rejection of posts and obstruction of subscriptions.
  o Avoid bogus error messages when logging that the list address has been
    found in To: or CC: headers.
--- a/README.listtexts Wed Feb 29 00:26:35 2012 +1100
+++ b/README.listtexts Wed Feb 29 00:46:35 2012 +1100
@@ -15,8 +15,11 @@
 - Supported list texts
 - Format
 - Conditionals
-- Formatting and formatted substitutions
+- Wrapping
+- Formatting and comments
+- Formatted substitutions
 - Unformatted substitutions
+- Escapes
 
 Naming scheme
 -------------
@@ -240,21 +243,75 @@
 Note that when multiple parameters can be given for the directives, these have
 'or' behaviour; to get 'and' behaviour, nest conditionals.
 
-Formatting and formatted substitutions
---------------------------------------
+Wrapping
+--------
 
-These formatting-related directives work with multiple lines, so are generally
-not appropriate for use in headers. They are:
+There are various directives available to assist with wrapping and formatting.
+Wrapping needs to be enabled for each paragraph with:
 
 - %wrap%
 - %wrap W%
-  lines until the next blank line are concatenated and are then rewrapped to a
-  width of W (or 76 if W is omitted); lines have whitespace trimmed before
-  being joined with a single space; lines are broken at spaces or at points
-  marked for breaking with \<space>; the width is reckoned including any text
+  concatenate and rewrap lines until the next blank line to a width of W (or 76
+  if W is omitted); second and later lines are preceded with as many spaces as
+  the width preceding the directive; the width is reckoned including any text
   preceding the directive and any indentation preserved from a file which
-  included the current one, so it is an absolute maximum width; it is measured
-  in bytes
+  included the current one, so it is an absolute maximum width
+
+To cater for various languages, there are a number of different wrapping modes
+that can be set. These can be set either before or after wrapping is specified,
+and can even be changed part way through a paragraph if desired. The following
+directives control them:
+
+- %wordwrap%
+- %ww%
+  use word-wrapping (this is the default; good for English, French, Greek and
+  other languages that use an alphabet and spaces between words); lines have
+  whitespace trimmed from both ends and are joined with a single space; lines
+  are broken at spaces or at points marked for breaking with \/, but not at
+  spaces escaped with a backslash
+
+- %charwrap%
+- %cw%
+  use character-wrapping (good for Chinese, Japanese and Korean which use
+  characters without spaces between words); lines have only leading whitespace
+  trimmed and are joined without inserting anything at the joint; lines are
+  broken at space or any non-ASCII character except where disallowed with \=
+
+- %userwrap%
+- %uw%
+  use user-wrapping (for more complex languages or wherever complete manual
+  control is desired); lines have only leading whitespace trimmed and are
+  joined without inserting anything at the joint; lines are broken only where
+  marked for breaking with \/
+
+If a line with any of the directives in this section, after processing,
+contains only whitespace, the line does not appear at all in the output (the
+newline and any whitespace is omitted).
+
+Formatting and comments
+-----------------------
+
+The following directives are available to assist with formatting and
+readability:
+
+- %^%
+  start the line here; anything preceding this directive is ignored (useful for
+  using indentation for readability without ruining the formatting of the text
+  when it is processed)
+
+- %comment%
+- %$%
+  end the line here; anything following this directive is ignored
+
+If a line with any of these directives, after processing, contains only
+whitespace, the line does not appear at all in the output (the newline and any
+whitespace is omitted).
+
+Formatted substitutions
+-----------------------
+
+These formatted substitutions work with multiple lines, so are generally not
+appropriate for use in headers. They are:
 
 - %text T%
   text from the file named T in the listdir/text directory; the name may only
@@ -303,27 +360,12 @@
   the list of indexes of messages which may not have been received as they
   bounced
 
-- %^%
-  start the line here; anything preceding this directive is ignored (useful for
-  using indentation for readability without ruining the formatting of the text
-  when it is processed)
-
-- %comment%
-- %$%
-  end the line here; anything following this directive is ignored
-
-- %%
-  a single %
-
 Directives which include a list of items have the behaviour that each item is
 preceded and followed by the same text as preceded and followed the directive
-on its line. Only one such directive is supported per line.
-
-The %wrap% and %wrap W% directives, as well as those which include a block of
-text, have the behaviour that second and later lines are preceded with as many
-spaces as there were characters preceding the directive. Apart from the
-%wrap% and %wrap W% directives, any text following the directive on the same
-line is omitted.
+on its line; only one such directive is supported per line. Those which include
+a block of text have the behaviour that second and later lines are preceded
+with as many spaces as there were bytes preceding the directive; any text
+following such directives on the same line is omitted.
 
 If a line with any of these directives, after processing, contains only
 whitespace, the line does not appear at all in the output (the newline and any
@@ -332,6 +374,8 @@
 Unformatted substitutions
 -------------------------
 
+Unformatted substitutions that are available are:
+
 - $bouncenumbers$
   (available only in probe)
   the formatted list of indexes of messages which may not have been received as
@@ -494,18 +538,35 @@
   newline stripped; the name may only include letters, digits, underscore, dot
   and hyphen; note that there is a formatted version of this directive
 
+Escapes
+-------
+
+These allow you to avoid special meanings of characters used for other purposes
+in list texts, as well as control the construction of the texts at a fairly low
+level.
+
 - $$
   a single $
 
+- %%
+  a single %
+
+- \\
+  a single \
+
 - \uNNNN
-  (NNNN are hex digits)
+  (NNNN represents four hex digits)
   a Unicode character (this is not really appropriate for use in a header,
   except perhaps the Subject: header as Mlmmj does automatic quoting for
   that header as described above)
 
 - \<space>
+  a space, but don't allow the line to be broken here when wrapping
+
+- \/
   nothing, but allow the line to be broken here when wrapping
 
-- \\
-  a single \
+- \=
+  nothing, but don't allow the line to be broken here when wrapping
+
--- a/src/prepstdreply.c Wed Feb 29 00:26:35 2012 +1100
+++ b/src/prepstdreply.c Wed Feb 29 00:46:35 2012 +1100
@@ -98,6 +98,13 @@
 };
 
 
+enum wrap_mode {
+	WRAP_WORD,
+	WRAP_CHAR,
+	WRAP_USER
+};
+
+
 struct text {
 	char *action;
 	char *reason;
@@ -108,6 +115,7 @@
 	formatted *fmts;
 	int wrapindent;
 	int wrapwidth;
+	enum wrap_mode wrapmode;
 	conditional *cond;
 	conditional *skip;
 };
@@ -458,6 +466,7 @@
 	txt->fmts = NULL;
 	txt->wrapindent = 0;
 	txt->wrapwidth = 0;
+	txt->wrapmode = WRAP_WORD;
 	txt->cond = NULL;
 	txt->skip = NULL;
 
@@ -916,6 +925,20 @@
 				*line_p = line;
 				return 0;
 			}
+		} else if(strcmp(token, "ww") == 0 ||
+				strcmp(token, "wordwrap") == 0 ||
+				strcmp(token, "cw") == 0 ||
+				strcmp(token, "charwrap") == 0 ||
+				strcmp(token, "uw") == 0 ||
+				strcmp(token, "userwrap") == 0) {
+			if (*token == 'w') txt->wrapmode = WRAP_WORD;
+			if (*token == 'c') txt->wrapmode = WRAP_CHAR;
+			if (*token == 'u') txt->wrapmode = WRAP_USER;
+			line = concatstr(2, line, endpos + 1);
+			*pos_p = line + (*pos_p - *line_p);
+			myfree(*line_p);
+			*line_p = line;
+			return 0;
 		} else if(strncmp(token, "control ", 8) == 0) {
 			token = filename_token(token + 8);
 			if (token != NULL) {
@@ -990,8 +1013,8 @@
 	char *tmp;
 	char *prev = NULL;
 	int len, i;
-	int directive;
 	int incision, spc;
+	int directive, inhibitbreak;
 	int peeking = 0; /* for a failed conditional without an else */
 	int skipwhite; /* skip whitespace after a conditional directive */
 	int swallow;
@@ -1047,8 +1070,11 @@
 			/* Wrapping */
 			len = strlen(prev);
 			pos = prev + len - 1;
-			while (pos > prev && (*pos == ' ' || *pos == '\t'))
+			if (txt->wrapmode == WRAP_WORD) {
+				while (pos > prev &&
+						(*pos == ' ' || *pos == '\t'))
 					pos--;
+			}
 			pos++;
 			*pos = '\0';
 			len = pos - prev;
@@ -1071,8 +1097,12 @@
 			if (*prev == '\0') {
 				tmp = mystrdup(pos);
 			} else {
+				if (txt->wrapmode == WRAP_WORD) {
 				tmp = concatstr(3, prev, " ", pos);
 				len++;
+				} else {
+					tmp = concatstr(2, prev, pos);
+				}
 			}
 			myfree(line);
 			line = tmp;
@@ -1096,9 +1126,13 @@
 			incision = -1;
 		}
 		directive = 0;
+		inhibitbreak = 0;
 		while (*pos != '\0') {
 			if (txt->wrapwidth != 0 && len >= txt->wrapwidth &&
 					!peeking && spc != -1) break;
+			if ((unsigned char)*pos > 0xbf && txt->skip == NULL &&
+					txt->wrapmode == WRAP_CHAR &&
+					!inhibitbreak) spc = len - 1;
 			if (*pos == '\r') {
 				*pos = '\0';
 				pos++;
@@ -1113,23 +1147,35 @@
 				txt->src->upcoming = mystrdup(pos);
 				break;
 			} else if (*pos == ' ') {
-				if (txt->skip == NULL) {
-					spc = pos - line;
-				}
+				if (txt->skip == NULL &&
+						txt->wrapmode != WRAP_USER &&
+						!inhibitbreak) spc = len;
+				inhibitbreak = 0;
 			} else if (*pos == '\t') {
 				/* Avoid breaking due to peeking */
+				inhibitbreak = 0;
 			} else if (txt->src->transparent) {
 				/* Do nothing if the file is to be included
 				 * transparently */
 				if (peeking && txt->skip == NULL) break;
+				inhibitbreak = 0;
 			} else if (*pos == '\\' && txt->skip == NULL) {
 				if (peeking) break;
-				if (*(pos + 1) == ' ') {
+				if (*(pos + 1) == '/') {
 					spc = len - 1;
 					tmp = pos + 2;
+					inhibitbreak = 0;
+				} else if (*(pos + 1) == '=') {
+					tmp = pos + 2;
+					/* Ensure we don't wrap the next
+					 * character */
+					inhibitbreak = 1;
 				} else {
-					/* Includes backslash */
+					/* Includes space and backslash */
 					tmp = pos + 1;
+					/* Ensure we don't wrap a space */
+					if (*(pos+1) == ' ') inhibitbreak = 1;
+					else inhibitbreak = 0;
 				}
 				*pos = '\0';
 				tmp = concatstr(2, line, tmp);
@@ -1143,6 +1189,10 @@
 				substitute_one(&line, &pos, listaddr,
 						listdelim, listdir, txt);
 				if (len != pos - line) {
+					/* Cancel any break inhibition if the
+					 * length changed (which will be
+					 * because of $$) */
+					inhibitbreak = 0;
 					len = pos - line;
 				}
 				skipwhite = 0;
@@ -1175,6 +1225,11 @@
 					}
 				}
 				if (len != pos - line) {
+					/* Cancel any break inhibition if the
+					 * length changed (which will be
+					 * because of %% or %^% or an empty
+					 * list) */
+					inhibitbreak = 0;
 					len = pos - line;
 				}
 				/* handle_directive() sets up for the next
@@ -1217,7 +1272,8 @@
 				continue;
 			}
 			if (spc != -1) {
-				if (line[spc] == ' ') line[spc] = '\0';
+				if (txt->wrapmode == WRAP_WORD &&
+						line[spc] == ' ') line[spc] = '\0';
 				spc++;
 				if (line[spc] == '\0') spc = -1;
 			}
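The break-point rules in the prepstdreply.c changes can be illustrated with a standalone sketch. `scan()` and `wrap()` are hypothetical, simplified stand-ins, not mlmmj functions: they understand only the `\\`, `\<space>`, `\/` and `\=` escapes, measure width in bytes, assume the input fits in a fixed `MAXLEN` buffer, and choose break points after the whole line has been scanned instead of reproducing the patch's incremental `spc`/`inhibitbreak` bookkeeping. The char-mode test `(unsigned char)*p > 0xbf` follows the patch: in UTF-8, bytes 0x80 to 0xBF only continue a multibyte character, so a byte above 0xBF begins one.

```c
#include <assert.h>
#include <string.h>

enum wrap_mode { WRAP_WORD, WRAP_CHAR, WRAP_USER };

#define MAXLEN 512	/* input bytes handled by this sketch */

/* Resolve the escapes \\, \<space>, \/ and \= in src and record break
 * points: brk[k] != 0 means a line may be broken just before text[k].
 * Returns the number of bytes written to text. */
static size_t scan(const char *src, enum wrap_mode mode,
		   char *text, unsigned char *brk)
{
	size_t n = 0;
	int inhibit = 0;	/* set by \= : no break before the next byte */

	memset(brk, 0, MAXLEN + 1);
	for (const char *p = src; *p != '\0' && n < MAXLEN; p++) {
		if (p[0] == '\\' && p[1] == '/') {
			brk[n] = 1;		/* \/ : a break may go here */
			inhibit = 0;
			p++;
		} else if (p[0] == '\\' && p[1] == '=') {
			brk[n] = 0;		/* \= : no break here */
			inhibit = 1;
			p++;
		} else if (p[0] == '\\' && (p[1] == ' ' || p[1] == '\\')) {
			text[n++] = p[1];	/* literal byte, never a break */
			inhibit = 0;
			p++;
		} else {
			/* word mode: break before a space (it is dropped);
			 * char mode: break before a UTF-8 lead byte */
			if (!inhibit &&
			    ((mode == WRAP_WORD && *p == ' ') ||
			     (mode == WRAP_CHAR && (unsigned char)*p > 0xbf)))
				brk[n] = 1;
			text[n++] = *p;
			/* char mode: break after a space, keeping the space */
			if (mode == WRAP_CHAR && *p == ' ')
				brk[n] = 1;
			inhibit = 0;
		}
	}
	return n;
}

/* Greedy wrap: cut each line at the last break point that keeps it within
 * width bytes; if there is none, run on to the next break point, so text
 * without break points (an address, say) is not split.  Lines are written
 * to out separated by '\n'; out must hold at least 2 * MAXLEN bytes. */
static void wrap(const char *src, enum wrap_mode mode, size_t width,
		 char *out)
{
	char text[MAXLEN];
	unsigned char brk[MAXLEN + 1];
	size_t n, start = 0, end, k;

	assert(src != NULL && out != NULL);
	n = scan(src, mode, text, brk);
	out[0] = '\0';
	while (start < n) {
		end = n;
		if (n - start > width) {
			for (k = start + width; k > start && !brk[k]; k--)
				;
			if (k == start)
				for (k = start + width + 1; k < n && !brk[k]; k++)
					;
			end = k;
		}
		if (out[0] != '\0')
			strcat(out, "\n");
		strncat(out, text + start, end - start);
		start = end;
		/* word-wrapping drops the space(s) at the break */
		if (mode == WRAP_WORD)
			while (start < n && text[start] == ' ')
				start++;
	}
}
```

Called as `wrap("the\\ quick brown", WRAP_WORD, 5, out)`, the sketch yields "the quick" and "brown" on two lines because the escaped space is not a break point; without the backslash the same width gives "the", "quick" and "brown". In char mode, `\=` placed before a CJK character removes the break that its lead byte would otherwise allow.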