Discussion:
Bash vi mode's e command (end of word) goes to eol when hitting a unicode character
Enrico Maria De Angelis
2018-09-03 11:13:03 UTC
This is kind of a pedantic bug report.
Basically, it seems that bash's vi mode doesn't use the same definition of
words/Words/... that Vim uses (Vim being the de facto always-installed
version of vi). I'm reporting it anyway, just in case it's an easy
fix (if you think this really is a bug).

The version number of bash: GNU bash, version 4.4.23(1)-release
The hardware and operating system: Arch Linux (constantly updated)
The compiler used to compile: I didn't compile bash myself
A description of the bug behaviour: & A short script or `recipe' which
exercises the bug:
While vi-editing a line like the following
$ ls bulk32³ grids.dat COPYING
with the cursor in normal mode at the beginning of the line, hitting e
repeatedly causes the cursor to move, in order, to
s of ls (correct)
2 of bulk32³ (correct, since Vim itself works like this, with an end of
word being detected in between 2 and ³)
end of line (wrong)
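
For illustration, the stops one would expect can be sketched with a
simplified model of the 'e' motion that works on characters rather than
bytes. This is not bash's or Vim's actual code: the names char_class,
end_of_word and e_stops are hypothetical, and the sketch assumes that only
ASCII letters, digits and "_" count as word characters, so that ³ falls
into a separate class.

```python
import string

# Assumption: only ASCII letters, digits and "_" are word characters,
# so "³" lands in the "other" class.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def char_class(ch):
    if ch.isspace():
        return "blank"
    if ch in WORD_CHARS:
        return "word"
    return "other"

def end_of_word(line, pos):
    """Return the character index where a simplified 'e' would land."""
    last = len(line) - 1
    i = pos + 1
    # Skip any blanks between the cursor and the next run of characters.
    while i <= last and char_class(line[i]) == "blank":
        i += 1
    if i > last:
        return max(last, pos)  # nothing left: stay at the end of the line
    cls = char_class(line[i])
    # Advance while the following character belongs to the same class.
    while i < last and char_class(line[i + 1]) == cls:
        i += 1
    return i

def e_stops(line):
    """Collect every stop reached by pressing 'e' from the start of line."""
    stops, pos = [], 0
    while True:
        nxt = end_of_word(line, pos)
        if nxt == pos:
            return stops
        stops.append(nxt)
        pos = nxt

line = "ls bulk32³ grids.dat COPYING"
print([line[i] for i in e_stops(line)])
```

Under these assumptions the third stop is the ³ itself, not the end of
the line.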

Kind regards,
Enrico Maria De Angelis
Greg Wooledge
2018-09-04 13:28:33 UTC
Post by Enrico Maria De Angelis
The version number of bash: GNU bash, version 4.4.23(1)-release
The hardware and operating system: Arch Linux (constantly updated)
The compiler used to compile: I didn't compile bash myself
A description of the bug behaviour: & A short script or `recipe' which
exercises the bug:
While vi-editing a line like the following
$ ls bulk32³ grids.dat COPYING
with the cursor in normal mode at the beginning of the line, hitting e
repeatedly causes the cursor to move, in order, to
s of ls (correct)
2 of bulk32³ (correct, since Vim itself works like this, with an end of
word being detected in between 2 and ³)
end of line (wrong)
I can confirm this in Debian's bash 4.4.12 and in bash 5.0-alpha. It's
actually worse than Enrico reports.

First, the cursor doesn't actually move to the end-of-line character
('G'). The cursor moves one space *past* that.

Once there, pressing either 'h' or 'b' moves the cursor from end-of-line
back to the ³ character. That's fairly odd on its own, but it gets
even more interesting.

If you go back to beginning-of-line, then press 'e' 3 times (so the cursor
is beyond the 'G'), then press 'i' ' ' to insert a space character, the
multi-byte character gets broken up. What I see is this:

wooledg:~$ ls bulk32� � grids.dat COPYING

So, it seems the space was inserted in the middle of the byte sequence
that constituted the ³ character (0xc2 0xb3) originally, resulting in
two invalid-character bytes with a space in the middle.

This is in LANG=en_US.UTF-8 on Debian 9 amd64.
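
The byte-level effect described above can be reproduced outside of bash.
The following is only a sketch of what happens to the bytes, not of how
readline arrives at the wrong offset; the variable names are made up for
the example.

```python
line = "ls bulk32³ grids.dat COPYING"
raw = line.encode("utf-8")

# "³" (U+00B3) is encoded in UTF-8 as the two bytes 0xc2 0xb3.
cube = "³".encode("utf-8")
start = raw.index(cube)

# Insert a space at a byte offset that falls between those two bytes.
broken = raw[:start + 1] + b" " + raw[start + 1:]

# Neither orphaned byte is valid UTF-8 on its own, so each one is
# rendered as the replacement character U+FFFD.
shown = broken.decode("utf-8", errors="replace")
print(cube.hex(" "))
print(shown)
```

The decoded result has two replacement characters with a space between
them, matching the line shown above.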
Enrico Maria De Angelis
2018-09-04 17:54:15 UTC
Oh,
I'm sorry for not investigating further; I thought this was more or less
expected.
Thank you for doing it, Greg.
Hope this will be fixed.
Kind regards,
Enrico Maria
Chet Ramey
2018-09-05 13:45:20 UTC
Post by Enrico Maria De Angelis
This is kind of a pedantic bug report.
Basically, it seems that bash's vi mode doesn't use the same definition of
words/Words/... that Vim uses (Vim being the de facto always-installed
version of vi). I'm reporting it anyway, just in case it's an easy
fix (if you think this really is a bug).
Thanks for the report. The readline vi-mode code needs to be updated to
better handle multibyte characters in a few places; this is one.
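
As a rough illustration of what handling multibyte characters involves
for cursor movement in a UTF-8 buffer: a step of one character has to
skip over continuation bytes instead of moving by one byte. The sketch
below is not readline's code; is_continuation, next_char_start and
prev_char_start are hypothetical helpers, and it assumes the buffer
holds valid UTF-8.

```python
def is_continuation(byte):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80

def next_char_start(buf, pos):
    """Byte offset of the character following the one that starts at pos."""
    i = pos + 1
    while i < len(buf) and is_continuation(buf[i]):
        i += 1
    return i

def prev_char_start(buf, pos):
    """Byte offset of the character preceding the one that starts at pos."""
    i = pos - 1
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return max(i, 0)

buf = "32³ g".encode("utf-8")        # bytes: 33 32 c2 b3 20 67
print(next_char_start(buf, 2))       # steps from ³ over both of its bytes
print(prev_char_start(buf, 4))       # steps from the space back to ³
```

With offsets computed this way, an insertion point can never land
between the 0xc2 and 0xb3 bytes of the ³ character.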

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/