Discussion:
field splitting with IFS non-whitespace
Greg Wooledge
2011-01-11 20:36:11 UTC
POSIX 2.6.5 Field Splitting [1] says, in part,

1. If IFS is <space><tab><newline> or unset, ...
2. If IFS is null, ...
3. Otherwise, ...
   b. Each occurrence in the input of an IFS character that is not IFS
      white space, along with any adjacent IFS white space, shall delimit
      a field, as described previously.

I'm attempting to understand what exactly "delimit a field" means.
Specifically, consider this case:

$ (x=bar, IFS=,; set -f; a=($x); printf "<%s> " "${a[@]}"; echo)
<bar>

With an input string ending in a non-whitespace IFS character, bash
apparently drops that trailing delimiter altogether, rather than
creating an empty second field. Bash 2.05 through 4.2-beta all do this,
as do ksh88 and ksh93.

Is that the correct behavior? Does "delimit a field" mean "end a field,
and possibly start a new one if there's something after it", or does it
always mean "start a new field"? (It seems bash and ksh use the former
definition.)

I expected to see two fields resulting, largely because:

$ (x=,bar IFS=,; set -f; a=($x); printf "<%s> " "${a[@]}"; echo)
<> <bar>

An IFS delimiter at the start of the string is not "ignored" the way
an IFS delimiter at the end appears to be.
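
For anyone who wants to reproduce the comparison without arrays, here is
a minimal sketch that counts the fields produced by unquoted expansion.
count_fields is a hypothetical helper introduced only for illustration;
it uses the positional parameters instead of a bash array so that it
should also run in other POSIX-style shells.

```shell
# count_fields is a hypothetical helper, not part of any shell.
# It splits its argument on commas and prints how many fields result.
count_fields() (
    IFS=,
    set -f          # disable pathname expansion so only splitting applies
    set -- $1       # unquoted on purpose: this is where field splitting happens
    echo "$#"
)

trailing=$(count_fields 'bar,')    # delimiter at the end
leading=$(count_fields ',bar')     # delimiter at the start
double=$(count_fields 'bar,,')     # two delimiters at the end
echo "trailing=$trailing leading=$leading double=$double"
```

In bash this reports one field for "bar," but two fields for ",bar" and
for "bar,,", which is exactly the asymmetry described above.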

The question gets slightly more interesting when we look at read:

$ (IFS=, read -r a <<< "bar,"; echo "<$a>")
<bar>

Normally I would expect read with a single argument variable to put
the entire input line, minus leading/trailing IFS *whitespace*, into
that variable.

But apparently that's not what it does in bash or ksh, much to my
surprise. A *single* trailing IFS non-whitespace delimiter gets eaten.
But multiple trailing IFS non-whitespace delimiters do not:

$ (IFS=, read -r a <<< "bar,,"; echo "<$a>")
<bar,,>

I can understand the behavior here, actually, due to the "If there are
fewer vars than fields" clause of POSIX's definition of read. [2] It's
just the single-delimiter case that's got me mixed up.
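
The two read cases can be put side by side in a short sketch. It
assumes bash or ksh, the shells the observations above were made in, and
uses here-documents rather than here-strings only to keep the syntax
portable. The variable names one and two are arbitrary.

```shell
# Assumes bash or ksh; other shells may treat trailing delimiters differently.

# One trailing comma: the delimiter is consumed.
IFS=, read -r one <<EOF
bar,
EOF

# Two trailing commas: more fields than variables, so the single
# variable receives the remainder of the line, delimiters included.
IFS=, read -r two <<EOF
bar,,
EOF

printf '<%s>\n' "$one" "$two"
# Output reported above for bash and ksh:
# <bar>
# <bar,,>
```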


[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

[2] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/read.html
Chet Ramey
2011-01-15 02:15:19 UTC
Post by Greg Wooledge
> POSIX 2.6.5 Field Splitting [1] says, in part,
>
> 1. If IFS is <space><tab><newline> or unset, ...
> 2. If IFS is null, ...
> 3. Otherwise, ...
>    b. Each occurrence in the input of an IFS character that is not IFS
>       white space, along with any adjacent IFS white space, shall delimit
>       a field, as described previously.
>
> I'm attempting to understand what exactly "delimit a field" means.

The standard is consistent -- or tries to be -- in saying that delimit
a field means terminate a field. For instance, the current version of
the standard says, in 2.6.5:

"The shell shall treat each character of the IFS as a delimiter and use
the delimiters as field terminators to split the results of parameter
expansion and command substitution into fields"
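
One way to picture the "terminator" reading is the toy model below. It
is only a sketch, not how bash or ksh implement splitting:
split_terminated is a hypothetical helper, and it handles a single
non-whitespace delimiter (a comma) with no IFS white space involved.
Each comma ends the field before it, and text after the last comma
becomes a field only if there is any.

```shell
# split_terminated is a hypothetical helper modelling "delimiter = terminator".
# It prints each field wrapped in <>, all on one line.
split_terminated() (
    rest=$1
    out=
    while [ -n "$rest" ]; do
        case $rest in
            *,*)
                field=${rest%%,*}   # text before the first comma
                rest=${rest#*,}     # drop that field and its terminator
                ;;
            *)
                field=$rest         # final field, not terminated
                rest=
                ;;
        esac
        out="$out<$field>"
    done
    printf '%s\n' "$out"
)

split_terminated 'bar,'     # <bar>
split_terminated ',bar'     # <><bar>
split_terminated 'bar,,'    # <bar><>
```

Under this model a trailing comma merely closes the last field, while a
leading comma closes an empty one, which matches the results shown in
the original question.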

A discussion nearly identical to this one took place in February/March
2005, and should be available in the austin-group mailing list archives.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ***@case.edu http://cnswww.cns.cwru.edu/~chet/