Discussion:
Encoding multiple filenames in a single variable
Jon Seymour
2010-08-29 07:12:16 UTC
This isn't strictly a bash question, and I'd prefer a POSIX-only
solution if possible [ suggestions as to a good question to ask
POSIX-only questions would be appreciated ].

Suppose I need to encode a list of filenames in a variable and each
filename may contain spaces, what is a good way to encode such a list so
that the resulting variable is readily composable and decodable? In
particular, I'd like to avoid the use of (unescaped) separators which
might themselves be used in the filename.

jon.
Jon Seymour
2010-08-29 07:13:33 UTC
| question -> _forum_
Post by Jon Seymour
This isn't strictly a bash question, and I'd prefer a POSIX-only
solution if possible [ suggestions as to a good _forum_ to ask
POSIX-only questions would be appreciated ].
Suppose I need to encode a list of filenames in a variable and each
filename may contain spaces, what is a good way to encode such a list so
that the resulting variable is readily composable and decodable? In
particular, I'd like to avoid the use of (unescaped) separators which
might themselves be used in the filename.
jon.
Chris F.A. Johnson
2010-08-29 08:07:23 UTC
Post by Jon Seymour
This isn't strictly a bash question, and I'd prefer a POSIX-only
solution if possible [ suggestions as to a good question to ask
POSIX-only questions would be appreciated ].
The comp.unix.shell newsgroup is a good place.
Post by Jon Seymour
Suppose I need to encode a list of filenames in a variable and each
filename may contain spaces, what is a good way to encode such a list so
that the resulting variable is readily composable and decodable? In
particular, I'd like to avoid the use of (unescaped) separators which
might themselves be used in the filename.
Either separate them with newlines, or (non-POSIX) use an array.

## POSIX
NL='
'
files=${files:+$files$NL}$nextfile

## Array
files+=( "$nextfile" )
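
For the POSIX form, reading the list back out might look roughly like the sketch below. It assumes no filename contains a newline; `split_files`, `count`, `first` and `third` are names made up for illustration.

```shell
# Sketch only: build and decode a newline-separated list.
# Assumes no filename contains a newline.
NL='
'
files=
for nextfile in "a b.txt" "c.txt" "d  e.txt"; do
    files=${files:+$files$NL}$nextfile
done

# split_files is a hypothetical helper: it splits $files on newlines only,
# with globbing switched off so a name such as "*.txt" stays literal.
split_files() {
    old_ifs=$IFS
    IFS=$NL
    set -f
    set -- $files          # deliberately unquoted: field splitting does the work
    set +f
    IFS=$old_ifs
    count=$#
    first=$1
    third=$3
}

split_files
printf '%s\n' "$count" "$first" "$third"
```

Spaces inside a name survive because only the newline is in IFS while the split happens.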
--
Chris F.A. Johnson, <http://cfajohnson.com>
Author:
Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
Greg Wooledge
2010-08-30 13:07:53 UTC
Post by Chris F.A. Johnson
Post by Jon Seymour
This isn't strictly a bash question, and I'd prefer a POSIX-only
solution if possible
Suppose I need to encode a list of filenames in a variable
POSIX shells won't have arrays (they're allowed to, but they're also
allowed NOT to, which means you can't count on them being present), but
you can enlist the positional parameters for use as an array under
certain conditions.

set ./*myglob*
process "$@"

Of course you don't get bash's full set of capabilities. You can't
use range notation to get a "slice" (as some people call it) of an array
(for the rest of us, that means "from element M to element N"), and
you can't set or unset individual elements. Nor can you populate it
using a loop reading from "find -print0" or similar as you can with a
bash array.

But if you just need to populate them from a glob, this should suffice.
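
The positional parameters can also be grown one item at a time, which covers lists that do not come from a single glob. A minimal sketch, with made-up filenames and the hypothetical variables `n` and `last`:

```shell
# Sketch only: treat the positional parameters as the script's one list.
set --                      # start with an empty list
for name in "lib A.sh" "libB.sh" "it's.sh"; do
    set -- "$@" "$name"     # append one element; spaces and quotes are preserved
done

n=$#
last=
for f in "$@"; do           # "$@" expands to one word per element
    last=$f
done
printf '%s items, last: %s\n' "$n" "$last"
```

Since there is only one such list per shell (or per function call), this works best when the script has no other use for "$@".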
Post by Chris F.A. Johnson
Either separate them with newlines, or (non-POSIX) use an array.
Filenames can also contain newlines, unfortunately.

A third choice would be to store the list of filenames in a file, rather
than in a variable. The advantage of this is that you can store NUL
bytes in a file (unlike in a variable). The disadvantage is a need for
platform-specific utilities to create safe temporary files, or for some
sort of application-level strategy to earmark a safe place to create them
using primitive-but-portable means. And of course you'd need some way
to read them from the file.
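
As a rough sketch of the file-based approach, the snippet below writes one name per line and reads the lines back. It assumes `mktemp` is available (POSIX does not specify it) and that no filename contains a newline; NUL-delimited input is left out because a plain POSIX `read` cannot consume it. The variable names are illustrative.

```shell
# Sketch only: keep the list in a file, one name per line.
# mktemp is assumed to exist; it is not specified by POSIX.
listfile=$(mktemp) || exit 1

printf '%s\n' "lib A.sh" "libB.sh" "  padded.sh" > "$listfile"

count=0
third=
while IFS= read -r name; do      # IFS= keeps leading blanks, -r keeps backslashes
    count=$((count + 1))
    if [ "$count" -eq 3 ]; then
        third=$name
    fi
done < "$listfile"

rm -f "$listfile"
printf '%s\n' "$count" "$third"
```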

It all boils down to exactly what you need to do with the filenames once
you have them.
Jon Seymour
2010-08-30 13:25:00 UTC
Chris, Andre and Greg,

Thanks for your helpful replies.

You are quite correct in pointing out that the solution depends on
how it is to be used.

To provide more context:

I am working on an extension to git, and need to store a list of shell
files that can be used to extend the capabilities of the command I am
writing. Most of the time, a variable of the form:

GIT_EXTRA_CONDITION_LIBS="libA.sh libB.sh" would work, but technically
speaking, I do need to support spaces in the path (if nothing else,
git's test suite cunningly runs within a directory that contains a
space in the name :-).

So, I would like the convenience of using spaces to delimit entries in
the variable since I don't want people to have to define NL variables
in order to extend the variable. On the other hand, if they do want to
use an inconvenient filename with spaces, there has to be a way to do
it.

In the end, what I have done is make use of git's rev-parse --sq-quote
feature to quote filenames that can contain spaces. That way, if you
really want spaces in the filenames, you can have them, but if you
don't, then you get the convenience of space as a separator.

So, for example:

GIT_EXTRA_CONDITION_LIBS="libA.sh 'lib B.sh' libC.sh"

I am lucky in that I can assume the existence of git rev-parse on my
path and I am prepared to write the decoding glue in my script.
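
The decoding glue could be as small as the sketch below. To keep it self-contained, `sq_quote` is a hypothetical pure-shell stand-in for the quoting step, not git's implementation, and the decode relies on `eval`, so it is only reasonable when the variable's contents are trusted.

```shell
# Sketch only. sq_quote is a hypothetical stand-in for the quoting step:
# it wraps its argument in single quotes and rewrites each ' as '\''.
sq_quote() {
    s=$1
    out=
    while :; do
        case $s in
            *\'*) out=$out${s%%\'*}"'\\''"; s=${s#*\'} ;;
            *)    out=$out$s; break ;;
        esac
    done
    printf "'%s'" "$out"
}

GIT_EXTRA_CONDITION_LIBS="libA.sh $(sq_quote "lib B.sh") $(sq_quote "it's.sh")"

# WARNING: eval runs whatever the variable contains; use on trusted input only.
eval "set -- $GIT_EXTRA_CONDITION_LIBS"
count=$#
second=$2
third=$3
printf '%s\n' "$count" "$second" "$third"
```

Unquoted entries still split on spaces, so the common case keeps the convenient form while awkward names go through the quoting step.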

Anyway, thank you all for your input.

jon.
Greg Wooledge
2010-08-30 13:33:55 UTC
Post by Jon Seymour
I am working on an extension to git, and need to store a list of shell
files that can be used to extend the capabilities of the command I am
writing. Most of the time, a variable of the form:
GIT_EXTRA_CONDITION_LIBS="libA.sh libB.sh" would work, but technically
speaking, I do need to support spaces in the path (if nothing else,
git's test suite cunningly runs within a directory that contains a
space in the name :-).
If this is an environment variable, then you're screwed. Environment
variables can't be arrays, and if they could, they surely wouldn't be
portable.
Post by Jon Seymour
In the end, what I have done is make use of git's rev-parse --sq-quote
feature to quote filenames that can contain spaces. That way, if you
really want spaces in the filenames, you can have it, but if you
don't, then you get the convenience of space as a separator.
GIT_EXTRA_CONDITION_LIBS="libA.sh 'lib B.sh' libC.sh"
What does it do with filenames that contain apostrophes? How do you
read the filenames back out of this format? The only ways I can
think of off the top of my head to parse that kind of input in a
script are eval and xargs, and those should send you screaming....

There really is no good way to put multiple filenames into a single
environment variable. Your best bet is to put them in a file and
make the environment variable point to that file. The contents of
the file would have to be NUL-delimited or newline-delimited. I'm
pretty sure that you'll end up going with newline delimiters and just
saying "if your filenames have newlines in them, you lose, so don't
do that". Which is not the worst approach in the world.
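
A sketch of that indirection, under stated assumptions: the variable name `GIT_EXTRA_CONDITION_LIBS_FILE` is hypothetical (nothing in this thread defines it), `mktemp -d` is assumed to be available, and names containing newlines are simply not supported.

```shell
# Sketch only: the environment carries a path, and the file carries the list.
# GIT_EXTRA_CONDITION_LIBS_FILE is a hypothetical name; mktemp -d is assumed.
dir=$(mktemp -d) || exit 1
printf '%s\n' "$dir/lib A.sh" "$dir/libB.sh" > "$dir/libs.list"

GIT_EXTRA_CONDITION_LIBS_FILE=$dir/libs.list
export GIT_EXTRA_CONDITION_LIBS_FILE

# A child process inherits only the path and reads the names itself,
# one per line, into its own positional parameters.
result=$(sh -c '
    set --
    while IFS= read -r lib; do
        set -- "$@" "$lib"
    done < "$GIT_EXTRA_CONDITION_LIBS_FILE"
    printf "%s|" "$#" "${1##*/}" "${2##*/}"
')

rm -rf "$dir"
printf '%s\n' "$result"
```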
Jon Seymour
2010-08-30 22:14:58 UTC
All good points. I think I'll change tack slightly. git has a
configuration mechanism that can serve to store lists that users
might want to configure and I can add an --include option where
required to allow wrapper commands to add their own libraries, as
required, thus eliminating the requirement to inherit such lists from
the environment.

Thanks again.

jon.

Andre Majorel
2010-08-29 09:52:40 UTC
Post by Jon Seymour
This isn't strictly a bash question, and I'd prefer a POSIX-only
solution if possible [ suggestions as to a good question to ask
POSIX-only questions would be appreciated ].
comp.unix.shell
Post by Jon Seymour
Suppose I need to encode a list of filenames in a variable and each
filename may contain spaces, what is a good way to encode such a list so
that the resulting variable is readily composable and decodable? In
particular, I'd like to avoid the use of (unescaped) separators which
might themselves be used in the filename.
Depends. Where do those file names come from and how are they
used? Command line arguments? File descriptor? Separated by
NULs or newlines? One by one or all at once?
--
André Majorel http://www.teaser.fr/~amajorel/