Discussion:
Unexpected delay in using arguments.
Bize Ma
2018-08-12 07:16:34 UTC
Try this script:

#!/bin/bash
TIMEFORMAT='%R'

n=1000
m=20000

f1 () { :; }

f2 () { i=0; time while [ "$((i+=1))" -lt "$n" ]; do : ; done
i=0; time while [ "$((i+=1))" -lt "$n" ]; do f1 ; done
}

test1() { set -- $(seq $m)
f2 ""
f2 "$@"
}
test1

To get (the timings are the ':' loop then the 'f1' loop, first with a
single empty argument, then with the 20000 arguments):

0.019
0.028
0.019
19.204

Which is a thousand times slower.
Bash 5 is even worse (b50sh below is a locally built bash-5.0-alpha); try:

time b50sh -c 'f(){ :;};for i do f; done' {0..500}

real 0m20.709s
user 0m19.856s
sys 0m0.024s
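
For contrast, the same construct can be timed under another shell, for
example dash (a sketch, assuming dash is installed; the brace expansion
is still performed by the invoking bash):

time dash -c 'f(){ :;};for i do f; done' {0..500}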

Read more detail here:
https://unix.stackexchange.com/questions/462084/bash-has-troubles-using-argument-lists
Greg Wooledge
2018-08-13 13:37:06 UTC
Post by Bize Ma
Which is a thousand times slower.
Bash 5 is even worse
Pre-release bash sources use a debugging-friendly (slow) malloc. Or
something. Damn, google is not helping me out here.
Bize Ma
2018-08-13 14:29:09 UTC
Yes, Greg. Please read the expanded source question; bash 5 has been
tested both with and without bash-malloc.

https://unix.stackexchange.com/q/462084/265604

The problem with arguments is still present.
Chet Ramey
2018-08-14 15:25:04 UTC
    #!/bin/bash
    TIMEFORMAT='%R'
    n=1000
    m=20000
    f1   () { :; }
    f2   () { i=0;  time while [ "$((i+=1))" -lt "$n" ]; do     :    ; done
          i=0;  time while [ "$((i+=1))" -lt "$n" ]; do     f1    ; done
        }
    test1() { set -- $(seq $m)
          f2  ""
          f2  "$@"
        }
    test1
      0.019
      0.028
      0.019
    19.204
Which is a thousand times slower.
If you build a profiling version of bash, you'll find that about 75% of
that time is spent copying the list of arguments around, since you have
to save and restore it each time you call f1. Looking at making that more
efficient has been a low-level task for a while now.
Bash-5.0-alpha is not a released version, so it uses the debugging malloc.
If you profile that, you'll find that about 99% of the time is spent
marking allocations as active and free in the table the bash malloc uses
to keep track of active memory.
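
For reference, here is a minimal sketch of one way to build such a
profiling bash (the configure and compiler options here are assumptions,
not something specified in this thread):

# Build an instrumented bash that uses the system malloc, run the
# reproducer from the first message (called reproducer.sh here, a
# hypothetical name), then look at the flat profile.
./configure --without-bash-malloc CFLAGS='-g -pg -O2'
make
./bash ./reproducer.sh
gprof ./bash gmon.out | head -20
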
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Bize Ma
2018-08-15 03:00:22 UTC
Post by Chet Ramey
Post by Bize Ma
(...)
Which is a thousand times slower.
If you build a profiling version of bash, you'll find that about 75% of
that time is spent copying the list of arguments around
I just don't see why there is any need to copy arguments.
Most other shells do not seem to do that.
Post by Chet Ramey
, since you have
to save and restore it each time you call f1.
There is no need to "copy"; that just wastes additional memory.
Just point to a new list of arguments (it is only a change of a memory
pointer), and release the memory when the argument list is no longer
being used.
Post by Chet Ramey
Bash-5.0-alpha is not a released version, so it uses the debugging malloc.
If you profile that, you'll find that about 99% of the time is spent
marking allocations as active and free in the table the bash malloc uses
to keep track of active memory.
Yes, tests with "--without-bash-malloc" and "RELSTATUS=release" were done
after I posted the OP.

Yes, version 5.0 does not correct this, but it looks like it is not
actually "worse". Sorry for the misrepresentation.
Bob Proulx
2018-08-14 21:50:36 UTC
Post by Bize Ma
m=20000
...
Post by Bize Ma
test1() { ...
set -- $(seq $m)
At that line I see that bash must allocate enough memory to hold all
of the numbers from 1 through 20000 in memory all at one time. That
is very inefficient. That is at least 100K of memory.

$ seq 20000 | wc -c
108894

Of course I assume this is only a simplified reproducer standing in for
the actual problem program, but just the same it is almost always possible
to refactor a program into a different algorithm that avoids the need
to enumerate so many arguments in memory. I suggest refactoring the
program to avoid this memory stress.
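
For instance, a minimal sketch of that kind of refactoring (hypothetical,
since the real task is not known here) streams the items to the per-item
work instead of loading all 20000 of them into the positional parameters:

# f1 stands in for whatever per-item work the real script does; nothing
# ever holds the full 20000-word list in "$@".
seq "$m" | while IFS= read -r item; do
    f1 "$item"
done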

Bob
Bize Ma
2018-08-15 03:17:55 UTC
Post by Bob Proulx
Of course I assume this is only a simplified reproducer
standing in for the actual problem program
Of course this is a "reproducer" of the issue.
Post by Bob Proulx
but just the same it is almost always possible
to refactor a program into a different algorithm that avoids the need
to enumerate so many arguments in memory.
As you say: "almost".

Take a look at the Stéphane Chazelas example to convince yourself.
Bob Proulx
2018-08-15 07:36:29 UTC
Post by Bize Ma
Post by Bob Proulx
but just the same it is almost always possible
to refactor a program into a different algorithm that avoids the need
to enumerate so many arguments in memory.
As you say: "almost".
I still believe that to be true. Since you haven't shared what the
actual task is there is no way for us to propose any counter example
improvements. So "almost" is as far as I can go. Instead should I
say 99.44% of the time? Since I have never come across a counter
example yet?
Post by Bize Ma
Take a look at the Stéphane Chazelas example to convince yourself.
Nothing Stéphane said changed my statement at all.

It does look like bash can be more efficient with argument handling.
Since, for example, dash does it.

Bob
Chet Ramey
2018-08-15 14:04:15 UTC
Post by Bob Proulx
It does look like bash can be more efficient with argument handling.
Since, for example, dash does it.
Yes, it just needs new primitives to do it. The existing code for managing
the saved positional parameters has been in bash since the pre-1.0 days and
is pretty much unchanged since then. I'll take a look.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Bize Ma
2018-08-15 16:11:01 UTC
Post by Bob Proulx
I still believe that to be true.
You are entitled to have an opinion, even if incorrect.
Post by Bob Proulx
Since you haven't shared what the
actual task is there is no way for us to propose any counter example
improvements. So "almost" is as far as I can go. Instead should I
say 99.44% of the time? Since I have never come across a counter
example yet?
Give it time.
Post by Bob Proulx
Post by Bize Ma
Take a look at the Stéphane Chazelas example to convince yourself.
Nothing Stéphane said changed my statement at all.
How do you process a «list of files» without the «list of files»?
Post by Bob Proulx
It does look like bash can be more efficient with argument handling.
Since, for example, dash does it.
That is true.

Stephane Chazelas
2018-08-14 22:21:58 UTC
2018-08-14 11:25:04 -0400, Chet Ramey:
[...]
Post by Chet Ramey
If you build a profiling version of bash, you'll find that about 75% of
that time is spent copying the list of arguments around, since you have
to save and restore it each time you call f1. Looking at making that more
efficient has been a low-level task for a while now.
[...]

To save and restore that list of arguments, you only need to
save and restore one pointer. I don't see why you'd need to copy
the full list of pointers let alone the text of the arguments.

Note that this makes scripts that use functions and receive a
large number of arguments (think of scripts invoked as find .
-exec myscript {} + for instance) terribly inefficient.

find / -xdev -exec bash -c 'f(){ :;}; for i do f; done' bash {} +

(do nothing in a function for all the files in my root file
system) takes 4 seconds in dash and 9 minutes (135 times as
much) in bash.
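
A smaller, self-contained way to see the same effect (a sketch, not from
the thread; the absolute numbers will vary by machine and bash version)
is to time the same no-op function loop while growing the number of
positional parameters:

for nargs in 10 100 1000 10000; do
    echo "== $nargs positional parameters =="
    # The loop runs $nargs times and each call to f has to save and
    # restore the caller's $nargs parameters, so on an affected bash the
    # total cost grows roughly quadratically; the last case can take a
    # minute or more.
    time bash -c 'f(){ :;}; for i do f; done' bash $(seq "$nargs")
done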
--
Stephane