Discussion:
bash sockets: printf \x0a does TCP fragmentation
dirk+
2018-09-21 20:13:56 UTC
Permalink
Hello there,

we discovered a strange phenomenon in the project testssl.sh:

After opening a TCP socket with a fd (here: 5), when writing to it,
it seems that

printf -- "$data" >&5 2>/dev/null

does not do what is intended. "$data" is a ClientHello like

'\x16\x03\x01\x2\x00\x01\x00\x1\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0a\...'

Each \x0a like the last one causes a new TCP fragment to begin which can be easily
spotted when using wireshark while running e.g.

testssl.sh --assume-http -p testssl.sh

Starting from the SSLv3 ClientHello the first reassembled packet
ends with 0a.

See also discussion @ https://github.com/drwetter/testssl.sh/pull/1113.

One would assume that a bash socket connection cannot influence the TCP
fragmentation but obviously it does.
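
For reference, a stripped-down sketch of the setup described above (the hostname, port
and the shortened $data value are placeholders, not taken from testssl.sh):

  # open a TCP connection on fd 5 via bash's built-in /dev/tcp support
  exec 5<>/dev/tcp/example.net/443
  # the ClientHello as \xHH escapes (heavily shortened here)
  data='\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03...'
  # each \x0a in the expanded data is where wireshark shows a new TCP segment starting
  printf -- "$data" >&5 2>/dev/null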

This behavior has a performance penalty and other strange effects, e.g.
if the first segment is really small, some devices reject the ClientHello.


If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.

Cheers, Dirk


PS: Would ulimit -b <parameter> help?
Chet Ramey
2018-09-21 23:34:52 UTC
Permalink
Post by dirk+
Hello there,
After opening a TCP socket with a fd (here: 5), when writing to it,
it seems that
printf -- "$data" >&5 2>/dev/null
does not do what it is intended. "$data" is a ClientHello like
'\x16\x03\x01\x2\x00\x01\x00\x1\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0a\...'
Each \x0a like the last one causes a new TCP fragment to begin which can be easily
spotted when using wireshark while running e.g.
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Post by dirk+
If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.
How? Does the emitted output not correspond to what's passed to printf
in some way?
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
dirk+
2018-09-22 08:21:53 UTC
Permalink
Post by Chet Ramey
Post by dirk+
Hello there,
After opening a TCP socket with a fd (here: 5), when writing to it,
it seems that
printf -- "$data" >&5 2>/dev/null
does not do what it is intended. "$data" is a ClientHello like
'\x16\x03\x01\x2\x00\x01\x00\x1\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0a\...'
Each \x0a like the last one causes a new TCP fragment to begin which can be easily
spotted when using wireshark while running e.g.
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Anything one can do on the level of bash or non-syscall land? What about
ulimit -b ?
Post by Chet Ramey
Post by dirk+
If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.
How? Does the emitted output not correspond to what's passed to printf
in some way?
"\x0a" is a legitimate byte which is send from time to time over the
socket. It happens if the record layer is e.g. 522 bytes (\x02\x0a), if
a standard cipher is included in the handshake like (\xc0\x0a) or DES-CBC3-SHA
(\x00\x0a) ECDHE-ECDSA-AES256-SHA or at any other occasion.

Everything has worked as expected and like a charm for years now -- the only thing is the
underlying TCP fragmentation which is caused as a side effect of sending
\x0a.

As indicated, a few servers under certain conditions can't cope with it: if the first TCP
segment is too small they don't treat it as a ClientHello
and just drop the packet, see thread @
https://github.com/drwetter/testssl.sh/pull/1113, specifically the hint wrt
https://support.f5.com/csp/article/K53322151 .

My stance is simply: if I use the internal bash feature for TCP socket
programming, I don't expect side effects like this.


Thx, Dirk
Chet Ramey
2018-09-22 20:38:23 UTC
Permalink
Post by dirk+
Post by Chet Ramey
Post by dirk+
Hello there,
After opening a TCP socket with a fd (here: 5), when writing to it,
it seems that
printf -- "$data" >&5 2>/dev/null
does not do what it is intended. "$data" is a ClientHello like
'\x16\x03\x01\x2\x00\x01\x00\x1\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0a\...'
Each \x0a like the last one causes a new TCP fragment to begin which can be easily
spotted when using wireshark while running e.g.
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Anything one can do on the level of bash or non-syscall land? What about
ulimit -b ?
Bash sets stdout and stderr to be line buffered. It's done this forever,
long before I added /dev/tcp and socket support. I'd expect this to have
shown up long ago.

I doubt stdio attempts to divine the buffer size of the underlying file
descriptor. Changing the socket buffer size would probably not be reflected
into stdio's buffering behavior.
Post by dirk+
Post by Chet Ramey
Post by dirk+
If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.
How? Does the emitted output not correspond to what's passed to printf
in some way?
"\x0a" is a legitimate byte which is send from time to time over the
socket. It happens if the record layer is e.g. 522 bytes (\x02\x0a), if
a standard cipher is included in the handshake like (\xc0\x0a) or DES-CBC3-SHA
(\x00\x0a) ECDHE-ECDSA-AES256-SHA or at any other occasion.
That's the thing: 0x0a is a legitimate input byte and is faithfully
reproduced in the output. It's a bug if the output doesn't reflect the
input. We're talking about side effects you don't want, which happen at
a different layer.
Post by dirk+
Everything has worked as expected and like a charm for years now -- the only thing is the
underlying TCP fragmentation which is caused as a side effect of sending
\x0a.
So the newline has been an issue all along?
Post by dirk+
As indicated, a few servers under certain conditions can't cope with it: if the first TCP
segment is too small they don't treat it as a ClientHello
https://github.com/drwetter/testssl.sh/pull/1113, specifically the hint wrt
https://support.f5.com/csp/article/K53322151 .
My stance is simply: if I use the internal bash feature for TCP socket
programming, I don't expect side effects like this.
These side effects are well below bash's layer of abstraction. You can
unbuffer the output by using some kind of unbuffering program (as others
have noted, `dd' is probably the most portable) and avoid them.
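
A hedged sketch of that approach, applied to the write from the original report (obs=1M is
an illustrative value; status=none is a GNU dd extension, as noted elsewhere in this thread):

  # pipe through dd so the socket sees one large write instead of one write per newline
  printf -- "$data" 2>/dev/null | dd obs=1M status=none >&5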

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Ilkka Virta
2018-09-22 10:38:06 UTC
Permalink
Post by Chet Ramey
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Mostly out of curiosity, what kind of buffering logic does Bash (or the
builtin printf in particular) use? It doesn't seem to be the usual stdio
logic where you get line-buffering if printing to a terminal and block
buffering otherwise. I get a distinct write per line even if the stdout
of Bash itself is redirected to say /dev/null or a pipe:

$ strace -etrace=write bash -c 'printf "foo\nbar\n"' > /dev/null
write(1, "foo\n", 4) = 4
write(1, "bar\n", 4) = 4
+++ exited with 0 +++
--
Ilkka Virta / ***@iki.fi
dirk+
2018-09-22 10:49:57 UTC
Permalink
Post by Chet Ramey
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Mostly out of curiosity, what kind of buffering logic does Bash (or the builtin
printf in particular) use? It doesn't seem to be the usual stdio logic where you get
line-buffering if printing to a terminal and block buffering otherwise. I get a
distinct write per line even if the stdout of Bash itself is redirected to say /dev/null or a pipe:
 $ strace -etrace=write bash -c 'printf "foo\nbar\n"' > /dev/null
 write(1, "foo\n", 4)                    = 4
 write(1, "bar\n", 4)                    = 4
 +++ exited with 0 +++
Oh. But thanks anyway!

coreutils in fact does it in one shot as you indicated.
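
For comparison, the external printf with stdout redirected to a non-tty typically issues a
single write. Illustrative strace output, assuming GNU coreutils and strace are installed:

  $ strace -etrace=write env printf "foo\nbar\n" > /dev/null
  write(1, "foo\nbar\n", 8)               = 8
  +++ exited with 0 +++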


Dirk
Chet Ramey
2018-09-23 18:23:08 UTC
Permalink
Post by dirk+
Post by Chet Ramey
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Mostly out of curiosity, what kind of buffering logic does Bash (or the builtin
printf in particular) use? It doesn't seem to be the usual stdio logic where you get
line-buffering if printing to a terminal and block buffering otherwise. I get a
distinct write per line even if the stdout of Bash itself is redirected to say /dev/null or a pipe:
 $ strace -etrace=write bash -c 'printf "foo\nbar\n"' > /dev/null
 write(1, "foo\n", 4)                    = 4
 write(1, "bar\n", 4)                    = 4
 +++ exited with 0 +++
Oh. But thanks anyway!
coreutils in fact does it in one shot as you indicated.
Then the change you need suggests itself:

env printf ...

or

(exec printf ...)

since the bash exec builtin doesn't execute builtin commands.
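
Applied to the socket write from the original report, that might look like the sketch below
(it assumes an external printf that understands \xHH escapes, which, as comes up later in
this thread, is not the case on every platform):

  env printf -- "$data" >&5 2>/dev/null
  # or, in a subshell, so the rest of the script keeps its own bash process:
  ( exec printf -- "$data" >&5 2>/dev/null )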

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Chet Ramey
2018-09-22 20:39:22 UTC
Permalink
Post by Ilkka Virta
Post by Chet Ramey
Newline? It's probably that stdout is line-buffered and the newline causes
a flush, which results in a write(2).
Mostly out of curiosity, what kind of buffering logic does Bash (or the
builtin printf in particular) use?
Bash sets stdout and stderr to line buffering. It's done this since the
early 1.x days.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Bob Proulx
2018-09-22 05:30:25 UTC
Permalink
You are doing something that is quite unusual. You are using a shell
script directly on a TCP socket. That isn't very common. More
typically one would use a C program instead. So it isn't surprising
that you are finding interactions that are not well known.
Post by dirk+
printf -- "$data" >&5 2>/dev/null
Why is stderr discarded? That is almost always bad because it
discards any errors that might occur. You probably shouldn't do this.

What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
Post by dirk+
does not do what it is intended.
"Intent" is in the eye of the beholder.
Post by dirk+
"$data" is a ClientHello like
'\x16\x03\x01\x2\x00\x01\x00\x1\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0a\...'
Each \x0a like the last one causes a new TCP fragment to begin which can be easily
spotted when using wireshark while running e.g.
As Chet said the libc stdio library is probably doing line oriented
buffering. The newline is causing a flush at that time.
Post by dirk+
One would assume that a bash socket connection cannot influence the TCP
fragmentation but obviously it does.
One would be in error to assume this.
Post by dirk+
If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.
You can re-block the output stream using other tools such as 'cat' or
'dd'. Since you are concerned about block size then perhaps dd is the
better of the two.

| cat

Or probably better:

| dd status=none bs=1M

Or use whatever block size you wish. The 'dd' program will read the
input into its buffer and then output that block of data all in one
write(2). That seems to be what you are wanting.

Good luck! :-)

Bob

P.S. You can possibly use the 'stdbuf' command to control the output
buffering depending upon the program.

info stdbuf
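
A hedged sketch of what that could look like here; stdbuf only affects programs that use
libc stdio buffering, so it cannot change bash's builtin printf, and /usr/bin/printf is an
assumption:

  # run the external printf with a large, fully buffered stdout
  stdbuf -o 1M /usr/bin/printf -- "$data" >&5 2>/dev/null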
dirk+
2018-09-22 09:50:17 UTC
Permalink
Post by Bob Proulx
You are doing something that is quite unusual. You are using a shell
script directly on a TCP socket. That isn't very common.
Do you think there should be a paragraph NOT COMMON in the documentation,
where bash sockets would then belong?
Post by Bob Proulx
More
typically one would use a C program instead. So it isn't surprising
that you are finding interactions that are not well known.
Bob, my intention was not to discuss programming languages and what is typical
with you or anybody else here.
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
Why is stderr discarded? That is almost always bad because it
discards any errors that might occur. You probably shouldn't do this.
What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
I doubt you can judge this by just looking at a single line
of code -- the project has > 18k LoC in bash.

Github is the place to discuss and do PRs for our project.
Post by Bob Proulx
Post by dirk+
If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.
You can re-block the output stream using other tools such as 'cat' or
'dd'. Since you are concerned about block size then perhaps dd is the
better of the two.
| cat
cat has a problem with binary chars, right? And: see below.
Post by Bob Proulx
| dd status=none bs=1M
Or use whatever block size you wish. The 'dd' program will read the
input into its buffer and then output that block of data all in one
write(2). That seems to be what you are wanting.
We actually use dd to read from the socket. Of course we could use it for
writing to the socket as well -- at a certain point in time.

Still, a prerequisite would be that printf is the culprit and not
how bash + libs do sockets.
Post by Bob Proulx
P.S. You can possibly use the 'stdbuf' command to control the output
buffering depending upon the program.
info stdbuf
That could be an option, thanks. Need to check though whether

a) it doesn't fragment then -- not sure after reading the docs
b) it's available by default on every platform supported by testssl.sh.


Cheers, Dirk
Ilkka Virta
2018-09-22 10:30:02 UTC
Permalink
Post by dirk+
cat has a problem with binary chars, right? And: see below.
No, it just loops with read() and write(), it shouldn't touch any of the
bytes (except for cat -A and such). But it probably doesn't help in
coalescing the write blocks, it's likely to just write() whatever it
gets immediately.

And you can't really solve the issue at hand by piping to any
intermediate program, as that program couldn't know how long to buffer
the input. Unless you use something that buffers for a particular amount
of time, which of course causes an unnecessary delay.

The coreutils printf seems to output 'foo\nbar\n' as a single write,
though (unless it goes to the terminal, so the usual stdio buffering),
so you might be able to use that.


In any case, if a TCP endpoint cares about getting full data packets
within a single segment, I'd say it's broken.
--
Ilkka Virta / ***@iki.fi
dirk+
2018-09-22 10:57:20 UTC
Permalink
The coreutils printf seems to output 'foo\nbar\n' as a single write, though (unless
it goes to the terminal, so the usual stdio buffering), so you might be able to use
that.
Thx. Might not be that portable but we'll see.
In any case, if a TCP endpoint cares about getting full data packets within a single
segment, I'd say it's broken.
Fully agree. But unfortunately that's only a small comfort for us :-)

Keep in mind that the purpose of the tool is testing, and if due to a bug it can't do
that, we're the ones being blamed, or we need to do really strange workarounds to avoid
'\x0a' in the first 8 bytes.


Dirk
Bob Proulx
2018-09-22 20:22:19 UTC
Permalink
I see that you have subscribed now. Awesome! If you and others would
be so kind as to list-reply instead of CC'ing me directly that would
be great. I read the replies on the mailing list.
Post by dirk+
Post by Bob Proulx
You are doing something that is quite unusual. You are using a shell
script directly on a TCP socket. That isn't very common.
Do you think there should be a paragraph NOT COMMON in the documentation,
where bash sockets would then belong?
You actually had not included enough background information to know if
you were using the bash built in network implementation or not. You
only showed that you had set up fd 5 connected to a network socket.
That can happen because, for example, a script was used to service an
inetd configuration or similar. It doesn't actually need to be the
built in network protocol at all. But now that you have said the
above I guess I can assume that you are using the built in
implementation.

As to whether the documentation should say this or not that is not
really practical. There are a godzillian different things that are
not typically addressed by writing a shell script. As a practical
matter it is impossible to list everything out explicitly. And if one
tries then the complaint is that the documentation is so long and
detailed that it is unusable.

Primarily a shell script is a command and control program. It is very
good for that purpose. It is typically used for that purpose. That
is the mainstream use and it is very unlikely one will run into
unusual situations there.

But programming tasks that are much different from command and control
tasks, such as your program interacting by TCP with other devices on
the network, are not as common. I don't have facts to back that up
but I do believe that to be true based upon the way I have seen shell
scripts being programmed and used over a long period of time. Of
course if you have spent the last 20 years programming network shell
scripts then your observations will bias you the other way. :-)
Post by dirk+
Post by Bob Proulx
More
typically one would use a C program instead. So it isn't surprising
that you are finding interactions that are not well known.
Bob, my intention was not to discuss programming languages and what is typical
with you or anybody else here.
Hmm... Put yourself in our shoes. You stood up on the podium that is
this public mailing list and spoke into the megaphone addressing all
of us complaining that bash's printf was buggy. But from my
perspective printf is behaving as expected. It is designed to deal
with line oriented data. It will also deal with binary data if one is
careful. But it appears that your application wasn't careful enough
and had tripped over some problems.

Should we (me!) keep silent about those very obvious problems? It
feels obvious to me but apparently not to the author of the above. As
has often been said many eyes make all bugs apparent. I was pointing
this out to you as a public service. But in response you seem hostile
by the language above and below. That isn't encouraging any help. :-(
Post by dirk+
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
Why is stderr discarded? That is almost always bad because it
discards any errors that might occur. You probably shouldn't do this.
What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
I doubt you can judge on this by just looking at a single line
of code -- the project has > 18k LoC in bash.
That single line of code was problematic just by itself standing alone
without the rest of the program around it. That is independent of
anything the rest of the program might contain.

However if you would like to pass sections of the rest of the program
through the help-bash mailing list then I am sure the group there
would help improve the quality of it.
Post by dirk+
Github is the place to discuss and do PRs for our project.
No. Sorry. You came here to this mailing list. Therefore this is
the place to discuss it. Please put yourself in my shoes. If the
case were reversed and I came over to Github and then stated that
Github was not the place for the discussion but that you needed to set
up email and come over to my mailing list and discuss it there
instead. How would you feel? I had come into your house, asked you
for help, then wanted you to go elsewhere? How would you feel? I can
tell you that I do not feel very welcome by it.

Also remember that Github is a non-free service. That is free as in
freedom, not free as in beer. The free in Free Software. Or in this
case the opposite of it being non-free. We try not to use software
that does not respect our freedoms nor ask others to do so either.
It's a philosophy of life thing. I hope you will understand.
Post by dirk+
Post by Bob Proulx
Post by dirk+
If there's a workaround, please let me know. (tried to add "%b" with no
effect). Otherwise I believe it's a bug.
Note that I *did* provide you with a way to do what you wanted to do. :-)

It was also noted in another message that the external standalone
printf command line utility did buffer as you desired. That seems
another very good solution too. Simply use "command printf ..." to
force using the external version.

Anyway... Since printf is a text oriented utility it makes sense to
me that it would operate in line buffered output mode.

Let's look at the bash documentation for 'help printf':

printf: printf [-v var] format [arguments]
Formats and prints ARGUMENTS under control of the FORMAT.
...
FORMAT is a character string which contains three types of
objects: plain characters, which are simply copied to standard
output; character escape sequences, which are converted and copied
to the standard output; and format specifications, each of which
causes printing of the next successive argument.

The format provided in your example in $data is interpreted as a
"character string". Apparently newlines (\n a.k.a. 0x0a characters)
are used in the binary data in your implementation! However as a
newline character it is causing line buffered output to be flushed
resulting in line oriented write(2) calls.

If you are trying to print raw binary data then I don't think you
should be using 'printf' to do it. It just feels like the wrong
utility to be used to me. Also there was the problematic use of it in
the format string.

Instead I would use utilities designed to work with binary data. Such
as 'cat'. I personally might prepare a temporary file containing
exactly the raw data that is needed to be transmitted and then use
"cat $tmpfile >&5" to transmit it. Or if I wanted strict control of
the block size making cat less appropriate then I would use "dd
if=$tmpfile status=none bs=1M >&5" or some such where no
interpretation of the data is done.
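
A minimal sketch of that temp-file idea (mktemp and the 1M block size are illustrative
assumptions, not part of the original suggestion):

  tmpfile=$(mktemp)
  printf -- "$data" > "$tmpfile"            # line-buffered flushes are harmless for a regular file
  dd if="$tmpfile" status=none bs=1M >&5    # the socket then sees the data in large write(2)s
  rm -f "$tmpfile"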

However there may be a bug in the way bash opens that fd number 5 and
sets up buffering. If it were me then I would look closely there. It
is possible that when that file descriptor was being opened it
should have been set up to use block buffering instead of line buffering. Since the
network socket is not a tty I would suspect that it should be using
block buffering. That is what I would expect. Therefore that is
where I would look for a bug. Obviously I can be wrong though too.

One should double check that fd 5 is not a tty.

if [ -t 5 ]; then echo "fd 5 is a tty"; else echo "fd 5 is not a tty"; fi

If it is a tty then I expect line buffering. If it is not then I
would expect block buffering. Just as a general statement about
programs using libc's stdio to write to it.
Post by dirk+
Post by Bob Proulx
You can re-block the output stream using other tools such as 'cat' or
'dd'. Since you are concerned about block size then perhaps dd is the
better of the two.
| cat
cat has a problem with binary chars, right? And: see below.
No. It does not. The 'cat' utility concatenates files. From the cat
documentation:

‘cat’ copies each FILE (‘-’ means standard input), or standard input if
none are given, to standard output. Synopsis:
...
On systems like MS-DOS that distinguish between text and binary
files, ‘cat’ normally reads and writes in binary mode. However, ‘cat’
reads in text mode if one of the options ‘-bensAE’ is used or if ‘cat’
is reading from standard input and standard input is a terminal.
Similarly, ‘cat’ writes in text mode if one of the options ‘-bensAE’ is
used or if standard output is a terminal.
Post by dirk+
Post by Bob Proulx
| dd status=none bs=1M
Or use whatever block size you wish. The 'dd' program will read the
input into its buffer and then output that block of data all in one
write(2). That seems to be what you are wanting.
We actually use dd to read from the socket. Of course we could use
writing to it as well -- at a certain point of time.
Great! Problem solved then. :-)

I didn't say it before but since this is such a long email making it a
little longer won't hurt more. The status=none dd option is a GNU
extension. It is useful in this context. But it is not a portable dd
option. Other platforms may or may not implement it. *BSD implements
it now but some of my beloved legacy Unix platforms do not.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
Post by dirk+
Still, a prerequisite would be that printf is the culprit and not
how bash + libs do sockets.
The repeated mention of sockets nudges me to point out that sockets
are just files. There is nothing special about them as such. Trying
to find fault there is just a false path to follow. Programs writing
to a file descriptor connected to a network socket don't "know"
anything about the network. It is the network layer that is taking
each write(2) and sending out packets.

What is special is whether the device is a tty or not. If it is a tty
then libc's standard I/O buffering does one thing. If it is not a tty
then libc's standard I/O buffering does a different thing. Let's look
at the documentation.

For me when I want to look up documentation matching my system I use
the locally installed info pages. But for the purposes of showing
where this documentation exists I will point to the top of tree
version online. However note that it may be newer than what you have
installed locally.

https://www.gnu.org/software/libc/manual/html_node/Stream-Buffering.html#Stream-Buffering
https://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html#Buffering-Concepts

Newly opened streams are normally fully buffered, with one
exception: a stream connected to an interactive device such as a
terminal is initially line buffered.
...
The use of line buffering for interactive devices implies that
output messages ending in a newline will appear immediately -- which
is usually what you want.

Additionally the stdio man page says:

man stdio

Output streams that refer to terminal devices are always line buffered
by default; pending output to such streams is written automatically
whenever an input stream that refers to a terminal device is read. In
cases where a large amount of computation is done after printing part
of a line on an output terminal, it is necessary to fflush(3) the
standard output before going off and computing so that the output will
appear.

However I did not look at how bash's implementation of printf was
coded. The above is just general information that generally applies
to all utilities.
Post by dirk+
Post by Bob Proulx
P.S. You can possibly use the 'stdbuf' command to control the output
buffering depending upon the program.
info stdbuf
That could be an option, thanks. Need to check though whether
a) it doesn't fragment then -- not sure while reading it
I feel compelled to say that the network stack is going to transmit a
packet every time write(2) is called. Programs doing the writing
don't know that they are writing to a network stream. They are just
writing data using write(2). If it is a fully network aware program
then of course it may be using sendto(2) or other network specific
call. But general filter utilities are not going to be using those
calls and are just going to read(2) and write(2) and not have any
specific network coding. That's part of the beauty of the Unix
Philosophy. Everything is a file. In your case though you are trying
to pump around binary data and are using line oriented text utilities
that are using line buffering and that is where problems are being
tripped over.

You are thinking of this as fragmentation. Because in your
application it appears to you in your context as fragmentation. But
as a general statement it isn't fragmentation. It is just a data
stream being written every time it is being written. Certainly any
text program writing lines out isn't going to be coded in any way that
knows about TCP data blocks. For any program in the middle it is just
lines of text in and lines of text out. Or in the case of other
programs that deal with binary data such as 'cat' it is just bytes in
and bytes out. The concept of fragmentation belongs to a different
layer of the software block diagram.

[[
There is an old joke related to this too. "The Unix way -- everything
is a file. The Linux way -- everything is a filesystem." Haha!

And also a quote, "I think the major good idea in Unix was its clean
and simple interface: open, close, read, and write." --Ken Thompson
]]
Post by dirk+
b) it's per default available on every platform supported by testssl.sh.
The 'stdbuf' utility is included in GNU coreutils starting with
version 7.5 onward. It may not be available on other platforms. It
didn't feel like the right solution to me. But I mentioned it in
passing in the P.S. because it is related. Perhaps it will be useful
to you.

Hope this helps! :-)

Bob
Chet Ramey
2018-09-23 18:26:32 UTC
Permalink
Post by Bob Proulx
Note that I *did* provide you with a way to do what you wanted to do. :-)
It was also noted in another message that the external standalone
printf command line utility did buffer as you desired. That seems
another very good solution too. Simply use "command printf ..." to
force using the external version.
This won't work the way you want. The `command' builtin only inhibits
execution of shell functions. It still executes builtins. You want to
either get the full pathname of a printf utility using `type -ap printf'
and use that, or use the env or exec variants I recommended in my last
message.
Post by Bob Proulx
Anyway... Since printf is a text oriented utility it makes sense to
me that it would operate in line buffered output mode.
It's that bash sets stdout and stderr to be line-buffered, not anything
printf-specific.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Bob Proulx
2018-09-23 18:33:07 UTC
Permalink
Post by Chet Ramey
Post by Bob Proulx
It was also noted in another message that the external standalone
printf command line utility did buffer as you desired. That seems
another very good solution too. Simply use "command printf ..." to
force using the external version.
This won't work the way you want. The `command' builtin only inhibits
execution of shell functions. It still executes builtins. You want to
either get the full pathname of a printf utility using `type -ap printf'
and use that, or use the env or exec variants I recommended in my last
message.
Oh drat! Now I have had to learn *TWO* things today. :-)
Post by Chet Ramey
Post by Bob Proulx
Anyway... Since printf is a text oriented utility it makes sense to
me that it would operate in line buffered output mode.
It's that bash sets stdout and stderr to be line-buffered, not anything
printf-specific.
I still think 'printf' feels more like a plain text utility and not
something one reaches for when working with binary data blobs.

Bob
Bob Proulx
2018-09-23 18:46:59 UTC
Permalink
Post by Chet Ramey
It's that bash sets stdout and stderr to be line-buffered, not anything
printf-specific.
Shouldn't bash set stdout buffering based upon the output descriptor
being a tty or not, the same as other libc stdio behavior?

Bob
Chet Ramey
2018-09-23 19:15:46 UTC
Permalink
Post by Bob Proulx
Post by Chet Ramey
It's that bash sets stdout and stderr to be line-buffered, not anything
printf-specific.
Shouldn't bash set stdout buffering based upon the output descriptor
being a tty or not the same as other libc stdio behavior?
It's been so long (25+ years) I forget why we did it.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
dirk+
2018-09-25 13:04:37 UTC
Permalink
Post by Chet Ramey
Post by Bob Proulx
Note that I *did* provide you with a way to do what you wanted to do. :-)
It was also noted in another message that the external standalone
printf command line utility did buffer as you desired. That seems
another very good solution too. Simply use "command printf ..." to
force using the external version.
This won't work the way you want. The `command' builtin only inhibits
execution of shell functions. It still executes builtins. You want to
either get the full pathname of a printf utility using `type -ap printf'
and use that, or use the env or exec variants I recommended in my last
message.
FYI: I ended up checking beforehand with type whether an external printf
exists, setting a variable for this, and then just calling via this variable.

env or exec: never thought about it (thanks!) but as both are external
commands, that would mean one additional external program upon every call.
(Yes, I know that there is such a thing as a fs buffer.) Subshells also cost
resources. As this is a core function I am happy about every homeopathic dose
of time I save here :-)

Cheers, Dirk
Chet Ramey
2018-09-25 13:46:55 UTC
Permalink
Post by dirk+
FYI: I ended up checking with type before whether an external printf
exists and set a variable for this and then just call this variable.
env or exec: never thought about it (thanks!) but as both are external
commands, that would mean upon every call one additional external program.
(yes, I know that there is such thing as a fs buffer). Subshells also costs
resources. As this is a core function I am happy for every homeopathic dose
of time I safe here :-)
`exec' is a shell builtin. It will `cost' in terms of a fork, but you're
going to fork and exec a different program anyway -- /usr/bin/printf --
so it's basically a wash. In either case, there's one fork and one
execve.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
dirk+
2018-09-25 14:13:02 UTC
Permalink
Post by Chet Ramey
Post by dirk+
FYI: I ended up checking with type before whether an external printf
exists and set a variable for this and then just call this variable.
env or exec: never thought about it (thanks!) but as both are external
commands, that would mean upon every call one additional external program.
(yes, I know that there is such thing as a fs buffer). Subshells also costs
resources. As this is a core function I am happy for every homeopathic dose
of time I safe here :-)
`exec' is a shell builtin. It will `cost' in terms of a fork, but you're
going to fork and exec a different program anyway -- /usr/bin/printf --
so it's basically a wash. In either case, there's one fork and one
execve.
yeah, you're right.
Greg Wooledge
2018-09-25 13:58:16 UTC
Permalink
Post by dirk+
env or exec: never thought about it (thanks!) but as both are external
commands, that would mean upon every call one additional external program.
(yes, I know that there is such thing as a fs buffer). Subshells also costs
resources. As this is a core function I am happy for every homeopathic dose
of time I safe here :-)
You could also disable the builtin printf with "enable -n printf".
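
A quick sketch of that approach (it assumes an external printf exists in PATH; the builtin
can be re-enabled afterwards):

  enable -n printf                  # disable the builtin; printf now resolves via PATH
  printf -- "$data" >&5 2>/dev/null
  enable printf                     # restore the builtin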
Robert Elz
2018-09-23 01:55:11 UTC
Permalink
Date: Sat, 22 Sep 2018 14:22:19 -0600
From: Bob Proulx <***@proulx.com>
Message-ID: <***@bob.proulx.com>


| Primarily a shell script is a command and control program. It is very
| good for that purpose. It is typically used for that purpose. That
| is the mainstream use and it is very unlikely one will run into
| unusual situations there.
|
| But programming tasks that are much different from command and control
| tasks, such as

I completely agree with all of that - shells should be used for running other
programs, which do the actual work, and only in the most simplistic of cases
actually doing stuff themselves.

This kind of explosion is what destroyed perl as a useful tool - as originally
created by Larry Wall it was very useful, combining the power of string
manipulation with regular expressions (for which sed would have been used
previously) with the field splitting and floating point calculations from awk,
and the i/o and program control normally found in a shell.

Then it was set upon by morons who seem to believe that it must be possible
to write everything in whatever is their favourite programming language, and
added networking ability (including the ability to format packets of course),
and others added threading, and others object orientation, and now all that's
left is a giant mess.

Every attempt should be made to resist shells moving in the same direction.


| > I doubt you can judge on this by just looking at a single line
| > of code -- the project has > 18k LoC in bash.

That in itself tells me it is probably misguided. While I can imagine
tasks that would need that much shell code, they are very very very
rare (and most often using auto-generated, and highly repetitious, code.)

kre

ps: if there was actually a desire to use dd to do re-buffering, the
correct usage is not to use "bs=" (which simply does read/write with
a buffer of that size) but "obs=", which reads (using ibs if needed, which it
would not be here), copies to an output buffer and writes only when that
buffer is full (or on EOF on the input).
Bob Proulx
2018-09-23 05:51:08 UTC
Permalink
Post by Robert Elz
ps: if there was actually a desire to use dd to do re-buffering, the
correct usage is not to use "bs=" (which simply does read/write with
a buffer of that size) but "obs=", which reads (using ibs if needed, which it
would not be here), copies to an output buffer and writes only when that
buffer is full (or on EOF on the input).
If the goal is to minimize writes then it won't matter as long as the
buffer size picked is larger than needed. Using the same buffer size
for input and output is usually most efficient.

First let's set it to something small to prove that it is buffering as
expected.

$ printf -- "%s\n" one two | strace -o /tmp/out -e read,write dd status=none bs=2 ; cat /tmp/out
one
two
...
read(0, "on", 2) = 2
write(1, "on", 2) = 2
read(0, "e\n", 2) = 2
write(1, "e\n", 2) = 2
read(0, "tw", 2) = 2
write(1, "tw", 2) = 2
read(0, "o\n", 2) = 2
write(1, "o\n", 2) = 2
read(0, "", 2) = 0
+++ exited with 0 +++

Lots of reads and writes but all as expected.

Or set just the output buffer size large. Then the input buffer size
defaults to 512 bytes on my system.

$ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
one
two
...
read(0, "one\ntwo\n", 512) = 8
read(0, "", 512) = 0
write(1, "one\ntwo\n", 8) = 8
+++ exited with 0 +++

But even if ibs is much too small it still behaves okay with a small
input buffer size and a large output buffer size.

$ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none ibs=2 obs=1M ; cat /tmp/out
one
two
...
read(0, "on", 2) = 2
read(0, "e\n", 2) = 2
read(0, "tw", 2) = 2
read(0, "o\n", 2) = 2
read(0, "", 2) = 0
write(1, "one\ntwo\n", 8) = 8
+++ exited with 0 +++

Then set both ibs and obs to be something quite large using bs= and
let it gather up all of the input and write with that buffer size.

$ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none bs=1M ; cat /tmp/out
one
two
...
read(0, "one\ntwo\n", 1048576) = 8
write(1, "one\ntwo\n", 8) = 8
read(0, "", 1048576) = 0
+++ exited with 0 +++

It seems to me that using a large buffer size for both read and write
would be the most efficient. It can then use the same buffer that
data was read into for the output buffer directly.

Bob
Robert Elz
2018-09-23 09:20:02 UTC
Permalink
Date: Sat, 22 Sep 2018 23:51:08 -0600
From: Bob Proulx <***@proulx.com>
Message-ID: <***@bob.proulx.com>

| Using the same buffer size
| for input and output is usually most efficient.

Yes, but as the objective seemed to be to make big packets, that is probably
not as important.

| $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
| one
| two
| ...
| read(0, "one\ntwo\n", 512) = 8

What is relevant there is that you're getting both lines from the printf in
one read. If that had happened, there would be no need for any rebuffering.
The point of the original complaint was that that was not happening, and
the reads were being broken at the \n ... here it might easily make a
difference whether the output is a pipe or a socket (I have no idea.)

| But even if ibs is much too small it still behaves okay with a small
| input buffer size and a large output buffer size.

Yes, with separate buffers, that's how dd works (has always worked).
That is why using it that way could solve the problem.

| It seems to me that using a large buffer size for both read and write
| would be the most efficient.

Yes.

| It can then use the same buffer that data was read into for the output
| buffer directly.

No, it can't, that's what bs= does - you're right, that is most efficient,
but there is no rebuffering, whatever is read is written, and in that case
even more efficient is not to interpose dd at all. The whole point was
to get the rebuffering.

Try tests more like

{ printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd ....

so there will be clearly 2 different writes, and small reads for dd
(however big the input buffer is) - with obs= (something big enough)
there will be just 1 write, with bs= (anything big enough for the whole
output) there will still be two writes.

kre

ps: this is not really the correct place to discuss dd.
Bob Proulx
2018-09-23 18:29:14 UTC
Permalink
Post by Robert Elz
| Using the same buffer size
| for input and output is usually most efficient.
Yes, but as the objective seemed to be to make big packets, that is probably
not as important.
The original complaint concerned flushing a data blob content upon
every newline (0x0a) character due to line buffering, write(2)'ing the
buffer up to that point. As I am sure you already know that will
cause the network stack in the kernel to emit the buffered data up to
that point with whatever has been read up to that point. Which was
apparently a small'ish amount of data. And then instead of having
some number of full MTU sized packets there were many more smaller
ones. It shouldn't have been about big packets, nor fragmentation,
but about streaming efficiency and performance. Though the behavior
was correct, with more buffer flushes than desired it was
apparently less efficient than they wanted, and they were therefore
complaining about it. They wanted the data blob buffered as much as
possible so as to use the fewest number of TCP network packets. My
choice of a large one meg buffer size was to be larger than any
network MTU size. My intention was that the network stack would then
split the data blob up into MTU sizes for transmission. The largest
MTU size that I routinely see is 64k. I expect that to increase
further in size in the future when 1 meg might not be big enough. And
I avoid mentioning jumbo frames.
Post by Robert Elz
| $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
| one
| two
| ...
| read(0, "one\ntwo\n", 512) = 8
What is relevant there is that you're getting both lines from the printf in
one read. If that had happened, there would be no need for any rebuffering.
The point of the original complaint was that that was not happening, and
the reads were being broken at the \n ... here it might easily make a
difference whether the output is a pipe or a socket (I have no idea.)
I dug into this further and see that we were both right. :-)

I was getting misdirected by the Linux kernel's pipeline buffering.
The pipeline buffering was causing me to think that it did not matter.
But digging deeper I see that it was a race condition timing issue and
could go either way. That's obviously a mistake on my part.

You are right that depending upon timing this must be handled properly
or it might fail. I am wrong that it would always work regardless of
timing. However it was working in my test case which is why I had not
noticed. Thank you for pushing me to see the problem here.
Post by Robert Elz
| It can then use the same buffer that data was read into for the output
| buffer directly.
No, it can't, that's what bs= does - you're right, that is most efficient,
but there is no rebuffering, whatever is read is written, and in that case
even more efficient is not to interpose dd at all. The whole point was
to get the rebuffering.
Try tests more like
{ printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd ....
so there will be clearly 2 different writes, and small reads for dd
(however big the input buffer is) - with obs= (something big enough)
there will be just 1 write, with bs= (anything big enough for the whole
output) there will still be two writes.
$ { command printf "one\n"; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
one
two
...
read(0, "one\ntwo\n", 1048576) = 8
write(1, "one\ntwo\n", 8) = 8
read(0, "", 1048576) = 0
+++ exited with 0 +++

Above the data is definitely written in two different processes but
due to Linux kernel buffering in the pipeline it is read in one read.
The data is written into the pipeline so quickly, before the next
stage of the pipeline could read it out, that by the time the read
eventually happened it was able to read the multiple writes as one
data block. This is what I had been seeing but you are right that it
is a timing related success and could also be a timing related
failure.

$ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
one
two
...
read(0, "one\n", 1048576) = 4
write(1, "one\n", 4) = 4
read(0, "two\n", 1048576) = 4
write(1, "two\n", 4) = 4
read(0, "", 1048576) = 0
+++ exited with 0 +++

The above illustrates the point you were trying to make. Thank you
for persevering in educating me as to the issue. :-)

$ { command printf "one\n"; sleep 1; command printf "two\n" ;} | { sleep 2; strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out ;}
one
two
...
read(0, "one\ntwo\n", 1048576) = 8
write(1, "one\ntwo\n", 8) = 8
read(0, "", 1048576) = 0
+++ exited with 0 +++

The above is just me showing that it is definitely a race condition
problem that can go either way. But obviously race conditions are
timing bugs and should never be counted upon always working one way or
the other. Just showing why I got sucked into it. :-(

$ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none obs=1M ; head /tmp/*.strace.out
one
two
...
read(0, "one\n", 512) = 4
read(0, "two\n", 512) = 4
read(0, "", 512) = 0
write(1, "one\ntwo\n", 8) = 8
+++ exited with 0 +++

And the above using a large output block size, as you suggest, shows
the solution where dd is re-blocking the output.

$ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none ibs=1M obs=1M ; head /tmp/*.strace.out
one
two
...
read(0, "one\n", 1048576) = 4
read(0, "two\n", 1048576) = 4
read(0, "", 1048576) = 0
write(1, "one\ntwo\n", 8) = 8
+++ exited with 0 +++

And just for completeness I will show the above with both a large
input buffer and a large output buffer of the same size and show that
result too. The required dd option, as you correctly insisted, really
is obs= in order to set the output block size. I stand corrected. :-)

I had missed the documented dd behavior:

‘bs=BYTES’
Set both input and output block sizes to BYTES. This makes ‘dd’
read and write BYTES per block, overriding any ‘ibs’ and ‘obs’
settings. In addition, if no data-transforming ‘conv’ option is
specified, input is copied to the output as soon as it’s read, even
if it is smaller than the block size.

It is always good to learn something new about fundamental behavior in
a command one has been using for some decades! :-)
Post by Robert Elz
ps: this is not really the correct place to discuss dd.
The help-bash list would be better generally for random shell stuff
but the discussion started here in this bug thread and this part of
the discussion is topical to the solution for it. This is the right
place for this.

Bob
Dirk Wetter
2018-10-11 16:53:18 UTC
Permalink
Post by Bob Proulx
$ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none ibs=1M obs=1M ; head /tmp/*.strace.out
one
two
...
read(0, "one\n", 1048576) = 4
read(0, "two\n", 1048576) = 4
read(0, "", 1048576) = 0
write(1, "one\ntwo\n", 8) = 8
+++ exited with 0 +++
And just for completeness I will show the above with both a large
input buffer and a large output buffer of the same size and show that
result too. The required dd option, as you correctly insisted, really
is obs= in order to set the output block size. I stand corrected. :-)
‘bs=BYTES’
Set both input and output block sizes to BYTES. This makes ‘dd’
read and write BYTES per block, overriding any ‘ibs’ and ‘obs’
settings. In addition, if no data-transforming ‘conv’ option is
specified, input is copied to the output as soon as it’s read, even
if it is smaller than the block size.
It is always good to learn something new about fundamental behavior in
a command one has been using for some decades! :-)
Thanks for the long mails!

This all -- including cat -- sounded reasonable. But it seems that with sockets the internal printf,
as opposed to the one from coreutils, is still causing fragmentation other than expected.
PoC:

bash 0$ exec 5<>/dev/tcp/81.169.199.25/443
bash 0$ printf
'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0b\x85\x03\x90\x9f\x77\x04\x33\xd4\xde\x20\x44\xb8\x92\x56\xaf\x74\x52\x9e\xd8\xcf\x52\x14\xc8\xaf\xd8\x34\x0b\xe7\x7f\xeb\x86\x01\x84\x50\x5d\xe4\xa1\x6a\x09\x3b\xbf\x6e\x00\x0e\x13\x01\x13\x02\x13\x03\x13\x04\x13\x05\xc0\x30\x00\xff\x01\x00\x01\xa5\x00\x00\x00\x0b\x00\x09\x00\x00\x06\x66\x66\x66\x66\x66\x66\x00\x2b\x00\x17\x16\x03\x04\x7f\x1c\x7f\x1b\x7f\x1a\x7f\x19\x7f\x18\x7f\x17\x03\x03\x03\x02\x03\x01\x03\x00\x00\x23\x00\x00\x33\x74\x00\x00\x00\x0d\x00\x22\x00\x20\x04\x03\x05\x03\x06\x03\x08\x04\x08\x05\x08\x06\x04\x01\x05\x01\x06\x01\x08\x09\x08\x0a\x08\x0b\x08\x07\x08\x08\x02\x01\x02\x03\x00\x0a\x00\x10\x00\x0e\x00\x1d\x00\x17\x00\x1e\x00\x18\x00\x19\x01\x00\x01\x01\x00\x33\x00\x6b\x00\x69\x00\x1d\x00\x20\x4d\xfa\x57\x44\xb7\xf7\x48\xb8\x95\x77\x5a\xc1\xff\x86\xbf\xae\xf7\x3a\x33\x69\x54\xde\x6a\xf5\x2e\x89\x84\x6c\xf2\xd8\xb2\x43\x00\x17\x00\x41\x04\xb4\x24\xef\x11\x99\x9c\xa4\xe8\xce\x88\x25\xc3\x8e\x7c\x0c\x6a\x94\xde\x33\x6d\xff\xcd\x17\xb7\x5c\x65\xdb\xd1\x58\x46\x95\x69\x80\xc8\xbc\xfc\xe6\xd9\x22\x39\xbb\x3f\x63\xab\x3d\x5c\xba\xcc\xeb\x1a\x90\x1b\xd4\x75\xff\x58\xc4\x00\x58\x50\x21\xd0\xaa\xe4\x00\x0b\x00\x02\x01\x00\x00\x0f\x00\x01\x01\x00\x15\x00\xbb\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0^Cx00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
| dd obs=1M ibs=1M >&5
bash 0$

(Excuse the wrapping. The IP is mine from the project. Feel free to use another IP.
The servername encoded in there is anyway nonsense)

If you use wireshark you see in the ClientHello "TCP segment of a reassembled PDU" @ byte
173. That's where the first LF is encountered. The second one doesn't cause an additional
fragment here, other people spotted that.

The fragmentation is independent of the dd options used. Also "| cat" does the same.
stdbuf is not available on all platforms, especially on those which also do not have a
suitable external printf:

/usr/bin/printf "\xf5\xee\xbe\xe5" | xxd
00000000: 7866 3578 6565 7862 6578 6535 xf5xeexbexe5

like FreeBSD and OS X. OpenBSD's /usr/bin/printf works surprisingly.


Cheers, Dirk


PS + @Bob: fd 5 is not a tty in the program -- but interactively in this PoC you want to make sure it is not taken yet.
Dirk Wetter
2018-10-11 17:28:37 UTC
Permalink
Post by Dirk Wetter
bash 0$ exec 5<>/dev/tcp/81.169.199.25/443
bash 0$ printf
'\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03\x54\x51\x1e\x7a\xde\xad\xbe\xef\x31\x33\x07\x00\x00\x00\x00\x00\xcf\xbd\x39\x04\xcc\x16\x0b\x85\x03\x90\x9f\x77\x04\x33\xd4\xde\x20\x44\xb8\x92\x56\xaf\x74\x52\x9e\xd8\xcf\x52\x14\xc8\xaf\xd8\x34\x0b\xe7\x7f\xeb\x86\x01\x84\x50\x5d\xe4\xa1\x6a\x09\x3b\xbf\x6e\x00\x0e\x13\x01\x13\x02\x13\x03\x13\x04\x13\x05\xc0\x30\x00\xff\x01\x00\x01\xa5\x00\x00\x00\x0b\x00\x09\x00\x00\x06\x66\x66\x66\x66\x66\x66\x00\x2b\x00\x17\x16\x03\x04\x7f\x1c\x7f\x1b\x7f\x1a\x7f\x19\x7f\x18\x7f\x17\x03\x03\x03\x02\x03\x01\x03\x00\x00\x23\x00\x00\x33\x74\x00\x00\x00\x0d\x00\x22\x00\x20\x04\x03\x05\x03\x06\x03\x08\x04\x08\x05\x08\x06\x04\x01\x05\x01\x06\x01\x08\x09\x08\x0a\x08\x0b\x08\x07\x08\x08\x02\x01\x02\x03\x00\x0a\x00\x10\x00\x0e\x00\x1d\x00\x17\x00\x1e\x00\x18\x00\x19\x01\x00\x01\x01\x00\x33\x00\x6b\x00\x69\x00\x1d\x00\x20\x4d\xfa\x57\x44\xb7\xf7\x48\xb8\x95\x77\x5a\xc1\xff\x86\xbf\xae\xf7\x3a\x33\x69\x54\xde\x6a\xf5\x2e\x89\x84\x6c\xf2\xd8\xb2\x43\x00\x17\x00\x41\x04\xb4\x24\xef\x11\x99\x9c\xa4\xe8\xce\x88\x25\xc3\x8e\x7c\x0c\x6a\x94\xde\x33\x6d\xff\xcd\x17\xb7\x5c\x65\xdb\xd1\x58\x46\x95\x69\x80\xc8\xbc\xfc\xe6\xd9\x22\x39\xbb\x3f\x63\xab\x3d\x5c\xba\xcc\xeb\x1a\x90\x1b\xd4\x75\xff\x58\xc4\x00\x58\x50\x21\xd0\xaa\xe4\x00\x0b\x00\x02\x01\x00\x00\x0f\x00\x01\x01\x00\x15\x00\xbb\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0^Cx00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
| dd obs=1M ibs=1M >&5
bash 0$
Sorry, forget about my mail. In the actual file I was using there was a variable which I
hadn't quoted correctly. The PoC works as expected and it looks like I have a solution.
Bob Proulx
2018-10-11 21:35:55 UTC
Permalink
Post by Dirk Wetter
PS + @Bob: fd 5 is not a tty in the program -- but interactively in this PoC you want to make sure it is not taken yet.
Understood. I originally mentioned the isatty() thing because that is
how libc decides to line buffer or not. But Chet has confirmed that
bash's internal printf line buffers regardless, because it has always
done it that way since the beginning. So for the moment at least
whether it is a tty or not doesn't matter to the bash printf. Since
this is a lesser traveled corner of bash it hasn't been tripped over
much before. And probably never discussed here. However I think it
likely (I didn't check) that being a tty does matter for the coreutils printf,
which uses libc's default stdio buffering. So it is good to know about
regardless.

I think it would make sense if bash's internal commands like printf
performed buffering the same way as the default libc does. Then it
would behave the same as the rest and be less likely to cause
problems. And it would have meant it would have worked by default in
your case.
Post by Dirk Wetter
Sorry, forget about my mail. In the actual file I was using there
was a variable which I hadn't quoted correctly. The PoC works as
expected and it looks like I have a solution.
The smallest of details can break the largest of programs. :-)

Good to hear you have things resolved and now working for you!

Bob
Dirk Wetter
2018-10-12 09:10:53 UTC
Permalink
Post by Bob Proulx
The smallest of details can break the largest of programs. :-)
Good to hear you have things resolved and now working for you!
Thanks all for your insights and help!

Cheers, Dirk

Chet Ramey
2018-10-11 19:11:53 UTC
Permalink
Post by Dirk Wetter
This all -- including cat -- sounded reasonable. But it seems using sockets the internal printf
as opposed to the one from coreutils is still causing fragmentation other than expected with
Bash line-buffers stdout, so newlines cause writes and packet
fragmentation.
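 
A minimal way to watch this on an actual socket rather than a pipe
(a sketch; it assumes an OpenBSD-style nc and a free local port 9999):
 
nc -l 127.0.0.1 9999 >/dev/null &   # throwaway listener
sleep 1
exec 5<>/dev/tcp/127.0.0.1/9999     # bash socket, as in the report
strace -e write bash -c 'printf "first\nsecond\n"' >&5
# expect two write(2) calls (6 and 7 bytes); on an otherwise idle
# connection each one typically goes out as its own TCP segment
exec 5>&-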

http://lists.gnu.org/archive/html/bug-bash/2018-09/msg00086.html
http://lists.gnu.org/archive/html/bug-bash/2018-09/msg00095.html
http://lists.gnu.org/archive/html/bug-bash/2018-09/msg00102.html
http://lists.gnu.org/archive/html/bug-bash/2018-09/msg00118.html

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Greg Wooledge
2018-09-24 13:05:33 UTC
Permalink
Post by dirk+
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
Looking ONLY at this one line, there is an obvious bug, which Bob has
pointed out. It should be

printf %s "$data" >&5 2>/dev/null
Post by dirk+
I doubt you can judge on this by just looking at a single line
of code -- the project has > 18k LoC in bash.
That is utterly horrifying.

I have no comments on the buffering issues. Those have already been
covered.
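 
To make the format-string point concrete (a minimal sketch; the contents
of data here are made up):
 
data='progress: 100%s done'
printf -- "$data"      # %s is taken as a conversion with a missing
                       # argument, so this prints 'progress: 100 done'
printf '%s' "$data"    # prints the contents of $data untouched
 
(\c is similar: it is special at least in arguments to %b, and in the
format string of some printf implementations.)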
L A Walsh
2018-09-25 12:15:02 UTC
Permalink
Post by Greg Wooledge
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
Looking ONLY at this one line, there is an obvious bug, which Bob has
pointed out. It should be
printf %s "$data" >&5 2>/dev/null
----
This brings to mind a consideration:
Since %s says to print a string of data (presumably not
including a NUL byte), what happens if "$data" is
a paragraph of text with embedded newlines? In that case,
it sounds like bash might break apart the single printf
output into smaller packets rather than transmitting the
entirety of "$data" in one write (presuming it is less than
the maximum data size for a network packet).

Also, if you want to flush the data out at the end, it seems
"%s\n" would be required to force out the last line of text if
it wasn't nl terminated.
Post by Greg Wooledge
That is utterly horrifying.
---
Hmmm....I didn't realize how sensitive some sensibilities were...
:-)
Greg Wooledge
2018-09-25 12:25:41 UTC
Permalink
Post by L A Walsh
Post by Greg Wooledge
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
Looking ONLY at this one line, there is an obvious bug, which Bob has
pointed out. It should be
printf %s "$data" >&5 2>/dev/null
----
As %s says to print a string of data (presumably not
including a NUL byte), then what happens if "$data" is
a paragraph of text with embedded newlines. In that case,
it sounds like bash might break apart the single printf
output into smaller packets rather than transmitting the
entirety of "$data" in 1 write (presuming it is less than
the maximum data size for a network packet).
Yes, I'm sure it does. In fact, bash's printf and echo builtins are
already known to use multiple calls to write() even when sockets and
newlines are not involved.

For example (from <https://mywiki.wooledge.org/BashPitfalls#pf51>):

$ perl -e 'print "a"x2000, "\n"' > foo
$ strace -e write bash -c 'read -r foo < foo; echo "$foo"' >/dev/null
write(1, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 1008) = 1008
write(1, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 993) = 993
+++ exited with 0 +++

There is no guarantee that the entire payload will be sent with a single
write() call. If you need that kind of low-level control, bash is
not the right language for this project.
Chet Ramey
2018-09-25 13:48:28 UTC
Permalink
Post by Greg Wooledge
Post by L A Walsh
Post by Greg Wooledge
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
What happens if $data contains % format strings? What happens if the
format contains a sequence such as \c? This looks problematic. This
is not a safe programming practice.
Looking ONLY at this one line, there is an obvious bug, which Bob has
pointed out. It should be
printf %s "$data" >&5 2>/dev/null
----
As %s says to print a string of data (presumably not
including a NUL byte), then what happens if "$data" is
a paragraph of text with embedded newlines. In that case,
it sounds like bash might break apart the single printf
output into smaller packets rather than transmitting the
entirety of "$data" in 1 write (presuming it is less than
the maximum data size for a network packet).
Yes, I'm sure it does. In fact, bash's printf and echo builtins are
already known to use multiple calls to write() even when sockets and
newlines are not involved.
Yes, bash does line buffering and provides a buffer for stdout and
stderr. This has been noted here previously and isn't specific to
printf or echo.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/
Dirk Wetter
2018-09-26 06:17:39 UTC
Permalink
Post by L A Walsh
Post by Greg Wooledge
Post by Bob Proulx
Post by dirk+
printf -- "$data" >&5 2>/dev/null
What happens if $data contains % format strings?  What happens if the
format contains a sequence such as \c?  This looks problematic.  This
is not a safe programming practice.
Looking ONLY at this one line, there is an obvious bug, which Bob has
pointed out.  It should be
printf %s "$data" >&5 2>/dev/null
----
As %s says to print a string of data (presumably not
including a NUL byte),
it certainly does contain a null byte, and every other byte value
from 1 to 255. That's the point of a network socket.

Also "$data" will NEVER contain user input in any way
with one exception being the hostname which is transferred
via hexdump into exactly this format.

Other than that, "$data" is populated purely internally. It can't
contain anything other than '\x00' through '\xff' unless there's a
coding error, which would be better to catch earlier
and not here.

This is why I said you can't look only at one line of
code.

Code review requires seeing the whole picture.
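 
For what it's worth, since the payload is kept as escape text anyway, %b
produces the same bytes without using the data as the format string -- a
minimal sketch with a made-up fragment:
 
data='\x16\x03\x01\xc0\x0a'          # hypothetical payload fragment
printf '%b' "$data" | od -An -tx1    # same bytes as: printf -- "$data"
# (how NUL bytes fare through %b is a separate question and worth
# testing on the target bash version)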

BTW: an external printf seems to be off the table. The BSDish /usr/bin/printf
is completely different from the coreutils incarnation, and
OpenBSD by default doesn't even have a printf outside of bash.
Post by L A Walsh
then what happens if "$data" is
a paragraph of text with embedded newlines.  In that case,
it sounds like bash might break apart the single printf
output into smaller packets rather than transmitting the
entirety of "$data" in 1 write (presuming it is less than
the maximum data size for a network packet).
yup.

I wonder why the coreutils printf behaves (for my purposes) better
than the bash builtin.
Post by L A Walsh
   Also, if you want to flush the data out at the end, it seems
"%s\n" would be required to force out the last line of text if
it wasn't nl terminated.
Post by Greg Wooledge
That is utterly horrifying.
 
I take that as a compliment :-)
Post by L A Walsh
---
   Hmmm....I didn't realize how sensitive some sensibilities were...
:-)
LOL

There are JavaScript frameworks in the browser of similar size or
even bigger; the kernel I am using right now is written in a language
which is not known to be safe, and whose parser after 25 years still
sometimes throws utterly misleading errors -- which remind me of the
first K&R C compilers. This and other things I found rather horrifying.

Scripting languages have long since evolved -- you should really take this as a compliment --
and when I started this project I never thought it would boldly go there :-)

Cheers, Dirk
Chet Ramey
2018-09-26 14:00:57 UTC
Permalink
Post by Dirk Wetter
Post by L A Walsh
then what happens if "$data" is
a paragraph of text with embedded newlines.  In that case,
it sounds like bash might break apart the single printf
output into smaller packets rather than transmitting the
entirety of "$data" in 1 write (presuming it is less than
the maximum data size for a network packet).
yup.
I wonder why the coreutils printf behaves (for my purposes) better
than the bash builtin.
The answer's the same as it was last week: because bash line-buffers
stdout and stderr, which it has done since early 1992.
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU ***@case.edu http://tiswww.cwru.edu/~chet/