Long Running SSH Tunnel, Slowing Down
Timothy O'Keefe
2010-03-22 17:32:52 UTC
Hi all,

First time poster, long time OpenSSH user :)

The Situation:

Users within our net require access to a website (http/80) that is
being hosted on another, trusted net. Admins on this other trusted net
are not necessarily as trusting as we are, though they do provide a
ssh gateway. So, one fairly easy solution that was decided upon was to
simply allow users access to this website via a "permanent" SSH tunnel
(-f)

ssh -Nf -L 9100:webserver.trusted.com:80 ***@sshgw.trusted.com

The Problem:

After a while -- say a few days/weeks -- of having this tunnel
established, transactions through this tunnel slow down to a crawl, to
the point where requests will typically time out. Establishing a brand
new tunnel alongside the slowing tunnel seems to work fine. I don't
see anything particularly wrong with the endpoint systems other than
that sshd on the ssh gateway is consuming about 1.4 MB of virtual
memory. While this does not pose any threat to the machine per se, it
does seem a bit strange to me.

I'm curious as to what might be happening here, and what -- if
anything -- we can do about it. I've heard from a number of folks that
ssh tunnels for this purpose are a "bad idea" and that we might
consider a connectionless OpenVPN based solution. This is 100% fine,
however no one has been able to explain _why_ the tunnel slows down
which happens to be precisely what interests me. Can someone provide
me with any insights?

The ssh gateway system is CentOS 4.7 w/ OpenSSH 3.9p1 and the client
is Ubuntu 8.04 w/ OpenSSH 4.7p1.

Thanks in advance,
Tim
John Morrison
2010-03-23 09:44:22 UTC
Tim,

I'm not an expert on ssh. IMHO this kind of behaviour is typical of a
memory leak or the application running out of resources in some way.
So this may be a track worth pursuing before having to delve into the
more complex world of decoding packets. As you can establish a second
tunnel alongside the slow tunnel, and this works, it is not an OS
resource issue. It may be that the instance of the ssh application
(either your client or the instance of the daemon to which you connect)
has run out of resources.

My first reaction would be to update the software at each end to the
latest revision. Second, have a look for this specific issue with the
software being used. It may be a known issue that a specific
parameter setting resolves.

If this makes no difference then try the more difficult decoding
route. I suspect that the software has its own commands/tools to help
with this. For example, Cisco has "debug" commands to help with
troubleshooting, such as debug ip ssh
(http://www.cisco.com/en/US/tech/tk583/tk617/technologies_tech_note09186a00800949e2.shtml#debugandshowcommands).
You may spot an increasing number of errors as the link is used, or at
least be able to work out which packets belong to which type of ssh
traffic.
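
For OpenSSH, the nearest equivalent to such debug commands is the client's -v flag, which can be given up to three times for more detail; on the server side, sshd_config has a LogLevel setting. The following is only a rough sketch of how the tunnel from this thread might be run with debugging enabled. The login "tunneluser", the log file name and the helper names are placeholders invented for illustration.

```shell
#!/bin/sh
# Sketch only: "tunneluser" and the log file name are placeholders.
TUNNEL_USER="tunneluser"
LOG="tunnel-debug.log"

# Run the tunnel in the foreground (no -f) with maximum client verbosity
# and append the debug output, which ssh writes to stderr, to $LOG.
debug_tunnel() {
    ssh -vvv -N -L 9100:webserver.trusted.com:80 \
        "$TUNNEL_USER@sshgw.trusted.com" 2>>"$LOG"
}

# Count the level-1 debug lines in a log file. OpenSSH prefixes its
# debug output with "debug1:", "debug2:" or "debug3:".
count_debug1() {
    grep -c '^debug1:' "$1"
}

# Only start the tunnel when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    debug_tunnel
fi
```

Comparing the log of a fresh tunnel with the log of one that has slowed down may show where the two diverge.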
Andrew M.A. Cater
2010-03-24 04:16:11 UTC
Post by John Morrison
Tim,
I'm not an expert on ssh. IMHO this kind of behaviour is typical of a
memory leak or the application running out of resources in some way.
Memory leak sounds very feasible here but do check what other processes
are running in general. If you leave a process running for days, also
check what _other_ stuff is running around it. (Andy, who has just had
to reboot his home wifi router which is on 24/7 because it dies about
once a week and loses DNS - a reset fixes it).
Post by John Morrison
So this may be a track worth pursuing before having to delve in to the
more complex world of decoding packets. As you can establish a second
tunnel alongside the slow tunnel and this works it is not an OS
resource issue. It may be that instance of the ssh application (either
your client or the instance of the daemon to which you connect) has
run out of resources.
Far down in your reply, you mention ssh using 1.4MiB of virtual memory -
is this the figure from top or some such or do you mean that the machine
is also hitting swap?
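
One way to settle that question is to read the figures the kernel itself reports for the process rather than inferring them from top. A minimal sketch, assuming a Linux system with /proc mounted; proc_mem is a made-up helper name, not an OpenSSH or system tool.

```shell
#!/bin/sh
# proc_mem is a hypothetical helper: print the virtual size (VmSize) and
# the resident size (VmRSS) that the Linux kernel reports for one PID.
proc_mem() {
    grep -E '^Vm(Size|RSS):' "/proc/$1/status"
}

# Demonstrate on the current shell; on the gateway, pass the PID of the
# sshd process that serves the tunnel instead.
proc_mem "$$"
```

Sampling these two values every so often over the life of the tunnel would show whether the process actually grows, which is what a leak would look like.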
Post by John Morrison
My first reaction would be to update the software at each end to the
latest revision. Second have a look for this specific issue with the
software being used. It may be a known issue and that a specific
parameter setting may resolve this.
You have access to your end only to do this, I presume, and I note that
you're using 8.04 which is an LTS release: if it's a server, then it's
still within the ?? five years ?? support.

The next iteration of Ubuntu 10.04 is also a long term supported release
and the beta is out now - the rest is due in April. Try building a test
machine to see whether issues are resolved / there are other factors
which may make you consider an upgrade in the future?
Post by John Morrison
If this makes no difference then try the more difficult decoding
route. I suspect that the software has its own commands/tools to help
with this. For example, Cisco has "debug" commands to help with
troubleshooting, such as debug ip ssh
(http://www.cisco.com/en/US/tech/tk583/tk617/technologies_tech_note09186a00800949e2.shtml#debugandshowcommands).
You may spot increasing number of errors as the link is used, or at
least be able to work out which packets belong to which type of ssh
traffic.
Post by Timothy O'Keefe
After a while -- say a few days/weeks -- of having this tunnel
established, transactions through this tunnel slow down to a crawl. To
the point where requests will typically timeout. Establishing a brand
new tunnel alongside the slowing tunnel seems to work fine. I don't
see anything particularly wrong with the endpoint systems other than
that sshd on the ssh gateway is consuming about 1.4 MB of virtual
memory. While this does not pose any threat to the machine per se, it
does seem a bit strange to me.
The ssh gateway system is CentOS 4.7 w/ OpenSSH 3.9p1 and the client
is Ubuntu 8.04 w/ OpenSSH 4.7p1.
The gateway sysadmin might want to consider CentOS 4.8 as a minimum /
updating from EPEL / RPMForge. OpenSSH 3.9 is desperately old :(
Hope this helps,

AndyC
i***@gmail.com
2010-03-23 22:19:47 UTC
Hi Tim

Just sharing how I would investigate (not solve) the problem.

My first step would be to sniff the traffic on the server side of the slow old tunnel, then run the same transaction through the new tunnel you set up and compare the two packet dumps. I would then analyze the differences and make an informed judgement. As it is your own network, you can even decrypt the data and look at the raw traffic.
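
A rough sketch of what such a capture could look like with tcpdump. It assumes root access on the ssh gateway, and "eth0" is a placeholder interface name. With the -L forward from this thread, the leg between the gateway and the web server carries plain HTTP on port 80, so that leg can be read without decrypting anything. capture_cmd is a made-up helper that only prints the command to run.

```shell
#!/bin/sh
# Assumptions: run as root on the ssh gateway; eth0 is a placeholder.
IFACE="eth0"
WEBSERVER="webserver.trusted.com"

# Print the tcpdump command that records full packets (-s 0) of the
# gateway-to-webserver HTTP leg into the file given as $1.
capture_cmd() {
    printf 'tcpdump -i %s -s 0 -w %s host %s and tcp port 80\n' \
        "$IFACE" "$1" "$WEBSERVER"
}

# One capture per tunnel: run the first while making a request through
# the old tunnel, the second while repeating it through the new one.
capture_cmd old-tunnel.pcap
capture_cmd new-tunnel.pcap
```

The two files can then be read back with tcpdump -r and compared for retransmissions and gaps between packets.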

Write back if in doubt.

Cheers,
Deadbrain

Stephen Cropp
2010-03-24 22:52:48 UTC
On Tue, Mar 23, 2010 at 6:32 AM, Timothy O'Keefe
Post by Timothy O'Keefe
Hi all,
After a while -- say a few days/weeks -- of having this tunnel
established, transactions through this tunnel slow down to a crawl. To
the point where requests will typically timeout. Establishing a brand
new tunnel alongside the slowing tunnel seems to work fine. I don't
see anything particularly wrong with the endpoint systems other than
that sshd on the ssh gateway is consuming about 1.4 MB of virtual
memory. While this does not pose any threat to the machine per se, it
does seem a bit strange to me.
The problem is essentially that TCP tunnels over a TCP transport are a
bad idea. Eventually you get a cascading effect that will slow things
down substantially to the point where it becomes essentially useless
and you'll have to rebuild the tunnels.

This is why most VPN and tunnelling solutions work over UDP or their
own IP protocol rather than across TCP.

For practical purposes, the only way to really take care of the issue
is to run scripts that will tear down and recreate the tunnel at set
times. The other alternative is to use a tunnelling method that uses
UDP or some other protocol.
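
A minimal sketch of such a teardown-and-recreate script, reusing the command from the original post. The login "tunneluser" is a placeholder (the real one is elided in the thread), the helper names are invented, and the script assumes key-based authentication so that ssh can start without a password prompt.

```shell
#!/bin/sh
# Sketch only: "tunneluser" is a placeholder login.
TUNNEL_USER="tunneluser"
GATEWAY="sshgw.trusted.com"
FORWARD="9100:webserver.trusted.com:80"

# Command line of the tunnel, used as the pattern to find the old one.
tunnel_match() {
    printf 'ssh -Nf -L %s %s@%s' "$FORWARD" "$TUNNEL_USER" "$GATEWAY"
}

# Stop the running tunnel, if any, then start a fresh one.
restart_tunnel() {
    # pkill -f matches against the full command line; a non-zero exit
    # status just means no tunnel was running.
    pkill -f "$(tunnel_match)" || true
    ssh -Nf -L "$FORWARD" "$TUNNEL_USER@$GATEWAY"
}

# Only act when explicitly asked to, so the file can be sourced safely.
if [ "${1:-}" = "restart" ]; then
    restart_tunnel
fi
```

Saved under a name of your choosing, it could be called from cron with the argument "restart" at a quiet time of day. Requests in flight when the old tunnel is killed will fail and need to be retried.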
Post by Timothy O'Keefe
I'm curious as to what might be happening here, and what -- if
anything -- we can do about it. I've heard from a number of folks that
ssh tunnels for this purpose are a "bad idea" and that we might
consider a connectionless OpenVPN based solution. This is 100% fine,
however no one has been able to explain _why_ the tunnel slows down
which happens to be precisely what interests me. Can someone provide
me with any insights?
For a good explanation of why this happens and how to resolve it from
a network point of view, you can see the following PDF.

http://docs.google.com/viewer?a=v&q=cache:TqsO7Bi6-1AJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.21.7007%26rep%3Drep1%26type%3Dpdf+TCP+tunnels+over+TCP+networks+performance&hl=en

Hope that helps.
Stephen Cropp
2010-03-24 23:33:10 UTC
Post by Stephen Cropp
For a good explanation of why this happens and how to resolve it from
a network point of view, you can see the following PDF.
http://docs.google.com/viewer?a=v&q=cache:TqsO7Bi6-1AJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.21.7007%26rep%3Drep1%26type%3Dpdf+TCP+tunnels+over+TCP+networks+performance&hl=en
Hope that helps.
Ugggh... Just realised I sent through the wrong article.

A far more useful link regarding TCP over TCP performance is available at
http://sites.inka.de/~W1011/devel/tcp-tcp.html

Sorry for the misdirection.
Timothy O'Keefe
2010-03-25 03:04:57 UTC
Thanks so far to everyone who has responded to my question. As a first
timer on this mailing list, I wasn't entirely sure what to expect.
This is amazingly helpful.

Stephen, thank you for that very helpful and interesting article.