SSH jump servers are nothing new. They’ve been around a long time. The very first implementations were simply an outmoded server running an SSH server, which allowed the user to create a new SSH session to any number of servers on the LAN. This limited public-private connection security concerns to a narrow area: just the jump server. All other servers would be inaccessible from outside networks (including the internet, when it came about). This is a very primitive method, and though it can still be found in some non-critical networks, it’s far from ideal and nowhere near best practice.
The very basic premise of an SSH jump server is to give users and admins a single point of connection, which can then be used to connect to privileged servers and services. The more basic the method used to achieve this, the less secure the whole system can be. The main flaw in the original design is that once an intruder has access to the jump server, they potentially have access to the entire network. This is a conversation that can delve very deeply into inter-host security, firewalls, application-level security, etc.
Any time a person designs, implements, modifies or (re)creates a network, network security scheme, system, or connection scheme, that person is acting in the role of a systems and network engineer. Acting as an engineer and being a qualified, certified engineer are two different things. Not all engineers are proficient, qualified or at all good at the tasks. With that said, I will assume that any person performing these tasks is an engineer, qualified or not.
It is up to the engineer to keep in mind many aspects of their task, in this case, designing and implementing an SSH jump server. Those aspects can be boiled down to: security, performance, availability, efficiency, and overall connectivity. These aspects of systems and networks apply to every system, network, application, etc.
Security should always be in mind, if not the one thing every engineer is primarily concerned about. The most basic form of network security is the firewall, and the most basic setup is “block and poke,” where every protocol, port and service is blocked, and holes are poked in the firewall for specific needs. This is what anyone with any advanced knowledge of firewalls should know. However, this is only the first step in security, the first of many. I have a blog post concerning SSH and firewall setup here. With an SSH jump server, there are more security concerns to deal with, especially with multiple users. Much of the security for the jump server itself is the same as covered in the above blog post. The hosts and services which are reachable from the jump server should also be secured. Some of those services may be reachable over the network even though access to them from the jump server is undesirable. The best way to handle that is with a firewall controlling access to that service, be it a physical appliance or a software firewall (I still prefer and recommend UFW for software firewalls).
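To make “block and poke” concrete, here is a minimal Python sketch of default-deny rule matching. It is only a model of the idea, not how UFW or any real firewall is implemented, and the rule structure, the `is_allowed` helper and all addresses are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hole poked in the firewall (hypothetical structure)."""
    proto: str
    port: int
    source: str = "0.0.0.0/0"  # CIDR allowed to use this hole; default is anyone


def is_allowed(rules, proto, port, source_ip):
    """Allow traffic only if a rule explicitly matches; everything else is blocked."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.proto == proto and rule.port == port
                and addr in ipaddress.ip_network(rule.source)):
            return True
    return False  # the "block" half: no matching hole means no access


# Assumed addresses, for illustration only.
JUMP_IP = "10.0.0.10"
backend_rules = [Rule("tcp", 22, JUMP_IP + "/32")]  # SSH, from the jump server only

print(is_allowed(backend_rules, "tcp", 22, JUMP_IP))        # jump server to SSH
print(is_allowed(backend_rules, "tcp", 22, "203.0.113.7"))  # outside host to SSH
print(is_allowed(backend_rules, "tcp", 3306, JUMP_IP))      # jump server to an unlisted service
```

The last call is the case described above: the service exists on the host, but since no hole was poked for it, the jump server cannot reach it.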
The jump server needs to be readily available. This means it needs to be running whenever it may be required. Alternatively, some newer methods are available (and won’t be covered here) which rely on a knock connection to start up the service, or the whole system, such as a Docker container. Different levels of systems require different levels of availability. In a high-availability, mission-critical production environment where admins may need to resolve issues as quickly as possible, the availability should be “always on,” but there are use cases (such as a network lab server) where waiting a minute or so for the connection is worth the trade-off for saving some money on power and cooling. This goes hand in hand with performance. High-level systems will require high-level availability and performance for the jump server, or any other service.
Performance, in this case, means bandwidth, latency, CPU and RAM resource allocation, and the ability to have a fast, stable connection with as many concurrent users as may ever be required. An SSH server uses very minimal resources once the connection is created. However, there are two considerations: connection creation with a high level of encryption will spike CPU usage; and any data transfer through the SSH connection (tunnel) has the potential to increase CPU and RAM usage, which is especially true when tunneling non-trivial data such as video, game connections, or even HTTP proxying.
Efficiency is the balance of performance and availability. At the most basic level, one can ask “Do I really need a $50k Dell server with 20TB of storage, 512GB of RAM and 4 Threadripper CPUs for my 8 person SSH jump server?” The answer to that specific question is most likely “no.” However, if those 8 people are using the machine for 8k video transfers through an SSH tunnel, with each user having multiple connections through the tunnel, it’s quite possible that such a server (or one not quite so overkill) may be required. It would also depend on whether you’re collecting traffic logs, at what detail and granularity you’re capturing them, how long you want to keep them, etc. For simplicity’s sake, let’s assume we’re going to have a potential of 4 concurrent user connections (regardless of how many actual /people/ have made these connections) and that these connections are for server administrative use only: commands, text editing, and the occasional large paste of trivial data through the connection (pasting config files to the host). If these are the only things the jump server is handling, then our resources could be down to a Pentium III 800MHz CPU, 128MB of RAM (yes, megabytes), and an 8GB system drive (we’re not logging anything other than connection attempts, and storing them for 30-90 days). So maybe it would be acceptable to have a containerized system on that $50k Dell server dedicated to an SSH jump box, with the other 99.5% of the server’s resources being assigned to other tasks.
As for network connections, this system could function properly with minimal WAN and LAN connections. The WAN could be as little as 56k dialup; though this will affect data pastes, it is still acceptable for running commands and even text editing. More realistically, any modern jump box would be connected via at least a 10/1 connection (10 Mbps down, 1 Mbps up, from the server’s perspective, giving the user 1 Mbps of bandwidth to paste config files to the server). These bandwidths are more than adequate for these tasks with this number of users. Where the LAN side is concerned, modern networks should be operating at 100BASE-T or faster, but let’s assume you /are/ using an old PIII server; it may be limited to 10BASE-T, and that’s still going to be adequate for this type of use. There’s no reason to redesign or limit your 10GbE network for efficiency; in actuality, that would be less efficient. (If you do find yourself in a situation where your host is capable of only a fraction of your network’s bandwidth, do yourself a favor and add a network switch, running at the full network speed, between the host and the network; it will limit speed reductions imposed on the rest of the network. A high quality main switch should also do the same.) A modern SSH jump server should allow for a reasonable number of users and connections, with reasonable steps taken for increased throughput capacity while maintaining low latency connections. Remember, the user will be connecting through at least 2 network segments. Between key-stroke and on-screen change, the data has to go from the user’s terminal, through the jump box, to the end host, where the change is registered, and that change is then sent from the host back through the jump box, down to the user’s terminal.
The connection between user terminal and jump box is not always something the engineer can control, but the connection between the jump box and host is, and the lower that latency, the better the experience the user (or you, as the admin) will have.
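A quick back-of-the-envelope model of those two points, as a Python sketch. The function names and the sample figures (40 ms, 1 ms, a 50 kB config file) are assumptions for illustration, and the math ignores protocol overhead, encryption cost and processing time on each hop.

```python
def keystroke_round_trip_ms(user_to_jump_ms, jump_to_host_ms):
    """Key-stroke to on-screen change: out across both segments, then back again.

    Arguments are one-way latencies, in milliseconds, for each segment.
    """
    return 2 * (user_to_jump_ms + jump_to_host_ms)


def paste_seconds(size_bytes, link_bits_per_second):
    """Ideal time to push a paste through a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


# Assumed: 40 ms one-way from the user to the jump box, 1 ms from jump box to host.
print(keystroke_round_trip_ms(40, 1))      # 82

# Assumed: a 50 kB config file pasted over the 1 Mbps path mentioned above.
print(paste_seconds(50_000, 1_000_000))    # 0.4

# The same paste over a 56k dialup link (treated here as 56,000 bits per second).
print(round(paste_seconds(50_000, 56_000), 1))
```

The first term is the one the engineer usually cannot control; the second is the LAN side, which is why it pays to keep the jump box close to the hosts it serves.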
As you can see, there is much to consider when designing a jump server (aka jump box). I have some ideas on how to minimize time to deployment while keeping in step with all of the above, and doing so without purchasing any software or services (the host and network are not included in this, as these are basic requirements for having a network of hosts).
Others may already have designed and implemented systems similar to, or exactly the same as, what I have come up with. I have no intention of disputing the origins of ideas, and make no claim that my ideas are in any way original. In fact, I read so many documents, white papers, blogs, forum posts, etc., that even if my complete system idea here is actually unique, it is heavily influenced by the works, ideas and issues others have stated in the past.
Let’s define our basic setup here. We’re going to have two 1GbE networks of hosts and services. We’re going to have a 100 Mbps symmetric internet connection (not far-fetched by any means these days). There will be a gateway segment (router and firewall), switches, a local terminal on the LAN and a remote terminal on the internet. We’ll be using Ubuntu Linux with UFW on several hosts, and one (our jump server) using HAProxy. This is where things break down a little bit, however, as there are a couple of possible ways to do things. One is to use SSH’s built-in forwarding, which does not require HAProxy. It is complicated and isn’t friendly to large groups of users, though if you’re setting things up for yourself, have a number of hosts to connect from, and can copy your client configs to all the client hosts you use, it’s much more usable.
Another way is to use DNS and subdomains, with a sub for each host, all pointing to the same IP; with HAProxy, they could all use the same port (limiting potential security holes by allowing only one SSH connection port). This is the method I intend to use, though I already own my own domains and adding subs is free. For those who wish to not use subdomains but wish to have a single port for access, there’s another method. It relies heavily on HAProxy to scrape the connection and redirect it to the appropriate host.
[It should be noted that at this point, I have not tested this nor confirmed the validity of the concepts and functions described below for this use. HAProxy can, however, forward standard SSH connections.]
Here’s the basic idea:
HAProxy is set up on the jump server. SSH connections are pointed to that jump server. HAProxy is configured to watch for those connections, and route each connection to the appropriate back end host based on some data in the connection string. This could be either a subdomain, as I intend to use; a separate port for each backend server (but part of the point is to have only a single port open); native SSH forwarding (again, this is cumbersome and difficult to propagate to multiple users); or, with quite a bit of advanced configuration in HAProxy, some other data in the connection string. The best direction here is to use some of SSH’s own connection options. The ability for SSH to run trivial commands on connection may be usable here.
SSH is designed to listen for commands after the connection string. When sshd receives a connection string with a command attached, the command is run and the connection is closed. You can test this by connecting to your SSH-enabled host, such as:
ssh firstname.lastname@example.org ls
Assuming “ssh firstname.lastname@example.org” would otherwise connect you to a tty, adding “ls” at the end causes ssh to send you a listing of the directory and then close the connection.
Without breaking the ability to pass commands via ssh, we’ll configure HAProxy to accept SSH connections, watch for a keyword after the connection string, and then either forward the connection to the back end host specified with our keyword, or terminate the connection if no valid keyword is found (alternatively, connections without keywords could be forwarded to localhost to control the jump server itself; to a honeypot if you’re into that; or to something silly like a quote server or some such).
We’ll want a keyword that will not break HAProxy, but will be easily used in connection scripts, and readily human readable and repeatable. I’ll start with “host=hostname” as such:
>ssh George@jump.bluntaboutit.com -p 1984 host=Orwell
The thought here is that when the above connection string is issued, HAProxy on jump.bluntaboutit.com (not a real address, btw) will scrape the connection string (as set in the frontend section), see “host=” and attempt to match “Orwell” to a known back end host, which, if valid, would be in the “backend” section of HAProxy’s config file. HAProxy would then remove the “host=Orwell” portion of the string and forward the connection, along with the rest of the connection string, to the host named Orwell. Orwell would then perform authentication for the user/connection, and allow the login, if valid. This would allow us to have another host, Clark, which we could then connect to via the keyword “host=Clark”, having a separate “backend” entry in HAProxy, forwarding to a different host.
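The matching step itself can be sketched in a few lines of Python. This is only the lookup logic, assuming the text after the connection string is already available as a plain string; it does not show how HAProxy would get at that text, which remains the untested part. The `BACKENDS` table, its addresses and the `route` function are hypothetical.

```python
# Hypothetical back end table: keyword name -> (address, port). Addresses are made up.
BACKENDS = {
    "orwell": ("10.0.1.11", 22),
    "clark": ("10.0.1.12", 22),
}


def route(command, backends=BACKENDS):
    """Split a leading host=NAME keyword off the command.

    Returns (backend, remaining_command). backend is None when the keyword
    is missing or names an unknown host, so the caller can terminate the
    connection or send it to a fallback.
    """
    command = command.strip()
    first, _, rest = command.partition(" ")
    key, sep, name = first.partition("=")
    if key != "host" or not sep or not name:
        return None, command               # no keyword: leave the command untouched
    # Names are matched case-insensitively here; that is a choice made for this sketch.
    return backends.get(name.lower()), rest.strip()


print(route("host=Orwell"))           # known host, nothing left to forward
print(route("host=Clark uptime"))     # known host, "uptime" is passed along
print(route("ls"))                    # no keyword at all
print(route("host=Huxley"))           # keyword present, but no such back end
```

Stripping the keyword before forwarding is what keeps ordinary command passing intact: whatever follows “host=name” goes to the back end unchanged.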
“BUT! SSH already lets you do that.” Not exactly. SSH allows for forwarding, however it’s much more convoluted for each connection, and requires remembering port numbers for the back end hosts. With the “host=name” keyword abuse we’re doing here, the most a human has to remember is the name of the host. This does, of course, fall apart if there are multiple hosts named with serial numbers or some other obscure naming convention, such as “dc-cos0032”, in which case you had better have a good memory.
This method would also provide an avenue for increased security precautions, as well as security by obscurity. It allows an admin to easily connect to a back end server with only the most basic of requirements (ssh key on the device, and connection string info) without having to look up complex connection string options. It’s potentially usable in scripts as well.
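For the scripting case, a small Python helper could assemble the connection string so that a script only needs the host name. This is a sketch under the assumptions of this post: the `jump_command` function is hypothetical, and the user, address and port are the example values from above, not a working endpoint.

```python
import shlex


def jump_command(user, jump_host, port, target, remote_command=None):
    """Build the ssh argument list for a connection through the jump server."""
    argv = ["ssh", "-p", str(port), f"{user}@{jump_host}", f"host={target}"]
    if remote_command:
        argv.append(remote_command)  # an optional command to run on the back end
    return argv


argv = jump_command("George", "jump.bluntaboutit.com", 1984, "Orwell")
print(shlex.join(argv))  # the command as it would be typed in a shell
```

The list form can be handed straight to `subprocess.run(argv)`, which avoids shell quoting problems when a remote command contains spaces.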
Connecting from within the LAN would work similarly, if direct SSH connections to the servers aren’t acceptable. The jump server’s LAN IP would need to be used in lieu of the domain name (if applicable, and if the network does not have a local domain for the hosts to be identified by name).