SSH Emergency Access

timeattack · on July 4, 2020

It's a cool and interesting application of the technology, but it doesn't really seem practical.

When you're unable to access a machine using your standard SSH keys, it usually means it's highly unlikely that you'll be able to log in remotely by other means.

As an emergency login there are two common options:

* in case of cloud: use remote VM console provided by the hosting provider.

* in case of bare-metal: use IPMI to access machine console directly.

tashian · on July 4, 2020

Hey there — I'm the author of this post.

There are a few scenarios where I imagined this approach being useful:

* If you have any kind of remote dependency in your SSH auth flow (LDAP, or an online CA, or automated Ansible playbooks to push keys), any of those might fail and render the host otherwise inaccessible.

* It's becoming more common to not ever SSH into machines. So, what if emergency SSH access is the only way to access a host? Some companies even go a few steps further: When a host is SSH'd into, it is considered "tainted by humans", is quarantined and eventually shut down.

* Some hosts should never allow root access to anyone. For example, there's no reason for anyone to have root on a bastion host. So, what if the only way to get root on some hosts is with the emergency key?

While you could use the cloud VM console for emergency access in these cases, having a hardware key provides even more security and would let you turn off cloud VM access.

Of course if you broke your SSHD config, or have a network issue that prevents you from reaching the host, this won't magically fix any of that. IPMI is good for that though.

0xbadcafebee · on July 5, 2020

> While you could use the cloud VM console for emergency access in these cases, having a hardware key provides even more security and would let you turn off cloud VM access.

I'm not sure it's more secure, but I suppose it depends on the provider. Your control of your account's admin key (or password) is the last bastion of security for most providers.

> Of course if you broke your SSHD config, or have a network issue that prevents you from reaching the host, this won't magically fix any of that. IPMI is good for that though.

This is why I just use the providers' emergency management (or IPMI). Easier to have one method of emergency access that always works regardless of the guest. The guest's root (or emergency) account can still have a pretty darned complex password.

bubblesorting · on July 6, 2020

> It's becoming more common to not ever SSH into machines

This is a reality for me. At work we run a handful of distributed clusters. If anyone does the equivalent of sshing into a box and poking around (in our case, `kubectl exec`), the infrastructure team gets an alert, then follows up with whoever invoked the command. If they are debugging, we shift whatever resources they need into dev. If they are not debugging, they will probably get questioned by their boss. (Fortunately, most of the time this chat results in, "oh wow, I didn't know about the APM/Metrics/Graphs/Logs/etc. setup we had, I'll check that next time.")

CodeWriter23 · on July 4, 2020

There’s also the access control feature of this approach. You can give someone temp access to a host.

nokya · on July 4, 2020

That's exactly the reason why we use certificate-based SSH access at my employer's: for our suppliers. It looks like the author went a long way to find alternate reasons to deploy this :/

ThePowerOfFuet · on July 4, 2020

> Valid: from 2020-06-24T16:53:03 to 2020-06-24T16:03:03

Almost! I think you meant:

Valid: from 2020-06-24T16:53:03 to 2020-06-24T17:03:03

isatty · on July 4, 2020

Yep, the most common way I've lost access to machines is by messing up the iptables/ipfw rules. Read a post here about avoiding that by having a timed reset with sleep.

lazyant · on July 4, 2020

For people asking: you can create a resetfw.sh script, for iptables:

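  #!/bin/bash
  # Reset iptables to a wide-open state: accept everything, drop all rules.

  # Set the default policies back to ACCEPT so flushing can't lock you out
  iptables -P INPUT ACCEPT
  iptables -P FORWARD ACCEPT
  iptables -P OUTPUT ACCEPT
  # Flush the nat and mangle tables, then the default filter table
  iptables -t nat -F
  iptables -t mangle -F
  iptables -F
  # Delete any user-defined chains
  iptables -X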

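Make it executable:

  chmod +x resetfw.sh

and add it, for example, to the /etc/cron.hourly directory. (On Debian and Ubuntu, run-parts skips file names containing a dot, so drop the .sh extension there.)

This way you can test your iptables rules and they'll get cleared every hour. Once you've checked they are OK you can delete this cron job.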

(NOTE: I'm typing from memory, haven't tested this)

benedikt · on July 4, 2020

https://manpages.debian.org/stretch/iptables/iptables-apply....

Or use `at` to run `iptables-restore`. Simpler than setting up a cronjob (and if you're doing it manually, cron has a bunch of gotchas that at least bite me in the ass once in a blue moon).
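
A minimal sketch of the `at` approach, assuming `at`, `iptables-save` and `iptables-restore` are installed. The backup path, the function names and the 10-minute window are arbitrary choices for illustration, and this hasn't been tested against any particular distro. (`iptables-apply`, linked above, wraps the same idea in a confirmation prompt.)

```shell
#!/bin/sh
# Sketch: schedule an automatic firewall rollback before touching the rules.
# The BACKUP path and the delay are illustrative assumptions.

BACKUP="${BACKUP:-/root/iptables.known-good}"

# Print the command that `at` should run to restore a saved ruleset.
rollback_command() {
    printf 'iptables-restore < %s\n' "$1"
}

# Save the current rules, then queue the rollback N minutes from now.
# Must run as root. `at` prints a job number, which `atrm` needs later.
arm_rollback() {
    minutes="${1:-10}"
    iptables-save > "$BACKUP" || return 1
    rollback_command "$BACKUP" | at now + "$minutes" minutes
}

# Usage (as root):
#   arm_rollback 10       # snapshot the rules and schedule the rollback
#   ...apply and test the new rules...
#   atrm <job number>     # still connected? cancel the rollback
```

If the new rules lock you out, the queued job puts the old ruleset back without anyone touching the console.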

lazyant · on July 4, 2020

Yes. Although iirc (it may have changed, I haven't looked "recently") the iptables-* commands are distro-specific, as in not all distros have/had them.

metiscus · on July 4, 2020

You might add a daily task to remove that task just in case you forget. That way you avoid lockout but don't end up opening yourself up accidentally.

taftster · on July 4, 2020

Or possibly just turn iptables off, in the same cron.hourly.

lazyant · on July 4, 2020

Ah yes, that's simpler: `systemctl stop iptables`. You also need to run `systemctl disable iptables` just in case; otherwise, if the server reboots, the iptables service will start again.

oars · on July 4, 2020

This has happened to me as well. Where could I read about this method?

raincom · on July 4, 2020

Maybe this:

   service network stop && sleep 10 && service network start

wiredfool · on July 4, 2020

The worst is: sudo ifdown eth0 && ifup eth0

leoh · on July 4, 2020

Link?

johnklos · on July 4, 2020

IPMI is painfully insecure, and therefore assumes the existence of a completely separate, protected network. Some people don't colocate more than a few machines (and therefore can't justify the extra infrastructure for an IPMI OOB network), don't want to pay extra for a colo provider to provide IPMI OOB, and/or don't trust their colo provider to have access to such a sensitive and insecure thing.

Having an emergency method to connect is an excellent idea.

sidpatil · on July 4, 2020

Not sure about other vendors, but I know Cisco offers dial-in capabilities for managing routers, switches, etc. The dial-in modem on the router is connected to a landline.

Has this approach ever been taken by server admins?

tyingq · on July 4, 2020

A standard for emergency IPMI or other console-type access would be welcome. Vendors have certainly done a bad job in this space. Break-glass type access isn't a new thing.

kevincox · on July 4, 2020

I think it depends. I've worked in places that had something like the following setup.

- Hardware in datacenters with operators who were not experts on the applications running.

- All remote access was done using short-term (~1 day) SSH keys. There was an authentication service to generate these.

It was pretty easy to imagine that the authentication service would go down. In this case a selection of people who worked on the infrastructure had longer-term keys on HSMs. (With very high logging and alerting for any use). It would actually make sense for these to be CA keys so that they could access different user accounts or similar.

TL;DR you are assuming a very basic SSH auth setup. As the regular setup gets more complicated having something like this as a backup makes sense.

marcosdumay · on July 4, 2020

> All remote access was done using a short term (~1 day) ssh keys. There was an authentication service to generate these.

This is weird. Really weird.

Did that service use a more secure authentication storage than a password protected key?

jon-wood · on July 4, 2020

It’s really not - by limiting the life of keys, and having a service generating them, you can more effectively lock things down when someone leaves, rather than going round revoking keys from servers. Something we’re experimenting with at work is AWS Instance Connect, which uses your AWS credentials to push a key to a target instance with 1 minute validity - no more managing keys on instances, and revoking access is just a change to an IAM policy.

dsr_ · on July 4, 2020

As opposed to having a few bastion-hosts, and requiring people to log in there in order to then ssh on to their final destinations -- in that case, revoking their keys is as simple as wiping their accounts on the bastion hosts.

jon-wood · on July 6, 2020

Even with a few bastion hosts things get hard to track quickly as you end up with multiple clusters (dev/staging/UAT/production), and potentially multiple production clusters in different regions.

Spooky23 · on July 4, 2020

It seems weird but has several advantages. Most places screw up defunct account cleanup and privilege management.

A process like this allows you to ensure that people have the access they need and makes it easy to get them the privilege separation needed.

kevincox · on July 4, 2020

Yes, the system used multi-factor auth and could be locked for suspicious activities.

ed25519FUUU · on July 4, 2020

Here's what I like to do on my server(s) in cron, which pulls my keys from github:

    @hourly <username> ssh-import-id gh:<github username>

If I lose my keys to this host, I can simply update github.com with my new ones and go to lunch. I'll be able to login again shortly.

And on all of my hosts:

    @reboot <username> ssh-import-id gh:<github username>

This is REALLY helpful on devices like raspberry pi, where they may stay shutdown / offline for years. The minute they're powered up again they'll get my fresh keys and I can login to them without needing a console.

http://manpages.ubuntu.com/manpages/bionic/man1/ssh-import-i...

cbb330 · on July 4, 2020

Why not use certificates as your primary authentication for SSH? Facebook has a great blog post on implementing this at scale: https://engineering.fb.com/security/scalable-and-secure-acce...

tashian · on July 4, 2020

Yes! Shameless plug — we (smallstep.com) offer a service that makes this frictionless at scale and super easy to set up. You'll never want to go back to public keys.

theatrus2 · on July 4, 2020

If you’re on AWS and have credentials for users there, you can also run bless

https://github.com/Netflix/bless

masonhensley · on July 4, 2020

Neat. Something I feel often gets overlooked in most SaaS systems (think internal side), be it customer service, ops, etc. tooling, is break-the-glass escalation functionality. Most systems I’ve seen in the wild completely lack this, which results in over-provisioning of admin “god mode” accounts.

NoodlesUK points out alerting which is a pretty important concept to incorporate.

Largely a solved concept in Electronic Medical Records & as outlined in the post.

noodlesUK · on July 4, 2020

I think another thing we might want to learn about is how to sound the alarm when the break glass is used. Is there an easy way of doing that with SSH? Running a command to page the ops/security team when a server receives a login attempt with an emergency credential?

withinboredom · on July 4, 2020

You can physically put the (yubikey) device in a vault that will physically sound an alarm when opened. It could also have a battery-powered arduino inside the box (with SIM breakout) that texted the devops team when opened.

BillinghamJ · on July 4, 2020

Overcomplex technical solution to a simple problem.

Besides which, if you really want to go full-on with technically clever solutions, keep in mind you could ensure no cellular service prior to opening. But then we're just getting into the realms of silly situations.

dcow · on July 4, 2020

Would you attempt to use the key if you knew the CTO, SRE and OPS teams were paged as soon as the safe was accessed?

rsync · on July 4, 2020

"I think another thing we might want to learn about is how to sound the alarm when the break glass is used. Is there an easy way of doing that with SSH?"

Yes - quite simple and old-fashioned, actually ...

I have this line in the SSH users' .login file:

  /usr/local/sbin/sms 4153331111 4158882222 "USER LOGIN TO XXX - $DATE" >& /dev/null

... where the 'sms' command, above, is a shell script I wrote to call twilio messaging with the curl command. A very simple example of that would be:

  curl -X POST -d "Body=$msg" -d "From=$from" -d "To=$to" "https://api.twilio.com/2010-04-01/Accounts/$accountsid/Messages" -u "$accountsid:$authtoken"

... and this works like a charm.

Alternatively, you could rick-roll your on-call sysadmin:

  /usr/local/bin/curl -XPOST https://api.twilio.com/2010-04-01/Accounts/$accountsid/Calls.json --data-urlencode "To=$number" --data-urlencode "From=$callerid" --data-urlencode "Url=http://demo.twilio.com/docs/voice.xml" -u $accountsid:$authtoken

(the voice.xml demo is, in fact, Rick Astley)

rnotaro · on July 4, 2020

Direct link of the audio that the XML will play: https://demo.twilio.com/docs/classic.mp3

jlgaddis · on July 4, 2020

What prevents a user from disabling or bypassing the command execution (if one desired to do so)?

tashian · on July 4, 2020

Hey there, I wrote this post. It's a great question.

One benefit of using certificates for emergency access is that SSHD logging can be configured to show a lot more detail about the certificate that was used. With public keys, there isn't anything to show. But with certificates you have a key ID, serial number, principals, CA fingerprint, etc. So, that log is a good hook for sounding the alarm. A more advanced version of this would allow you to record a reason for using the emergency access key when the connection is made (or when sudo is used).
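
As a rough sketch of that log hook: the filter below picks certificate logins with a given key ID out of sshd's auth log. The key ID `emergency-access`, the log path and the function name are assumptions, and the line layout it matches is what recent OpenSSH versions appear to write, so check it against your own logs first.

```shell
#!/bin/sh
# Sketch: find sshd "Accepted publickey" entries made with a certificate
# whose key ID matches. Key ID and log layout are assumptions; verify both
# against your own auth log before relying on this.

# Reads log lines on stdin and prints the ones for the given key ID.
emergency_logins() {
    grep -F 'Accepted publickey' | grep -F " ID $1 (serial "
}

# Usage (hypothetical log path):
#   emergency_logins emergency-access < /var/log/auth.log
```

Whatever consumes the matching lines (a pager script, a syslog alert rule) can then carry the key ID, serial and CA fingerprint along in the alert.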

jlgaddis · on July 4, 2020

> With public keys, there isn't anything to show.

There's, at minimum, client IP address, username, and the key fingerprint -- which has always been good enough for me.

There might be even more details available but I'm not sitting in front of a computer to check.

gnufx · on July 4, 2020

I don't know what it looks like for a certificate-based system, but syslog records the fingerprint of the public key used for login on a fairly vanilla Debian. If you worry about things like that and aren't looking at physical access (as suggested elsewhere), you presumably have remote syslog and audit which you can check.

lormayna · on July 4, 2020

You can use pam-hooks module to execute scripts at login/logout.

aidos · on July 4, 2020

You can add a script at ~/.ssh/rc that’s run on each login. You’d need to be careful to make sure it couldn’t be changed if you were relying on it for notifications.

gnufx · on July 4, 2020

How do you record the key used from that (assuming that's what's required)?

nix23 · on July 4, 2020

Maybe monitor the emergency machine itself? If it boots up, emergency credentials are probably being used?

But really good point, and I love the analogy to 'break glass'

andylynch · on July 4, 2020

There are tools like Powerbroker which do this, and also privileged access management more generally - popular at banks and the like. Also SSH (the company)

8organicbits · on July 4, 2020

Login shell for emergency accounts could be a script that "sounds the alarm" and then drops to a bash shell.

Edit: ooo @rsync just gave another good approach
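
A sketch of the login-shell idea, assuming the script is installed as the emergency account's login shell (and listed in /etc/shells) and that `logger` is available; swap in whatever paging command you use. The script path and function names are made up. `SSH_CONNECTION` is set by sshd, while `SSH_USER_AUTH` only exists if `ExposeAuthInfo yes` is set in sshd_config.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/alarm-shell: log an alert, then hand over
# to a normal shell. Names and the logger tag are illustrative.

# Build the alert text from the environment sshd provides.
alert_message() {
    conn="${SSH_CONNECTION:-unknown}"
    printf 'EMERGENCY LOGIN user=%s from=%s' "${USER:-unknown}" "${conn%% *}"
}

alarm_shell() {
    msg="$(alert_message)"
    # With ExposeAuthInfo enabled, sshd points SSH_USER_AUTH at a file
    # listing the credentials used, so the key can be recorded too.
    if [ -n "${SSH_USER_AUTH:-}" ] && [ -r "$SSH_USER_AUTH" ]; then
        msg="$msg auth=$(tr '\n' ' ' < "$SSH_USER_AUTH")"
    fi
    logger -p auth.alert -t emergency-login "$msg"
    # sshd passes "-c <command>" for remote commands, nothing for a login.
    if [ "$#" -gt 0 ]; then exec /bin/bash "$@"; else exec /bin/bash -l; fi
}

# The installed script would end with:
#   alarm_shell "$@"
```

Because the alert fires before the user gets a prompt, it is harder to skip than a line in a dotfile, though anyone with root can still change the account's shell afterwards.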

jlgaddis · on July 4, 2020

Generating alerts from syslog messages is something that we've been doing for decades.

modinfo · on July 4, 2020

What a coincidence: 3 days ago I ordered two YubiKey 5s, today the package arrived, and today I read a post on how to use them in an interesting way for emergency access to my server via SSH.

I'd like to add that the way it's described really works.

But... now I don't know whether to set one YubiKey aside in case I need it for emergency SSH access. I've had a server since 2011 and have never had problems with access through SSH; I use the same keys to this day and everything works.

I think using a YubiKey this way for emergency access is overkill.

It's just an interesting way to use yubikey.

danmur · on July 4, 2020

If you need the option to give someone temporary access it seems like a good option. I don't think it would add anything to my personal stuff since there's no reason I can think of to give someone else access. At work definitely.

dcow · on July 4, 2020

Right, this is more about a cryptographic grant of temporary emergency access to someone who doesn't have a user account or admin keys already on a machine (and ideally nobody should have persistent admin access in a well-oiled production setting) in the event that existing access control mechanisms have failed. And backing the signing operations with a YubiKey lets you physically secure the key in ways that you wouldn't an entire laptop, and provides all the benefits of tamper-resistant, proximity-aware hardware. Probably not something most people will want or need to bother with for personal stuff, but a very reasonable expectation as soon as you're working on a team or managing many hosts, etc.

alexandrerond · on July 4, 2020

I'm very confused, given YubiKeys have smart card functionality and can be used by gpg-agent to SSH with a regular GPG key (which you can add to authorized_keys just like any other key), so you don't have to go through this whole mess of setting up a CA and installing it.

What am I missing?

munchbunny · on July 4, 2020

It's a chicken and egg problem: if you can't SSH into the machine, how do you add your key to the SSH config on the target machine?

You could use a very long-lived key, but then as soon as you have multiple people who might need production SSH access, you've got access control and revocation issues. The SSH CA is a good minimal solution, because the CA can issue only short-lived SSH keys (a few hours at a time) that you use once and throw away. Also, CA trust scales better because it moves user management burden to the certificate issuing process and removes the need to modify the SSH config every time you onboard a new user.

It's a pretty standard practice. Here's a post from Facebook about it from several years ago. This post is just about how to do it using YubiKeys. https://engineering.fb.com/security/scalable-and-secure-acce...
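
For illustration, minting a short-lived user certificate with plain `ssh-keygen` looks roughly like this. The file names, key ID, principal and the `mint_cert` helper are placeholders, and the CA key here is an ordinary file rather than one held on a YubiKey as in the post.

```shell
#!/bin/sh
# Sketch: sign a user's public key with an SSH CA so that the resulting
# certificate is only valid for a few minutes. All names are placeholders.

# mint_cert CA_KEY USER_PUB KEY_ID PRINCIPAL [VALIDITY]
# Writes <name>-cert.pub next to USER_PUB. VALIDITY defaults to +10m,
# i.e. valid from now until ten minutes from now.
mint_cert() {
    ssh-keygen -q -s "$1" -I "$3" -n "$4" -V "${5:-+10m}" "$2"
}

# Usage:
#   ssh-keygen -t ed25519 -f emergency_ca        # once; keep the CA key offline
#   ssh-keygen -t ed25519 -f id_emergency        # requester's key pair
#   mint_cert emergency_ca id_emergency.pub emergency-access root +10m
#   ssh-keygen -Lf id_emergency-cert.pub         # inspect the validity window
```

The host only needs to trust the CA's public key; nothing on the target machine changes when a new certificate is issued.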

harikb · on July 4, 2020

This can also work as a solution where the “setup” (of trusting the CA) is baked into the image. Then there is no SSH-related setup until the day you actually need to SSH to the host. And you get the guarantee that no SSH login can happen until you issue a temp-pair.

This is actually quite useful for deploying clusters of machines where one doesn’t want normal SSH access until there is a real need. I think this was also mentioned in another comment.
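
Baking the trust into an image could be as small as the sketch below. `TrustedUserCAKeys` is the real sshd_config directive; the destination path and the `trust_ca` helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: install a CA public key and tell sshd to trust certificates
# signed by it. Assumes the config file has no trailing Match block,
# since a directive appended after one would land inside that block.

# trust_ca CA_PUB SSHD_CONFIG DEST
trust_ca() {
    cp "$1" "$3" && chmod 0644 "$3" || return 1
    # Append the directive only if it is not already there (idempotent).
    grep -qxF "TrustedUserCAKeys $3" "$2" ||
        printf 'TrustedUserCAKeys %s\n' "$3" >> "$2"
}

# Usage at image build time (as root):
#   trust_ca emergency_ca.pub /etc/ssh/sshd_config /etc/ssh/emergency_ca.pub
```

With no authorized_keys files on the image, the only way in over SSH is a certificate signed by that CA.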

tashian · on July 4, 2020

That sounds like a great option too, depending on your situation.

One difference is that the CA is on the hardware key, but the cert (and its private key) is not.

Imagine you're on a team of 50, and anyone on the team might need emergency access to a host at some point. You wouldn't want to buy 50 keys and 50 safes. Just designate a couple folks to manage emergency access. They can manually mint a cert for a colleague as needed, and send it over a secure channel. No security key needed to use the cert, and it self-destructs after a few minutes.

aaronmdjones · on July 4, 2020

To add to the blog post; you don't need brew or step or any of that nonsense to inspect certificates.

    $ ssh-keygen -Lf the-cert.pub

dcow · on July 4, 2020

You can, but `ssh-keygen` is about as nice to use as `openssl` which practically means you spend a lot of time with your head in the manual. The `step` tools have a nicer UI:

    $ step ssh inspect the-cert.pub

Also the post already mentions that using `step` instead of `ssh-keygen` is optional, so I'm not sure why you feel the need to repeat it...

aaronmdjones · on July 4, 2020

Right, you'd have to look up the switches in the manpage if you don't remember them, but that's already the case with the generation portion, which is why the post includes the switches for that. I'm just saying it could have included the inspect switches too.

dcow · on July 4, 2020

We actually plan to update the post to demonstrate doing it entirely with the `step` tool. We just want to do a pass on the UX to make sure it is as easy and foolproof as possible before bringing more attention to it.

munchbunny · on July 4, 2020

This is a pedantic detail, but if you're trying to implement this system, it does matter: "resident key" is not a required feature here. You're not using the hardware token for its WebAuthn capability, you're using it for its smart card capability.

You just need PKCS11 token support for SSH, which the YubiKey's smart card capability can do. YubiKey 4 and YubiKey FIPS can both do it, and so can regular old smart cards even though that form factor is a lot less popular now.

The workflow is the same: generate a key pair on the hardware token, have the CA sign it, install the signed cert onto the hardware token, and then SSH with it.

closeparen · on July 4, 2020

The procedure here is actually using WebAuthn, which is now explicitly implemented by OpenSSH.

markpeek · on July 4, 2020

Using certificates with SSH is the way to go for shared access servers. Here's an open source way (yes, I'm involved in the project) to manage authorization and access with asynchronous approvals:

https://github.com/cloudtools/ssh-cert-authority

dcow · on July 4, 2020

Smallstep also offers an open source ssh-aware kms-backed certificate authority.

https://github.com/smallstep/certificates

One nice advantage is its support for different provisioning flows. The oauth flavor allows you to hook into an existing identity provider to authenticate certificate requests.

Simply:

    $ step ssh login

and boom you've got a short-lived ssh certificate in your ssh-agent using a private key that never touched the disk.

jlgaddis · on July 4, 2020

  Valid: from 2020-06-24T16:53:03 to 2020-06-24T16:03:03

Um...

ascotan · on July 4, 2020

I don't get it. When would you need a backup SSH key other than if a user lost access? Most VMs have console access for this purpose.

asdfasgasdgasdg · on July 4, 2020

I'm not sure of the exact scenario but I would just note that there are other types of computing environments than virtual machines. For example, there are physical machines, sometimes hosted in a colo where you have no employees on the ground.

gnufx · on July 4, 2020

Surely you'd have some sort of remote KVM in such cases (like IPMI, as mentioned in another comment). That's critical in the clusters I've run and, of course, the manufacturers' implementation of that critical functionality in IPMI is likely to be rubbish and you can't get it fixed...

kubanczyk · on July 4, 2020

I wonder if there are some khem-khem notable clouds that just don't provide an old-school tty login. /s

dmitrygr · on July 4, 2020

Website unreadable on mobile. Commands cut off and not scrollable.

ausjke · on July 4, 2020

what about port knocking?