10 Years SeL4: Still the Best, Still Getting Better

loudmax · on Aug 6, 2019

Will it be possible to run a web browser on an SeL4 kernel some day, or is that completely beside the point?

Obviously, formally verifying a modern web browser with a javascript engine and everything is a whole other undertaking, so I'll leave that aside. But the notion of using a very secure kernel as the base OS is appealing. Even if a hostile site broke out of the web browser, SeL4 should keep it sandboxed. Presumably, SeL4 could enforce isolation between different browser tabs. Is this something we can look forward to in the future, or are the performance expectations of modern javascript heavy web sites out of line with what SeL4 is likely to deliver?

Anyway, there are plenty of other applications beside the web, so glad to see that SeL4 is being used in real world deployments.

wmf · on Aug 6, 2019

Technically, a capability-based desktop OS makes perfect sense. In practice, the microkernel people have been polishing their kernels and about to start on userspace next year... for 30 years.

(One exception to this is QNX which was killed off by weird business decisions.)

flo123456 · on Aug 7, 2019

Take a look at Genode. It‘s come quite far in that regard.

https://genode.org/

pjmlp · on Aug 6, 2019

My Android devices running userspace drivers written in a mix of Java and C++ thanks to Project Treble seem like a step into the right direction.

Or the new userspace device driver model introduced by Catalina. As per WWDC, for every new user space driver API, the old kernel one will be automatically marked as legacy and removed after one OS release, with the long term roadmap to replace all the kernel API extensions.

Just like Windows 10 increases the amount of sandboxing with each bi-annual release.

StudentStuff · on Aug 6, 2019

Userspace drivers permit upgrading the kernel to a certain point (depending on how kernel APIs change), but your forever trapped with the bugs and vulnerabilities in the userspace blobs.

Its a step forward in kernel upgradability, but it also takes the pressure off vendors like Qualcomm and ARM to upstream their patches. I hope the kernel continues making significant changes that break out of tree, proprietary drivers.

The harder it is for these players to act in bad faith, the sooner they'll collaborate with the kernel developers and mainline their code.

pjmlp · on Aug 7, 2019

As proven by every other OS with proper driver ABI, this isn't an issue.

And as proven by FOSS drivers that eventually got purged from kernel tree, having source available is meaningless if everyone just lets the code rotten to the point of being purged.

As for pressuring vendors, the source code is available, and the license allows them to never upstream, it isn't as if Joe and Jane know what that is when buying a new phone.

StudentStuff · on Aug 8, 2019

Windows has had multiple breaking changes to the driver ABI between Windows XP and present. There is no "proper" futureproof driver ABI, otherwise there would be a major OS using one.

pjmlp · on Aug 8, 2019

In almost 40 years, there were two breaking changes from 9X to NT family, from XP to Vista.

Two. Linux? Who's counting anyway.

wmf · on Aug 6, 2019

What if we pressure Qualcomm and ARM for 20 years but they still don't upstream anything? What about 50 years? Would you change your position?

StudentStuff · on Aug 8, 2019

So long as they're spending most of their cash on maintaining out of tree patchesets for the Linux kernel, that is tens to hundreds of millions a year in wasted money on their part.

Eventually the community will write their own drivers (as has happened with Mali) and the respective vendor will have a reckoning internally (like what happened with AMD's GPU driver) whereby those advocating for wasting money on out of tree code leave the organization with egg on their face for bilking the company out of significant R&D cash.

It is very hard to claim that its worth it to develop a proprietary driver when a comparable performance driver already exists for your hardware in-tree. The in-tree driver will almost certainly be easier to use & break less often than out of tree code that the kernel devs don't think or care about breaking.

ac29 · on Aug 7, 2019

> QNX which was killed off

Are you sure? The latest version is only ~2 years old, and they still seem to be hiring: https://bb.wd3.myworkdayjobs.com/QNX

Matthias247 · on Aug 7, 2019

I was in the automotive industry. 10 years ago every infotainment System I had seen was either built on QNX or windows CE. When I left the industry later all systems I interacted with had moved to Linux. Why? Driver and off the shelf application support was a big reason. For QNX every driver had been ported, which is a huge task for something like a modern GPU.

wmf · on Aug 7, 2019

It looks like the desktop version of QNX was killed off in 2014 although they stopped promoting it much earlier.

ahartmetz · on Aug 7, 2019

There are some in-car infotainment systems and dashboards running on QNX. It's a fairly pleasant system to work with.

garmaine · on Aug 7, 2019

And Minix which runs inside every Intel CPU.

nickpsecurity · on Aug 7, 2019

High-assurance security did do work on secure browsers. One prototype inspired Chrome's model. Here's an old thread where I listed some:

https://news.ycombinator.com/item?id=9962444

One or more might be integrated into seL4. Alternatively, components in something like IBOS might be done with higher assurance like seL4 or Muen.

snvzz · on Aug 6, 2019

To see what seL4 can do in practice in a desktop/workstation/server context, I suggest you take a look at Genode. Particularly, their recent dynamic system images.

kragen · on Aug 7, 2019

You can run Linux under L4, as mentioned in the post. I understand this is a common thing to do on phones with Qualcomm chips, using OKL4. But that isn't to protect you from malicious software; it's to protect malicious software from you. Qubes is a similar system with more praiseworthy aims, but not using L4 as the hypervisor.

Running a web browser directly on a microkernel is a reasonable thing to do, and it's likely that Fuchsia will end up doing it, but it will involve a lot of work.

The performance expectations of JS generally do not have anything to do with the operating system, but as explained in the post, when it comes to fundamental OS services, seL4 is more than an order of magnitude faster than even other microkernels, much less FreeBSD, Linux, or Microsoft Windows.

Microkernels generally need higher performance for those things because operations such as read(), which involve just a system call on a monolithic kernel, with context switches into and out of the kernel, involve a full context switch to a separate server process and back on a microkernel.

michaelmrose · on Aug 7, 2019

>...when it comes to fundamental OS services, seL4 is more than an order of magnitude faster than even other microkernels, much less FreeBSD, Linux, or Microsoft Windows.

This seems to imply that seL4 is faster than FreeBSD, Linux, Windows is this what was intended to be conveyed?

kragen · on Aug 7, 2019

Yes, but most of what you do on a computer isn't bottlenecked on the kernel—less still on a microkernel. So this doesn't necessarily mean the computer will feel faster or finish computing sooner.

michaelmrose · on Aug 7, 2019

Questions about performance are often complicated affairs that involve considering all factors including hardware, software, data operated on, etc to get meaningful answers.

That said I don't think data exists to show that SeL4 is faster than mainstream OS kernels just much faster than research microkernels which have always been much much slower than mainstream monolithic kernels.

In fact one would reasonably suppose that it would be measurably slower due to increased overhead. This may well be well worth it but this doesn't mean we ought to engage in speculation not backed by data.

kragen · on Aug 7, 2019

No, research microkernels are generally much faster than mainstream OS kernels when you measure the operations that are implemented by both, because the crucial question for microkernel viability is whether they can be made to be fast enough; that's been true for 30 years. So an immense amount of microkernel research has focused on that one thing. You can run lmbench on your Linux laptop and get some numbers you can compare with published L4 numbers on similar hardware.

The way that microkernels are sometimes slower is, as I said above, on operations that monolithic kernels implement internally, but microkernels leave to userspace processes. For example, ping-pong IPC is something you can measure on L4, and reliably get orders of magnitude better performance than on Linux. But creating a file isn't, because L4 itself doesn't have files—you can implement them in userspace, and that can be done with varying degrees of extra overhead. It might still come out about even with Linux (I think I've seen some results to that effect) but it probably won't be orders of magnitude faster.

The part of your comment I agree with is that measuring performance is complicated.

michaelmrose · on Aug 7, 2019

Can you point to benchmarks of software performing an actual useful task on both a mainstream OS and SeL4 for an apt comparison?

kragen · on Aug 8, 2019

No, I don't know of any. But that's not relevant to what I'm claiming, which is a much narrower claim.

michaelmrose · on Aug 9, 2019

>seL4 is more than an order of magnitude faster than even other microkernels, much less FreeBSD, Linux, or Microsoft Windows

Seems to be a pretty broad claim without anything to back it up.

kragen · on Aug 9, 2019

At this point you are exhausting the presumption of good faith in two ways.

First, by omitting the immediately previous qualifier from your quote: "when it comes to fundamental OS services, seL4 is more than an order of magnitude faster...". This omission turns my correct claim into a plausibly false one, one which I have specifically disclaimed, in detail, in three separate comments in this thread already, in order to avoid the kind of confusion that could reasonably produce reactions like yours. The "benchmarks of software performing an actual useful task on both a mainstream OS and SeL4" you ask for are relevant to the plausibly false claim, but not to the correct claim I was actually making.

Second, you falsely claim that I am saying things without anything to back them up. This would be a legitimate criticism if I were talking about performance numbers for some kind of trade secret software, but in fact (with the exception of Microsoft Windows) all of the operating systems discussed here (seL4, FreeBSD, Linux, plausibly Fiasco.OC and Mach) are open source and have abundant published benchmarks, including many in the seL4 papers published by the dude whose blog post we are discussing. There is a very great deal of freely available evidence.

In short, you are lying about what I am saying, and you are lying about the available evidence for evaluating its veracity. Therefore I think the presumption of good faith on your part in this conversation can be soundly dismissed.

michaelmrose · on Aug 10, 2019

I asked you what you thought were meaningful benchmarks illustrating your point

I'm not lying about what you said. I even copied the longer section of your quote in the prior post aside from the entire thread being right above on my device and yours.

I'm also not lying about the availability of evidence to prove your claim just asking you to provide a link to such.

You spoke with authority and then get upset and wave your hands about yelling LIAR when asked for supporting links.

Perhaps begin again respectfully or cease discussing.

mmrezaie · on Aug 6, 2019

This is a great view on history of seL4. Is there a list of use cases where seL4 is being used outside academia?

Also, I would like to know why Google didn't go for seL4 and instead created Fochsia?

I wanted to mention that we wanted to see if we can use L4 for isolating our network filter from Linux kernel (trustability) but instead we opted for bitvisor (much simpler code base but not verified).

im_down_w_otp · on Aug 6, 2019

We're using seL4 as the basis for one of our products which is focused on characterizing components, systems, and their failure modes. Mostly targeting critical "Cyber-Phyiscal Systems" and "Software-Intensive Systems of Systems".

We have a little operating system, or enough of one to facilitate building the rest of our product atop of it, that goes along with seL4. We'd planned to open source parts of it fairly soon, but prioritizing hiring a few more folks (BD, PM, Eng.) has put that effort on-hold temporarily.

Building things with seL4 is easy to do wrong, despite being given a variety of very low-level verified primitives to work with. You still need to put it all together just right. CAmkES is supposed to make this easier, but our experience with it wasn't very good, so we took the hard road and created a Rust-based application framework to go over the top of everything.

mmrezaie · on Aug 6, 2019

> seL4 is easy to do wrong

This was our main stopper. seL4 would benefit greatly to teach how everything can be fit together, and where everything can be expected to go.

im_down_w_otp · on Aug 6, 2019

We overcame this with a significant amount of Rust-i-ness and effort in understanding the guts of the kernel and how it implements its assurances.

Basically by providing setup, resource management, and communication abstractions that you just can't get wrong because doing something dumb that would be entirely nuts and/or would result in difficult to deal with run-time errors (as the kernel enforces some of its guarantees) ends up being rejected by the compiler ahead of time.

repolfx · on Aug 7, 2019

What sort of mistakes were easy to make?

anp · on Aug 6, 2019

Was this work a part of the post's unnamed self-driving car company project?

im_down_w_otp · on Aug 6, 2019

Nope. Entirely different. Left prior place just over a year ago. Eventually started a new company and closed our Seed funding a few months ago.

We have some self-driving car customers, but that's just because they're gluing together really complicated systems from lots and lots of diverse components, so they're great use cases.

Our goal is much broader scoped beyond AV and is more akin to creating new classes of CAE tools for SISoS & CPS designers/engineers.

rodgerd · on Aug 6, 2019

> This is a great view on history of seL4. Is there a list of use cases where seL4 is being used outside academia?

"Can you build actual systems with it?" https://www.youtube.com/watch?v=lRndE7rSXiI&t=22m

The whole talk is well worth a watch.

mmrezaie · on Aug 6, 2019

I forgot to mention that I had seen these, but I thought after so many years from L4's release something like isolating firmwares from the OS in our smartphones would have been done.

I am not aware of any other capability based and verified micro kernel in level of [se]L4.

atombender · on Aug 7, 2019

Zircon, the Fuchsia kernel, is much bigger than seL4. For example, it provides a process model and syscall interface.

But Google may have other reasons to reinvent their kernel (though they're not starting from scratch; Zircon evolved from the Little Kernel by Travis Geiselbrecht). They may have decided that the flexibility of inventing a new microkernel for their exact requirements and moving fast with it would be more productive than reusing an existing one. Building on the careful verification process of seL4 is presumably quite time-consuming, and might get in the way.

When NeXT/Apple started using the Mach kernel, it ended up being heavily rewritten to support Apple's requirements, and they never adopted the microkernel design. Google may have seen a seL4 dependency as being a similar kind of burden.

seL4 is also GPLv2, I believe. Might also be a blocker for Google.

dmitrygr · on Aug 6, 2019

because that would bore the people who are in the fuchsia team purely to not be bored?

habitue · on Aug 6, 2019

Man, this meme that Fuschia is literally a "dont quit please, bored developers" project is huge on HN. It's repeated often like it's common knowledge. Is there any real corroboration for it, or is it just a fun pet theory people like?

wmf · on Aug 6, 2019

The fact that Fuschia is a large project without any public purpose, roadmap, architecture process, or governance invites these theories. If you want to be so mysterious you can't complain when people speculate.

ryacko · on Aug 7, 2019

I strongly suspect that any roadmap would mostly consist of features present in other existing OSes, removing the mystique.

It will probably be like any other modern OS, but designed to be a modern OS from the beginning.

Which might save Google 0.1% in electricity costs.

carapace · on Aug 6, 2019

*Fuchsia

ansible · on Aug 7, 2019

From what I've seen, I believe that Fuchsia is intended to eventually replace GNU Linux based Android as the mobile platform for Google... eventually. The idea is to have a good platform for VR/AR apps (soft real-time processing / rendering), while providing a system that can be continually upgraded.

Unlike the current situation with the Android ecosystem right now, where a newly released product is lucky to get even a few security updates in a year or so, and have effectively had support dropped after that.

Compare that to the ChromeOS ecosystem, where Google guarantees 5 years of support from the time the first production on a platform was released.

mmrezaie · on Aug 6, 2019

Bored?

I don't think they only wanted to reinvent the wheel and I don't think trustability was one of the motivations of their design!

mpweiher · on Aug 6, 2019

TFA mentions:

"Another fork of L4-embedded is what now runs on the Secure Enclave of all Apple iOS devices."

All Apple iOS devices is pretty big.

vilhelm_s · on Aug 6, 2019

But L4-embedded is not seL4. It would be cool to see some large-scale deployment of a formally verified system.

carapace · on Aug 6, 2019

FWIW, OK Labs was acquired by General Dynamics in 2012.

elcritch · on Aug 7, 2019

Are there any resources for running SeL4 on a raspberry pi or other arm SBC'S? The article mentions hypervisor support, which makes me think it'd be a possible way to run Linux on top of a secure RTOS. Working in the embedded world and I ponder what could be possible with a high security hypervisor.

im_down_w_otp · on Aug 7, 2019

seL4 isn't an RTOS, FWIW.

There is hypervisor support, sort of. Insofar as there's enough there to build your own hypervisor (we ended up needing to do something like this), and there's a sample VMM for a very specific host and guest target combination available that's created via CAmkES.

Making an seL4-based hypervisor truly cross-architecture capable as well as generalizable for different host and guest target configurations is a non-trivial, painstaking, arduous task. In fact making anything on seL4 truly cross-platform isn't easy. The kernel doesn't provide much in the way of unifying abstractions over the architecture differences. The different architecture details are mostly thrust onto you to figure out what kind of abstractions you want to make.

The MCS (mixed criticality support) stuff in seL4 getting mainlined should make it theoretically possible to invent/create a hypervisor that enables you to run MCU oriented RTOSes & applications in an isolated context atop non-HRT hardware. Data61 has mentioned that they're going to take a whack at making some part of an AUTOSAR stack that runs in a context like this, but I have no idea how far along that is. Separately, we're looking at leveraging the same underlying capability for a variety of different adjacent purposes.

That said, there are a variety of supported platforms you can already build seL4 for: https://docs.sel4.systems/Hardware/, our experience has been that some work better than others. It's possible to run in QEMU as well to get started.

snvzz · on Aug 7, 2019

How is seL4 not a hard RTOS?

It even has proof of worst case time (WCET), which to my knowledge no other non-trivial RTOS does have.

I'm very confused now.

msjyoo · on Aug 7, 2019

seL4 at the moment doesn't have kernel interrupts. The kernel relies on run-to-completion and "incremental consistency" partly because IPC in seL4 is already short so the time window for a kernel interrupt to be needed is also correspondingly small, and because it makes proofs easier. Of course, this means there's possibilities of long tail latency for interrupts to be delivered to userspace if the kernel needs to run a long operation.

repolfx · on Aug 7, 2019

My understanding of seL4 is that the kernel never really needs to run a long operation because it does so very little. For instance filesystems would be a usermode process, along with nearly all drivers. Isn't that why the MCS stuff works - the kernel may not be able to interrupt itself but that doesn't matter because it can interrupt userspace, which is where 99% of traditional kernel code ends up?

msjyoo · on Aug 7, 2019

That's not quite true. It's not solely because seL4 does little. seL4 maintains low interrupt latency through "preemption points" - basically points where the kernel can and can only context-switch while maintaining global consistency.

Keeping in mind I'm not an expert on this, I dug up these papers when I was vaguely looking, they may be useful to you:

"Improving Interrupt Response Time in a Verifiable Protected Microkernel" https://ts.data61.csiro.au/publications/nicta_full_text/5391...

"To Preempt or Not To Preempt, That Is the Question" http://ts.csiro.au/publications/nicta_full_text/5859.pdf

repolfx · on Aug 7, 2019

Thanks, that's super helpful.

snvzz · on Aug 8, 2019

AIUI there's only one such preemption point, in some resume-able operation to do with cleanup. Every other operation is short enough.

Thus interrupt latency is guaranteed to be very low, even if the interrupt happens while the microkernel is running.

It is my intuition that the "it's not a RTOS" has to come from something else, like some non-implemented API that's expected in an RTOS or the like. The required services might actually be implementable as non-privileged tasks.

repolfx · on Aug 8, 2019

Maybe it's just slight conservatism. If all existing RTOS' have pre-emptible kernels and yours doesn't, perhaps you'd rather let the market decide you've built an RTOS rather than stating you have and risking an argument over terminology.