These are somewhat dated rules, and generally too restrictive for modern safety-critical development. For one thing, unbounded loops are situationally fine. The key distinction is that the program has to make forward progress regardless, which is not always the same thing as escaping the loop. Pool allocators have also become very common, so the once-blanket prohibition on "dynamic allocation" is applied a lot more flexibly now.
Many of the other rules broadly hold up though. You still want to avoid things like pointers because static analyzers are terrible at proving properties through them. This typically means lookup tables (LUTs) end up wrapped in the equivalent of an unsafe block though. Not ideal, and hopefully better tooling will eventually exist for these sorts of things.
> For one thing, unbounded loops are situationally fine. The key distinction is that the program has to make forward progress regardless, which is not always the same thing as escaping the loop.
The article somewhat alludes to this, although they don't use the notion of progress that you use, which is extremely important.
> This rule does not, of course, apply to iterations that are meant to be nonterminating—for example, in a process scheduler. In those special cases, the reverse rule is applied: It should be possible for a checking tool to prove statically that the iteration cannot terminate.
As a further searchable term, people can look up "total functions." Total functions must either terminate (loop ends) or be productive (always make forward progress). That as many functions as possible should be total is a pretty good rule of thumb in any codebase.
>> In those special cases, the reverse rule is applied: It should be possible for a checking tool to prove statically that the iteration cannot terminate.
This part made me pause for a moment, but then it clicked, and I'm remembering the rare and unusual experience of facing a process that escaped a loop that was supposed to go on indefinitely. A software equivalent of a transmission belt slipping off. Most frequently seen in video games, where it's mostly harmless - but imagining a scheduler in an embedded system quitting on me like this fills me with dread to an irrational extent.
> As a further searchable term, people can look up "total functions." Total functions must either terminate (loop ends) or be productive (always make forward progress).
That's a new one to me. I'm not deep into more advanced flavors of functional programming, but I thought total functions are defined as terminating in finite time for all inputs. I suppose it depends on whether you consider an infinite loop a successful termination at infinity or not, but personally I wouldn't, at least not when it's clear the function is plainly running in circles (as opposed to asymptotically approaching a value).
Yep, it's a bit terrifying to contemplate. If you're ever unfortunate enough to do safety critical systems, you'll have to remember that you have to protect against this not only at the software level with formal methods, but also at the hardware level to ensure that e.g. a bit flip in the CPU registers or the instruction sequence as it travels from memory also doesn't violate your safety model.
> I'm not deep into more advanced flavors of functional programming, but I thought total functions are defined as terminating in finite time for all inputs.
I was playing a bit fast and loose with wording since most notions of total functions are built on top of FP languages as you point out and therefore ultimately on the lambda calculus where we don't have a native notion of a loop.
Instead of loops in FP languages we have lists (and folds on the list or an unfold to generate the list) and other data structures that reify the notion of iteration.
Hence what I was really saying is that a total function can also generate an infinite data structure, which is only roughly like saying an infinite loop.
Totality is usually formally defined in terms of data and codata (or equivalently recursion and corecursion). The co- prefix may sound a bit scary, but an informal way of thinking about it is that data is always finite and codata can be infinite. The canonical example is that a list is an example of data and an infinite, lazy stream is an example of codata.
This seems like a lot of ceremony for no good reason if you consider an "infinite" data structure to be simply a finite data structure that stores functions, with the additional computation hidden behind a closure. And indeed you are still correct that an informal and usually sufficiently useful definition of totality for a function is that it terminates in finite time for all inputs (after all, even if you generate an "infinite" data structure, you still terminate in finite time and just hide the additional computation within a finite data structure).
It's still nice to have the data vs codata distinction though.
Practically speaking bringing in the notion of codata gives us a nice little heuristic for checking termination when creating an infinite data structure. Only types marked as codata are allowed to have infinite descent and when returning them from a function we have to always follow a certain syntactic pattern known as guarded recursion. Otherwise if we didn't have the notion of codata, a naive termination checker would have a hard time with checking termination of a function that recurses on itself, but only behind a lambda.
E.g.
// Pseudo-code
data class Stream(head: Int, rest: () => Stream)
// generateIncrementingStream(0) generates the infinite stream 0, 1, 2, ...
def generateIncrementingStream(start: Int): Stream = {
Stream(start, () => generateIncrementingStream(start + 1))
// This appears to a naive termination checker to never terminate
// You could try to simply check if the recursive call is inside a function
// but then you run into other issues if e.g. the entire function is called
// again and it becomes very thorny very quickly
// e.g. Stream(start, {(() => generateIncrementingStream(start + 1))(); () => generateIncrementingStream(start + 1)})
// never terminates even though all calls to generateIncrementingStream are
// contained behind lambdas. You might have to start doing some annoying
// symbolic computation.
}
vs
// Note that the recursive call to Stream in rest is no longer behind a lambda
// Under the hood this might be implemented with lambdas
codata class Stream(head: Int, rest: Stream)
def generateIncrementingStream(start: Int): Stream = {
// Much easier heuristic: all recursive calls must be inside a codata
// constructor, i.e. "guarded" by a constructor call hence "guarded recursion"
Stream(start, generateIncrementingStream(start + 1))
}
The general idea here is that a function that generates codata is productive and hence total if it generates a finite amount of the potentially infinite codata structure before its next recursive call. A nice heuristic that captures most useful cases is guarded recursion (of course because of the halting problem this is only a heuristic).
The usual heuristic is that for functions that generate data, you must have structural recursion, i.e. every recursive call must be on a piece of data "smaller" than the original input. On the other hand if you generate codata, you must have guarded recursion, i.e. every recursive call must be guarded behind a codata constructor.
This resolves some things quite nicely. E.g. if we have a recursive function that accepts a codata `Stream` and wants to return some data, e.g. a finite `List`, then we must have some other argument that gets smaller on every call. E.g.
// Taking in infinite codata and producing finite data
def take(stream: Stream, n: NaturalNum): List[Int] = {
n match {
// This passes totality checking because m is "smaller" than n, which
// here means we stripped away one level of NaturalNum constructors
case Successor(m) => Cons(stream.head, take(stream.rest, m))
case Zero => EmptyList
}
}
def streamIdentity(stream: Stream): Stream = {
// Since we're producing codata, we check for guarded recursion
// Passes totality checker because streamIdentity is guarded behind Stream
Stream(stream.head, streamIdentity(stream.rest))
}
def listIdentity<A>(list: List[A]): List[A] = {
list match {
case EmptyList => EmptyList
// Since we're producing data, we check for structural recursion
// Passes totality checker because the recursive call is on a smaller piece of data
case Cons(x, rest) => Cons(x, listIdentity(rest))
}
}
// Can't blindly produce finite data from an infinite piece of codata!
def nontotalStreamToList(stream: Stream): List[Int] = {
// Looking for structural recursion
// Does not pass totality checker because stream.rest is not considered to be smaller than stream since Stream is codata
Cons(stream.head, nontotalStreamToList(stream.rest))
}
// Can definitely produce infinite codata from finite data though
def padOutListWithInfiniteZeroes(initialSegment: List[Int]): Stream = {
initialSegment match {
case Cons(x, rest) => Stream(x, padOutListWithInfiniteZeroes(rest))
case EmptyList => Stream(0, padOutListWithInfiniteZeroes(initialSegment))
}
}
Bringing in the notion of codata also gives us some theoretical nicety. How do you express totality for a program that must run forever, e.g. a server? You say that the IO type a la Haskell is codata and that's that! Then you know that `main` having the `IO` type means that it can potentially run forever, but it will always generate a finite amount of stuff on every codata "iteration" and therefore is productive and is not just spinning its wheels in a pointless infinite loop that does nothing.
Rust solves the pointer escape problem via static analysis and annotations, so that’s feeling pretty dated too.
Similarly, allocation is fine with today’s large memories. You need to bound the request queue depth and the amount of memory allocated per request though.
I also question the idea that safety critical code is a special class that is distinct from normal code.
Buggy consumer devices lead to all sorts of deaths — after all, a one in a billion scenario is going to hit a few people per day on average — for example, look at the correlated 999 and 911 spam that hit iOS and (separately) android this week.
The problem with dynamic allocation (aside from the problem that it can fail) is that calls to malloc don't complete in a predictable time, so it's pretty much verboten in hard realtime apps. Pool allocators can be (though are not always) written to return in a bounded time, so if the pool is created at startup, yeah, it's not really dynamic allocation.
They're not meaningfully different from the perspective of the programmer. You call a malloc-like function and get some memory back. The total dynamic memory use of the application is simply capped. You can do this even with libc malloc in a standards compliant way (e.g. Broadcom used to ship a malloc implemented as a series of pool allocators over statically initialized arenas), it's just uncommon.
> You can do this even with libc malloc in a standards compliant way
This is what I do in nontrivial embedded projects. It makes things easier and less error-prone to replace malloc()/free()/etc implementations directly with ones that behave as needed. And then you don't have the standard implementations sitting around waiting for a new dev to accidentally use them.
I think they're being too restrictive on recursion. They're excluding it based on not being able to prove it will terminate--but you can do that with a time-to-die counter.
Stack is a dynamic memory of sorts and some languages (Rust) heavily optimize towards keeping things on stack so that your data types could turn out to be quite big.
It is hard to determine the max value for time-to-die counter if your stack footprint is hard to estimate due to nested structs/enums/unions/generics/optimizations.
If you're being that pedantic, C itself is not Turing-complete in the first place: pointers have finite size, which means that C can address only a bounded amount of memory, so it is a finite state machine (just with an extremely large number of states).
I don't think it's about that. Technically, everything is finite. But placing limits on the loops and restricting memory allocation bounds the complexity. It probably limits everything to P-time.
Without dynamic memory allocation? I think you'd need at least O(n) memory to store enough state to go exponential. But it's been some time since I did complexity theory; I could be wrong.
How many problems in safety critical systems need Turing completeness and how many are fine with something weaker, e.g. primitive recursion? My intuition is that most things are fine without full Turing completeness (as far as you can even apply terms about computability to systems with complex IO behavior).
> How many problems in safety critical systems need Turing completeness
Probably all, like the ones outside of critical systems.
But if you concentrate the complex behavior in a single layer, you will probably get a very thin layer, that you can call an OS or something like that, verify it very well, and have almost all the code follow a non-complete idiom.
There is something ironic about there being an obvious typo in the subtitle ("domponents" instead of "components").
Regardless, I think these are generally good guidelines, but I'd not call them rules. Sometimes you really need the C preprocessor. For example, for compile-time assertions in C99.
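A common C99 idiom for this: use the preprocessor to synthesize a typedef whose array size goes negative when the condition fails, turning the check into a compile error. The macro and names here are one illustrative variant, not the only way to write it (C11 later standardized `_Static_assert`):

```c
/* C99 has no _Static_assert, but the preprocessor plus a negative array
   size gives a compile-time check. Names are illustrative. */
#define STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

STATIC_ASSERT(sizeof(int) >= 4, int_is_at_least_32_bits);
/* STATIC_ASSERT(sizeof(char) == 2, would_fail); -- compile error if enabled */

int main(void) { return 0; }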
Each function should have, on average, at least two assertions that cannot be checked statically, and the calling code must try to recover from the failure.
So much for writing code that doesn’t suffer from the halting problem, or that has high branch coverage!
Vendors will implement full state machines in the preprocessor. Generated code will include the same header 2 or 3 times, because each time through, the state machine has changed and different code is exposed.
They are the rules for JPL, where they are sending shit to other planets and have 1 incredibly expensive shot to get the software right on the first try. The software cannot have an unrecoverable error condition and the safety critical portions of the code like that used for landing the spacecraft need to work on the first shot no matter what goes wrong. Even other spacecraft don’t need to have as strict rules but this is a different playing field
Perhaps not everyone knows what AUTOSAR is. https://en.m.wikipedia.org/wiki/AUTOSAR. Think: automotive brake controllers, that type of shit. It is safety critical.
My point isn't that the rules apply universally, it's that something found to be a good practice in a safety critical industry is flagrantly discarded in another, without good reason. Literal state machines in the preprocessor. Metaprogramming in C. But because it is generated code, it is "ok."
The automotive industry is weird like that. They started doing things a way, got stuck in a groove and now if you push back against it they look at you like you killed their dog.
I should really find a different field, for my mental well being.
First and most important, the title claims these are safety-critical rules, but the conclusion says that they are only being used on mission-critical projects. I get the feeling that there might have been some late-change copyediting done here, as the content is far too generalizing to be taken seriously in any specific safety critical context, and hopefully the author would have known that.
* Agree that many safety-critical coding standards are a grab-bag of sometimes dubiously-valuable rules.
* Agree with the implication that stylistic guidelines are almost totally worthless, and may distract reviews from their core task of finding functional errors.
* The document states that manual review of large cases may be infeasible, but I have never worked on a project that didn't mandate it. Drudgery goes hand-in-hand with safety-critical, and mind-numbing close review is an accepted element of this.
* Agree that fewer rules are good, but the author doesn't justify why 10 is the right number. Unfortunately this is a field where correctness trumps easiness or simplicity, and any class of error with unacceptable risk to life/limb must be prevented. Thus a ruleset cannot justify itself simply by being small.
* In fact, there are many wordings in this document that are generally-unacceptable because they are admitting incompleteness in the general case (and therefore unsuitability for any safety-critical purpose), such as "Although such a small set of rules cannot be all-encompassing" or "In return, it should be possible to demonstrate more convincingly that critical software will work as intended"
commentaries on specific rules:
* rule 4: "No function should be longer than what can be printed on a single sheet of paper". In my experience, readability is not always improved by limiting function length. The main goal of safety-critical code is to be correct (and hopefully obviously so). Sometimes splitting up code is harmful to this. (Also, this is actually a stylistic rule of the kind rejected in the 2nd paragraph of the document.)
* rule 5: In the safety-critical domain, defensive checks and "assertions" are usually considered at low-level design. It's generally unacceptable for a coder to "just" insert defensive behaviours into the implementation, because all such behaviours must be analyzed for correctness. I don't disagree with the idea behind this rule, but a coding standard is the wrong place for it.
> Agree that many safety-critical coding standards are a grab-bag of sometimes dubiously-valuable rules.
What frustrates me is that most standards are self-aware of this and explicitly allow you to tailor them, but hardly anyone does that to a meaningful extent.
Do that and you are just creating targets for the next security auditor that you need any sort of certification or approval from.
In the end you'll choose the path of least resistance, which is to slavishly obey every rule on the checklist. It's not that people don't want to tailor the rules. They try at first, and then it gets beaten out of them.
It's doable sometimes when there's a tailoring framework. Here's a publicly available example [1]. Though I admit that it's usually easier to do in the US than in the European Union, so your experience may vary.