
It seems all too often we (coders) are encouraged to think of errors as these exceptional things that happen rarely; deserving only a few cursory preventative treatments in the code -- or worse, treated as something only to be fixed lazily, as bugs and crashes surface during testing. Indeed -- this philosophy is ingrained into a significant percentage (if not the majority) of the programming languages we use!

On the contrary, I believe we should expect to spend the majority of software engineering time writing and thinking through error-handling code, before even your first test[1].

I'd even go so far as to say: If you're not spending at least half your time on so-called 'error handling', you're probably doing something wrong, like using a language feature (like exceptions) to defer that technical debt to later -- and you'll regret it, if your project matures, I assure you.

This is why I so greatly appreciate languages like Rust and Zig[2] which remove exceptions and unchecked null pointers from the language entirely (with few edge-case exceptions), and provide a powerful and robust type system that allows us to express and handle error conditions naturally, safely, and elegantly.

[1] To be clear, by no means am I downplaying the importance of test code, or even manual testing; rather, I'm arguing that purely "test driven development" is not sufficient to yield extremely robust software of significant sophistication.

[2] These aren't the only examples, but they're among the only that aim to be C++ (Rust) and C (Zig) replacements, that also made the "right" design choices (IMO) in removing both exceptions and unchecked null references.
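To make this concrete, here is a minimal Rust sketch (the config-lookup function and error enum are hypothetical, invented for illustration) of what "expressing and handling error conditions in the type system" looks like: absence is an `Option`, failure is a `Result`, and the caller cannot silently ignore either.

```rust
use std::collections::HashMap;

// Hypothetical error type: every way `lookup_port` can fail is a variant.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingKey(String),
    NotANumber(String),
}

// No exceptions, no null: the signature itself says "this can fail, and how".
fn lookup_port(cfg: &HashMap<String, String>) -> Result<u16, ConfigError> {
    let raw = cfg
        .get("port") // Option<&String>: absence is a value, not a null pointer
        .ok_or_else(|| ConfigError::MissingKey("port".into()))?;
    raw.parse::<u16>()
        .map_err(|_| ConfigError::NotANumber(raw.clone()))
}

fn main() {
    let mut cfg = HashMap::new();
    cfg.insert("port".to_string(), "8080".to_string());
    // The caller must unwrap the Result; forgetting a case is a compile error.
    match lookup_port(&cfg) {
        Ok(p) => println!("port = {}", p),
        Err(e) => println!("config error: {:?}", e),
    }
}
```

The point is that the error-handling code is written up front, at the call site, rather than deferred to a catch block somewhere up the stack.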



This strategy tends to fail economically. The tech startups that succeed are usually ones that let their customers do things they would not otherwise be able to do. Usually doing something that nobody has done before is hard enough without considering the corner cases; if it follows a typical 90/10 rule, then doing 100% of the job will take 10x as long as the competitor who's only doing the easiest 90%, and your market will have been snapped up by them long before you can release a product. Customers would rather use a product that works 90% of the time than do without a product entirely, at least if it delivers functionality they really want but can't get elsewhere (and if it doesn't, your company is dead anyway).

Once you've got a commanding lead in the marketplace you can go back and hire a bunch of engineers to finish the remaining 10% and make it actually work reliably. That's why solutions like testing & exceptions (in GCed languages) succeed in the market: they can be bolted on retroactively and incrementally make the product more reliable. It's also why solutions like proof-carrying code and ultra-strong (Haskellish) typing fail outside of markets like medical devices & avionics where the product really needs to work 100% at launch. They force you to think through all cases before the program works at all, when customers would be very happy giving you (or a competitor) money for something 80-90% done.

Someday the software market will be completely mature: we'll know everything that software is good for and exactly what the product should look like, and people won't dream of founding new software startups. At that point, there'll be an incentive to go back and rewrite everything with 100% solid and secure methodologies, so that our software has the same reliability that airline travel has now. That point is probably several decades in the future, though, and once it happens programming will not be the potentially extremely lucrative profession it is now.


I'd agree that it's totally reasonable to 'hack together' a quick prototype with 'duct-tape and cardboard' solutions -- not just for startups, but even in full-scale engineering projects as the first pass, assuming you intend to throw it all away and rewrite once your proof-of-concept does its job.

The problem is that these hacky, unstable, unreliable solutions sometimes never get thrown out, and sometimes even end up more reliable (via the testing and incremental-improvement methods you mention) than a complete rewrite would be -- not only because writing reliable software is hard and takes time (beware the sunk-cost fallacy here!), but because sometimes even the bugs become relied upon by other libraries/applications (in which case you have painted yourself into a REALLY bad corner).

It's a balance, of course. You can't always have engineering perfection top-to-bottom (though I would argue that for platform code, it has to be pretty close, depending on how many people depend on your platform); if you shoot too high, you may never get anything done. But if you shoot too low, you may never stop drowning in bugs, crashes, instability, and general customer unhappiness, no matter how many problem-solver contractors you hire to fix your dumpster fire of code.

So again: Yes, it's a balance. But I tend to think our industry needs movement in the "more reliability" direction, not vice versa.


This is simply not my experience with exceptions. Exceptions are frequently thrown and almost never need to be caught, and the result is easy to reason about.

My main use case for exceptions is in server code with transactional semantics. Exceptions are a signal to roll everything back. That means only things that need rolling back need to pay much attention to exceptions, which is usually the top level in jobs, and whatever the transaction idiom is in the common library. There is very little call to handle exceptions in any other case.

GC languages make safe rollback from exceptions much easier. C++ in particular with exceptions enabled has horrible composition effects with other features, like copy constructors and assignment operators, because exceptions can start cropping up unavoidably in operations where it's very inconvenient to safely maintain invariants during rollback.

Mutable state is your enemy. If you don't have a transaction abstraction for your state mutation, then your life will be much more interesting. The answer isn't to give up on exceptions, though, because the irreducible complexity isn't due to exceptions; it's due to maintaining invariants after an error state has been detected. That remains the case whether you're using exceptions, error codes, Result or Either monadic types, or whatever.
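A minimal sketch of that last point, in Rust with a hypothetical rollback guard (the `Rollback` type and `risky_update` function are invented for illustration): the transaction abstraction restores the invariant on any non-commit exit, so individual operations don't have to care which mechanism signaled the error.

```rust
// Hypothetical transaction guard: the saved state is restored on *any* exit
// (early error return, panic) unless `commit` is called, so "roll everything
// back" lives in one place rather than at every call site.
struct Rollback<'a> {
    state: &'a mut Vec<i32>,
    saved: Vec<i32>,
    committed: bool,
}

impl<'a> Rollback<'a> {
    fn begin(state: &'a mut Vec<i32>) -> Self {
        let saved = state.clone();
        Rollback { state, saved, committed: false }
    }
    fn commit(mut self) {
        self.committed = true; // Drop still runs, but becomes a no-op
    }
}

impl<'a> Drop for Rollback<'a> {
    fn drop(&mut self) {
        if !self.committed {
            *self.state = self.saved.clone(); // undo partial mutation
        }
    }
}

fn risky_update(state: &mut Vec<i32>, fail: bool) -> Result<(), String> {
    let tx = Rollback::begin(state);
    tx.state.push(42); // partial mutation behind the transaction
    if fail {
        return Err("mid-transaction failure".into()); // tx drops, rolls back
    }
    tx.commit();
    Ok(())
}

fn main() {
    let mut state = vec![1, 2, 3];
    let _ = risky_update(&mut state, true);
    assert_eq!(state, vec![1, 2, 3]); // invariant restored
    let _ = risky_update(&mut state, false);
    assert_eq!(state, vec![1, 2, 3, 42]); // committed
}
```

This is the same shape whether the failure arrives as an exception, an error code, or a Result: the hard part is the restore-the-invariant logic, not the signaling mechanism.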


Sounds very specific to server code


Not sure which type of "server" you meant when you said that. Is that in the narrow sense of a database server?

Behaviors similar to the above are not that infrequent, and are expected from many other servers in the wide sense: a media decoder would drop all decoding in progress and try to resync to the next access unit; a communication front-end device would reset parts of itself and start re-acquiring channels (such exception-like reactions are even specified in some comm standards); a network processor would drop the packet and "fast-forward" to the next. Etc.

You could argue that this still looks like server behavior loosely defined (and I agree), but a) this makes the field of application for exceptions large enough IMO, and especially b) how differently could one implement all that with other mechanisms (like return codes), and for what benefit?


> This is simply not my experience with exceptions. Exceptions are frequently thrown and almost never need to be caught, and the result is easy to reason about.

I write GUI apps and that is also how I use exceptions - and it works just fine. If you have an exception, the only rational thing to do most of the time is to let it bubble up to the top of the event loop, show a warning to the end user, or cleanly quit the program while making a backup of the work somewhere else.


>assuming you intend to throw it all away and rewrite once your proof-of-concept does its job

Once a throwaway works, it becomes production.


And this is part of why I never ever ever "write one to throw away". It's very rare that it actually gets thrown away and redone "properly".

Also I just don't want to waste my time writing something that's for sure going to be discarded. There's a middle ground between "write something held together with duct tape" and "write the most perfect-est software anyone has ever written". My goal is always that the first thing I write should be structured well enough that it can evolve and improve over time as issues are found and fixed.

Sometimes that middle ground is hard to find and I screw up, of course, but I just think writing something to throw away is a waste of time and ignores the realities of how software development actually happens in the real world.


This. Once the spaghetti code glued together to somehow work is deployed and people start using it, it's a production system, and the next sprint will be full of new feature stories; nobody will green-light a complete rewrite or redesign.


And that’s how you get a culture where severe private data breaches and crashy code are the status quo :/

We can do better. Why don’t we? I guess the economic argument explains most of it. I think if more governments started fining SEVERELY for data breaches (with no excuses tolerated), we’d see a lot more people suddenly start caring about code quality :)


>We can do better. Why don’t we? I guess the economic argument explains most of it. I think if more governments started fining SEVERELY for data breaches (with no excuses tolerated), we’d see a lot more people suddenly start caring about code quality :)

Governments care about the "economic argument" even more so. They don't want to scare away tech companies.

Besides, today's governments don't protect privacy, rather the opposite.


Sure. And if governments started shooting people for drunk driving, we'd have less drunk driving.

Some of us don't like the negatives even if we would enjoy the positives.


We got a green light for a complete rewrite, but only because of licensing issues with the original code. I'm just hoping we don't fall for the second-system syndrome.


There are exceptions of course. I have also been involved in some complete rewrites and greenfield projects to replace existing solutions, but it's very rare. It happens much more often in the government sphere than in the private sector.


Which is the mistake; the throwaway should test one subsystem or the boundary between two subsystems and nothing more. To get tautological again, once you have a working system, you have a system.


Yep, that was pretty much my point :). I think it’s a dangerous precedent to follow (bad practice), but I’ve certainly been guilty of it on occasion.


With that "works 90% of the time" idea, please don't ever involve yourself in software for anything serious: air traffic control, self-driving cars, autopilots, nuclear reactor control, insulin pumps, defibrillators, pacemakers, spacecraft attitude control, automated train control, the network stack of a popular OS, a mainstream web browser, a Bitcoin client, the trading software of a major exchange, ICANN's database, certificate signing, ICBM early warning system, cancer irradiation equipment, power steering, anti-lock brakes, oil/gas pipeline pressure control, online tax software...


I actually do have some experience in that area - one of my early internships was in a consultancy that specialized in avionics, health care, finance, and other areas that required ultra-high-assurance software.

It is a very different beast. Their avionics division was a heavy user of Ada, which you will basically never find a webapp written in. There are traceability matrices for everything - for each line of code, you need to be able to show precisely which requirement requires that code, and every requirement it impacts. Instead of testing taking maybe 5% of your time (as with startup prototype code) or 50% of your time (as with production webservices), it took maybe 80% of the total schedule.


Not working in those fields either, but I don’t understand how people can be comfortable writing life-or-death code in C either. Anything that doesn’t involve a heavy dose of formal proof or automatic validation of properties of your code seems irresponsible as well.


C is very safe if you are experienced and don't do anything fancy.

What else would you use apart from Ada? I wouldn't trust any language with a large runtime like Python, Java, and yes, also not Haskell.

C is very amenable to proofs that use Knuth's proof style. Also, of course, Frama-C exists.

EDIT: If Rust is more mature, it may be an option, but I'd wait at least 5 more years until (if?) it is widely used.


OCaml with the Coq prover?


But those are only a fraction of the software ever written.


A market subject to regulation would just move what’s considered the easiest 90%. Maybe in a small startup, one would write a fancy nonlinear or deep ML model in TensorFlow while for a regulated/compliance-oriented codebase, you’d stick to linear algebra for the ML model to guarantee convergence.


And this is likely why security folks will always have a gig.


Agree with this except for the last bit about the “mature” software market. Software is just the execution of logical processes. There’s no reason to think we’ll run out of need for new logical processes to be implemented.


Errors and exceptions are really fairly different things, and I personally appreciate languages like Erlang that have both.

"Errors" are an instance of a (perhaps-implicit) returned sum-type of a given function. Calling a function that returns a sum-type, and then not having a branch for each instantiation of that sum-type, is almost always a logic error in your code, no matter what the sum-type is.

A pretty good generic solution to "error-handling" is the Either monad, as seen in its literal form in Haskell, in Javascript as a convention for async callbacks, and in Erlang as the conventional sum-type of {ok, Val} | {error, Desc}. The Either monad is nice because you can't really "lower" monadic code by ignoring the monad and just "getting at" the result; instead, you have to "lift" your code into the realm of the monad, which requires you to specify how you'll handle each of its potential cases. (Even if your specification is simply "I expect this; if it isn't this, crash"; or "I expect this; early-return anything else to my own caller.")

"Exceptions", on the other hand, are things that by default no component of your system knows how to handle, and which usually—other than some infrastructure-level components like HTTP error handlers and "telemetry" service libraries—code just allows to bubble up until it hits the runtime and becomes a process abort. These include "runtime errors"—for example, failures of memory allocation; and "programmer logic errors"—for example, dividing by zero.

In either of these cases, the "correct" thing to do is "nothing at all", because the system itself has no idea how to handle these at runtime. The only time when these problems could have been handled was at development time, by writing more or different logic such that these exceptional circumstances would never arise in the first place. Now that you're in such a situation, though, you pretty much just want to note down what happened (so the log can be sent to a developer, who can use it to understand what the novel exceptional situation was, and then plan a change that will prevent the exceptional situation in the future) and then die quickly, before your now-invalid runtime state can do harm to things like durable stored state.

Or, to put that another way:

• an error says "I failed to do that."

• an exception says "I have noticed, while trying to do that, that no valid system-state can result, even one of failure."
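In Rust terms, `Result` plays the role of the `{ok, Val} | {error, Desc}` sum-type, and the two "lifts" described above are `.expect(...)` (crash if it isn't what I expect) and the `?` operator (early-return anything else to my own caller). A small sketch, with a hypothetical `parse_id` standing in for any fallible call:

```rust
// Hypothetical fallible step; Result is Rust's Either/{ok,..}|{error,..}.
fn parse_id(s: &str) -> Result<u32, String> {
    s.parse::<u32>().map_err(|e| format!("bad id {:?}: {}", s, e))
}

// "I expect this; early-return anything else to my own caller":
fn double_id(s: &str) -> Result<u32, String> {
    let id = parse_id(s)?; // Err flows out to the caller, Ok flows onward
    Ok(id * 2)
}

fn main() {
    // "I expect this; if it isn't this, crash":
    let id = parse_id("17").expect("id must be numeric");
    println!("id = {}, doubled = {:?}", id, double_id("21"));
}
```

Either way, the code cannot "lower" past the sum-type and pretend the value is already there; it has to say what happens in each case.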


I don't think this alleged distinction between exceptions and errors is as clear-cut as you imply. I think this distinction is purely a convenience; a line we draw to make our lives easier because we don't really want to accept that extremely rare error cases do exist and should be reasoned about despite their rarity.

For example, you listed some examples of exceptions that are not errors: divide by zero, failures of memory allocation.

Let's say you write a calculator GUI and I try to divide by zero in it. Exception or error?

If I applied your advice, I would have to deem this an exception, and just "note down what happened" or "die quickly" (your words)! That is quite obviously wrong, in this example.

The correct answer would be to feed some kind of error code to the calculator's output data path, which would eventually be displayed on the screen.

Sure, I'm sure you'll come back and say now "well it depends then, and whether you consider it an exception or an error depends on the application". If you say that, you are ceding my point, because that is precisely what an 'error' (not an exception) is: a condition that may in some cases be fatal to the application, and in some cases be a part of ordinary runtime behavior.


I disagree with your case, this is clearly an error because it should never get to dividing by 0.

You MUST validate user input and user input validation failures are a class of errors not exceptions.

For example, look at Java (which uses the opposite terminology but the exact same concept). Checked exceptions are expected state that should be handled gracefully; unchecked exceptions and errors are unexpected, and the handling of them is generally "let someone know what happened and exit quickly".


> I disagree with your case, this is clearly an error because it should never get to dividing by 0. You MUST validate user input and user input validation failures are a class of errors not exceptions.

This is bizarre: it sounds like you're disagreeing with me by agreeing with me (that this is an error, not an exception)! For the confused (myself included): lost953's argument is "begging the question" (a logical fallacy [1]). It uses the a-priori assumption that 'divide by zero' is an exception to argue that what we should really do here is add an if-guard before the "actual divide" occurs, so we can return an error instead of an exception!

Otherwise, thank you for conceding that exceptions are just a category of error :)

[1] https://en.wikipedia.org/wiki/Begging_the_question


No you misread my refutation, I make no claims of the errorness or exceptionness of dividing by 0, merely that your proposed example doesn't support your claim that it depends on the nature of the application. I claim that unchecked user input falls into the category of 'error' and that clearly is what has occurred in your proposed example.


> I disagree with your case, this is clearly an error because it should never get to dividing by 0.

> No you misread my refutation, I make no claims of the errorness or exceptionness of dividing by 0,

Hmmm.


I understand them to be saying that if execution is supposed to be halted before the division occurs, then whether the division is an exception or an error is a moot point.


I think the point the GP is making is that your calculator might have a function like

    calc_div(num: Number, den: Number) -> (Error | Number):
        if den == 0:
            return Error("division by zero")
        else:
            return num / den

Now this function pre-validates the data. But it might be used as part of a much larger system. Should that system be programmed to know that `den` should always be non-zero and pre-pre-validate accordingly? Or else should it leave "calc_div" to be the expert on the rules for division?

If you take the latter approach, then the caller has to have a way of accepting the error gracefully, as a normal thing that might happen. And thus we have a div0 that is an error rather than an exception.
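For illustration, here is a Rust rendering of that sketch (the names are hypothetical, ported from the pseudocode above), with the caller taking the latter approach and accepting the error as a normal result:

```rust
// The "division expert" owns the div-by-zero rule; callers don't re-check it.
fn calc_div(num: f64, den: f64) -> Result<f64, String> {
    if den == 0.0 {
        Err("division by zero".to_string())
    } else {
        Ok(num / den)
    }
}

fn main() {
    // The larger system doesn't pre-pre-validate `den`; it just accepts
    // that calc_div may gracefully say "no".
    match calc_div(1.0, 0.0) {
        Ok(v) => println!("= {}", v),
        Err(msg) => println!("cannot compute: {}", msg),
    }
}
```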


Ah, but floating-point math is such fun! Here is how it might work...

den is a tiny non-zero value in an 80-bit register. It gets spilled to a 64-bit slot on the stack, rounding it to zero, while another copy of it remains in an 80-bit register. The 80-bit version is compared against zero, and is non-zero. The 64-bit version, which is zero, gets used for the division.

It is fully standards-compliant for a C compiler to do this. Some languages may specify otherwise, but often the behavior is unspecified or is implicitly the same as C.


For many use cases, it's a bad idea for a calculator program to use floating point rather than some more exact representation.

However, if you do use floating point, then the kind of dangers you point out make my point even stronger. You could conceivably embed the `calc_div` function in a larger system that knew about pre-validating for div0. But if you want to deal with all possible sources of FP weirdness when doing division, then you really need to concentrate it in the "division expert": i.e., have calc_div pre-validate all that stuff, and have its caller accept that errors are a normal result.


“Divide by zero” is an exception in the math library. But a GUI calculator shouldn’t pass all input to the math library without parsing it first, and figuring out what the user wants and if the user has entered a valid statement.

One guideline I follow for choosing between error codes and exceptions is “can the immediate caller do anything about this issue?” Even if the calculator does feed user input directly into the math library, the calculator can’t do anything sensible with an invalid statement. The calculator will have to “go up a level” and ask the user for help.


Disagree with validating that sort of input in code you (the non-framework/BCL author) write; let the operators, functions, etc., do their own work of deciding if their operands are acceptable. Otherwise, where does it end--do you pre-test two integers to verify their product won't overflow the type declared for their product? I think you gotta let the exception happen.


No: as I said above, an unhandled error generates an exception. But you shouldn't be able to have an unhandled error—good type systems prevent you from compiling such code, by treating all the errors that a function can return as cases you are required to handle at the call-site. Maybe generically with a default case, but you've still gotta handle them.

Dividing by zero is an exceptional situation (at the level of the processor, even!), rather than an error, because most divisions have known, nonzero divisors; certainly, all intentional divisions do. If you do end up telling the processor to divide by zero, it is assumed that you fucked up and didn't write your program right, because user-supplied divisors are a very rare use-case compared to known-domain divisors, and a known-domain divisor being zero is indeed programmer error.

But, even if cases where the user controls the divisor are comparatively rare, they do exist. So, even if the processor isn't going to help you, why not have the compiler generate a check for them, such that integer division would be an (Either Integer Error) kind of operation? Well—performance.

Integer division—in languages where it generates an exception—is a low-level primitive. It's meant to be fast. (In fact, all integer operations in such languages are meant to be fast single-CPU-instruction-equivalent primitives; this is also why e.g. overflow isn't checked on integer addition.) Compilers for low-level languages are expected by their users to not generate any protective code for these primitives, because that protective code would get in the way of using the primitives at the highest efficiency possible. 99% of the time, the domains of these operations are fixed at the business level, such that these errors cannot occur. And, the other 1% of the time [like when writing a calculator program], the users of these languages use "safe math" libraries built on top of these primitive operations, rather than the primitive operations themselves.
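Rust's standard integer methods illustrate this split: the bare operators stay fast and unchecked (in release builds), while the "safe math" layer is ordinary library code built on top of the primitives. A small sketch:

```rust
fn main() {
    // The raw `+` and `/` operators compile down to single instructions.
    // The safe-math layer sits alongside them as plain library methods,
    // returning the error as a value instead of faulting:
    assert_eq!(10_i32.checked_div(2), Some(5));
    assert_eq!(10_i32.checked_div(0), None); // div-by-zero as a value
    assert_eq!(i32::MAX.checked_add(1), None); // overflow surfaces as None
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN); // or as defined wraparound
    println!("all checked-math examples held");
}
```

You only pay for the check where you ask for it, which is exactly the trade-off the parent describes.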

Let me put this another way, with a concrete example.

In Rust, in safe code, you can't dereference arbitrary pointers, only opaque abstracted pointers (references) given to you by the runtime, that are guaranteed by the compiler to have non-null [direct] contents. You can make the inner type of a RefCell an Option<Foo> instead of just a Foo, and then it can be null... but this kind of null is an instantiation of the Maybe monadic sum-type, and so the compiler can notice when you aren't checking for its existence and stop you.

But in Rust, in unsafe code, you can dereference an arbitrary pointer, and the result can be null. What should happen when you do so? Well, that's an exceptional situation. You asked for unsafety, and then you did the stupid thing you weren't supposed to do. There's nothing that can really save your code now.

Before the invention of "exceptions", we just called these types of errors faults. Protection faults, for example. Your code would just be killed, without the ability to do anything, because it just did something that broke the semantics of the abstract machine (the process) that the OS had the program boxed up in. The OS might be kind enough to core-dump your process's memory in the process of killing it.

Exceptions are still faults. They just let you do arbitrary other stuff as your process comes tumbling down. There's nothing you, or the runtime, can do to "save" your process, when the "error" in question is precisely that you first asked your compiler for unsafety—asked it to not prevent your code from compiling if it does something invalid—and then you went ahead and used that unsafety to do something invalid.

In Erlang land, we have a third thing: exits. Exits are faults on a small scale—they kill the actor-process that generates them, and then spread across its linked siblings and parents to kill those too, until/unless one of them is "trapping exits", at which point that actor-process will receive the exit as an informational message from the runtime rather than just dying. Most actor-processes don't trap exits; and, in fact, you can't trap an exit generated by your own actor-process, only an exit "echoed" to your actor-process from below/around you. And the only processes that do trap exits, don't attempt to "save" the actor-process that is dying. Unlike with exception-unwinding, by the time an exit is "heard" by another actor-process, the actor-process that emitted the exit is already dead. The point of trapping exits is to decide what to do after the actor-process dies—for example, starting another one of the same actor-process to replace it.

(As it happens, POSIX exit statuses fit this concept as well, though semantics like bash scripts not being `set -e` by default kind of screws that up.)

Exceptions have their place—they describe a fault-in-progress, and let arbitrary things happen as a fault causes a stack to unwind, before the process dies. Exits also have their place—they let processes react to other processes having faulted. And errors have their place—they're a branch of the Either monad that a compiler can force your code to handle.

Personally, I don't see what confusion there is between any of these. They're roughly orthogonal.

And none of them represents "something bad happened from this function call. Somebody up there on the stack, help me recover, please!" (Those would be Lisp conditions, but nobody uses those.)


If you think of it more like “how do I want to handle something bad happening?” instead of “what category does this fall under?” then I believe electrograv‘s point of not being clear-cut becomes more clear.

For example, in Rust most code that can fail will return a Result, and thus the compiler forces you to handle that. However, that code can just as easily panic and behave like an uncaught exception would (thread exiting). Examples would be the division operator and the array index operator. Both division-by-zero and out-of-bounds errors can certainly be handled by using a Result, but in this case the Rust developers made a decision to use panic. Are these both exceptions because they are handled like a typical uncaught exception, or are they errors because it’s conceivable to handle them just like a failed file open?
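A small sketch of that contrast (the `div` helper is just for illustration): the checked forms return a sum-type the caller must inspect, while the operator forms behave like an uncaught exception and panic.

```rust
use std::panic;

// Plain operator division: faults (panics) on a zero divisor.
fn div(a: i32, b: i32) -> i32 {
    a / b
}

fn main() {
    let v = vec![10, 20, 30];

    // Non-panicking forms: the failure is an ordinary value.
    assert_eq!(v.get(7), None); // out-of-bounds as a value
    assert_eq!(10_i32.checked_div(0), None); // div-by-zero as a value

    // Operator forms: the thread panics, observable here via catch_unwind.
    assert!(panic::catch_unwind(|| div(1, 0)).is_err());
    assert!(panic::catch_unwind(|| v[7]).is_err());
}
```

So the same underlying condition can be surfaced either as an "error" (Option/Result) or as an "exception" (panic), depending only on which API you reach for.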


(Those would be Lisp conditions, but nobody uses those.)

Off-topic, but why does nobody use those? The idea of "I have an exception, I'm the best place to act on it but the worst place to decide what action to take, request for advice" sounds good, at least


> Before the invention of "exceptions", we just called these types of errors faults. Protection faults, for example. Your code would just be killed, without the ability to do anything, because it just did something that broke the semantics of the abstract machine (the process) that the OS had the program boxed up in. The OS might be kind enough to core-dump your process's memory in the process of killing it.

Does your definition of a “fault” mandate that the process be killed? I’m sure there are architectures where forcing this to occur would require checks.


A "fault" is usually a term for a trap/kernel-mode interrupt that results from a check at the hardware level, built into an instruction. For example, a protection fault occurs when the processor attempts to access a memory address that isn't currently mapped.

There might exist ISAs without any equivalent to faults—but I would guess that the semantics of any "faultable"-instruction-equivalent on such an ISA, would 1. actually take the slow path and use a flag register to return the status of the execution; and 2. would thus require C and C-like languages to generate shim code that e.g. checks for that flag result on every use of the dereference operator, in order to generate a userland-simulated "fault"-equivalent (i.e. an abort(3)). This is because these languages don't have any way to capture or represent the potential flag-result return value of these operations. Low-level languages expect dereferencing to raise exceptions, not return errors. There's no place for the errors to flow into.


The reason I’m asking is because we seem to be a in a conversation where we are redefining words and tying them to abstract concepts, so I wasn’t sure if by “fault” you meant “this means the processor literally faults” or “I’m going to call a fault the termination of a process when it does something illegal, named because that’s how processors usually work today”. From your response, it seems like you’ve taken the latter interpretation.

> Low-level languages expect dereferencing to raise exceptions, not return errors.

Are you talking about C? Because it doesn’t actually define anything in this case. Sure, on most OS/architecture combinations, it will trigger a memory protection violation, but this is not guaranteed and I’m sure that in these cases the compiler doesn’t literally insert checks before dereferencing anything to guarantee failure.


> Are you talking about C?

Yeah, the "C abstract machine" that most C-like languages rely on by the fact of relying on libraries like libc.

> Because it doesn’t actually define anything in this case.

To be clear, when I said expect, I meant expect. These languages expect invalid dereferences to fault—i.e. to cause control flow to be taken out of the program's hands. Which is to say, they expect a generated dereferencing instruction that returns to have put the system into a valid state.

But in practice, what that means is that compilers just expect dereferences to always be valid (because any dereference that returns is valid.) So they don't have to generate any check for null before doing a dereference; and they don't have to worry about having any way to represent the result of a null dereference.

Another way to say that is that dereferencing null pointers is implicit undefined behavior. There's no part of the spec that covers what would happen if you dereferenced a null pointer, because in the model as the spec lays it out, dereferencing is a happy, clean little operation that never goes wrong. (i.e. "The MMU might cause a CPU fault? Who cares! My program didn't cause that fault; it was the MMU's choice to decline to read from $0. Other MMUs could totally allow that, and then LEA would be defined for the entire domain.") Same goes for reading from unmapped regions—"it's the MMU's fault; it could have just as well returned 0, after all."

> I’m sure that in these cases the compiler doesn’t literally insert checks

Yep, you're right. Again, it's because this is what these languages expect. It's not quite the same as undefined behavior; it's that the "C abstract-machine evaluation model" was defined to assume that architectures will always have their IDIV-equivalent instruction fault if invalid. If it returns a flagged value on some arch instead, that's something the C evaluation model is not prepared to deal with.

(Technically, I believe that you'd say that a C compiler "cannot be written" for such an arch while retaining conformance to any existing C standard, and for such an arch to gain a conformant C compiler, a new C standard draft would have to be written that specifies an evaluation model for such architectures—as then the compiler could at least be conformant to that.)

Which is helpful to know: just compiling C for such an arch at all is undefined behavior, no matter what you write ;)


I’m not sure I follow your description of undefined behavior, which seems to me to deviate from the standard’s? Correct me if I’m wrong, but to me it seems like you’re saying something along the lines of “the C standard ‘expects’ that null dereferences fault, and ‘assumes’ division by zero to trap; it is impossible to write a standards-compliant compiler for architectures where this is not true”. If so, let me know and I’ll explain why I disagree; otherwise, it would be nice if you could clarify what you actually meant.


The error bubbling up is a good thing. It at least allows you to find missing error checks in the underlying code. Good luck trying to find an unchecked return code. It seems you mix up problems innate to exceptions (like messing with the control flow) with user problems (unchecked errors/exceptions).


> Good luck trying to find an unchecked return code.

We're talking about alternate programming languages that handle errors vs exceptions differently, so it's only fair if we consider a language designed from the start not to need exceptions.

So let's take Rust, for example: In Rust, it's not a matter of luck at all: You declare whether your return code may be ignored; if not, it's a compile error not to use it.


Fair enough, error checks should always be enforced. I think errors and exceptions are both valid concepts that solve overlapping problem areas. Java has checked exceptions, so you can statically enforce exception handling.


In c you can declare a function with __attribute__((warn_unused_result))


It's not even used for the C standard library, and it would lead to warning spam, since every printf call could fail. If there's one thing I have never seen, it's C code checking the return value of printf.


You would only mark functions where it’s a bad idea to return the error code that way. Printf isn’t a function like that, and fprintf probably isn’t (use case of printing to stderr), but fwrite is.


... “bad idea to ignore the error code” ...

It took a few days to notice.


> C standard library

is mostly an ancient POS


Nicely and clearly put. It doesn't really matter if it is merely a convenience, as @electrograv said; I find this a very pragmatic way of looking at and handling the matter.


Using exceptions isn't "deferring technical debt".

Whether you use exceptions or not, you have to do all these things: detect errors, unwind the stack to a place where the error can be dealt with, and clean up resources during the stack unwind.

Exceptions are a control flow tool that simplify these things, nothing more or less.


Exceptions-as-alternate-control-flow is a paradigm that shouldn't have become mainstream in the first place. Using exceptions (in effect forcing the addition of a new control flow path) except where absolutely necessary has hurt the field; it's just like null. I much prefer errors as values, and I think people are coming around to this point of view more and more recently, as languages are being retrofitted with type systems that are good enough to cope.

When I first saw Optional in Java (long before I'd ever done any meaningful work with languages like Haskell or Rust), I thought it was weird how it seemed to seep everywhere and be a good idea to use everywhere. That seemed off to me at the time, but now I recognize it as the Blub paradox[0] (I'm not a pg worshipper, but this is one of the most insightful things I've read of his, IMO). The way I was thinking was wrong, at least from an overly pedantic sort of view: failure is everywhere in Java due to nullable types; people have just been conditioned to pretend it isn't.

Nowadays, for the software I write, I don't choose languages whose nullable types aren't checked by a compiler: so TypeScript when I do JS, or Haskell or Rust otherwise.

[0]: http://wiki.c2.com/?BlubParadox


I think Midori might have gotten error handling right, without getting rid of exceptions. It's too bad it only lives on in a series of blog posts.

I wish I could summarize the error model here, but I really couldn't do it justice. Read the blog post -- it's very good.

http://joeduffyblog.com/2016/02/07/the-error-model/


I have always heard that "exceptions" should be "exceptional". Meaning that most expected errors (e.g. getting a non-200 response to an HTTP call, having a file not found when opening one, ...) should not be handled with exceptions, but in regular if...else blocks instead.

Exceptions are great for exceptional stuff we really could not have expected (or just don't want to deal with, so we want to clean up nicely), but they tend to be overused for "anything that is not the expected result".

On the other hand, deferring unexpected paths to later using exceptions, as you state, sometimes actually is a good thing, as it speeds up development time significantly. Depending on the project, you may really not care, and having that power in your hands is invaluable.


I agree with this. Error handling code and exceptions are not mutually exclusive, and both have their benefits and drawbacks. Checking return codes in actually exceptional circumstances makes the code base almost unreadable, and exceptions work amazingly for this. On the other hand, using exceptions for common errors is both inefficient and ugly.

As an example of useful exceptions: malloc errors. In C, handling malloc failures is a nightmare. So much so that very few programs actually do it properly. The reason for that is that not only do you need to check the return code of malloc, you need to check the return code of every single function in your entire program that ever calls a malloc anywhere down the line.

So while this piece of code might be nice if you see it in one or two places:

    int rc = fx ();
    if (rc != 0) { handle_error (); }

Properly handling malloc errors means every single function call becomes a four-line monstrosity, basically blowing up your code base by a factor of four and making the code much more difficult to read. Not to mention how insanely error prone it is to have to check the return code of every single function. When doing the wrong thing is so much easier than doing the correct thing, most people will do the wrong thing.


This is an API design issue. The proper solution is to provide two versions of malloc or equivalent - one that doesn't have any error code, and simply panics on failure to allocate, and another that provides a way to recover. A typical app would then mostly use the first version, and very occasionally the second when it anticipates that allocation might be so large that it could fail even in a healthy environment (e.g. loading an entire file into memory).


Panicking is not handling anything; it is just crashing the program. Unless you can catch the panic, in which case it's just another name for an exception. Not handling failed allocations except on large allocations is just asking for rare crashes when one of the "unlikely to fail" allocations fails. Neither of these solves the problem in a robust way. Handling this properly is a language design issue. This is just applying a band-aid.


This is just recognizing the status quo. Pretty much no desktop or mobile or web software is handling "unlikely to fail" allocations. And how exactly do you expect them to handle it? If, say, it's a desktop app that's rendering a widget, and as part of that it needs to allocate a string, and that fails - how does it recover?

Panic on OOM is perfectly reasonable for most.

And yes, panics shouldn't be catchable. That's the whole point.


If you use a language designed from the start not to need exceptions, this problem is solved: The code in fact looks very much like code that uses checked exceptions (in fact, another commenter even mentioned Rust’s type system is isomorphic to checked exceptions).

I have no problem with checked exceptions. Unchecked exceptions is another story, and forces you to “catch all” everywhere (most libraries don’t document every single possible exception of every single method).

In retrospect I don’t even know why everyone responded to my original post about focusing on software error handling as if I was attacking exceptions. I think unchecked exceptions and unchecked null pointers are bad, but that’s about it; and that’s not even what my top post was about.


I would rather get rid of the term "exception" at all, and instead talk of recoverable errors (which should be properly reflected in the type system, as in e.g. Rust), and contract violations (which should result in an immediate panic).


> On the other hand, deferring non expected path to later using exception like you clearly state sometimes actually is good thing as it speeds up development time significantly.

Personally, I like the idea of Java's "checked" exceptions, particularly when your software involves lots of layers that need to use one-another appropriately, including adapting to error situations.

If you go "quick and dirty" with unchecked exceptions, you have the option to change them to checked ones later. When you do that a lot of stuff will "break", but really that will just be the compiler walking you through all the paths the exception could take, asking you to make a decision at each spot about what you want to do. (Pass the exception up the stack unchanged, catch it and wrap it and throw the wrapper, catch and recover, catch and ignore, etc.)


My argument: 'Exceptions should be exceptional', because exceptions are really just errors, that happen to be exceptional (rare)!

The reason exceptions tend to be overused is because the line between error and exception is blurry -- and it's blurry precisely because an 'exception' is really just a subset of errors. Unfortunately, most languages do not treat exceptions as a subset of an error, but as a completely disjoint/orthogonal thing, and that's the problem!

If we transitioned to languages that handle these concepts in a unified way (and ones that don't allow unchecked exceptions), this isn't a problem at all, and we can all happily write much more inherently reliable software.


> If you're not spending at least half your time on so-called 'error handling', you're probably doing something wrong, like using a language feature (like exceptions) to defer that technical debt to later -- and you'll regret it, if your project matures, I assure you.

Clarification please--are you suggesting the programmer can anticipate all reasonably likely error conditions and implement handlers for each?

I lean towards treating recoverable errors as uncommon, preferring a log->retry->give up strategy in most cases. The biggest sin is often in obfuscating the source error or prescribing a bogus resolution. Exceptions, while not perfect, remain a pretty good way to convey what was going on when things went sour.


> Clarification please--are you suggesting the programmer can anticipate all reasonably likely error conditions and implement handlers for each?

Absolutely, insofar as it is possible for a programmer to write their application in one of the several existing programming languages that can guarantee at compile time that there is no undefined behavior / exceptions / crashes / memory corruption. This concept is sometimes referred to as a "total function": a function that is defined for all possible input values, and in a language for which there is no way to invoke that function on an input not in its statically-declared domain.

Now, I'm not saying this is possible for all programs, particularly for certain kinds of I/O, but with a reasonable amount of error handling code and robust interfaces between that I/O device and your code, it's usually possible to get pretty close to perfection there too.

I'm also not saying it's easy. But I think this is precisely the kind of "hard" the software engineering industry needs more of right now; and languages like Rust and Zig and even Spark (Ada verifier) are a wonderfully refreshing move in that direction. Note: Not all of these languages I mention are 100% perfectionist either! What's important is they move significantly closer to perfection, and away from this 'cowboy coding' mentality of not really caring about errors/exceptions until they bite you.


Really puzzled that you have conflated exceptions ("'x' went wrong, here's the stack") with undefined behavior, crashes and memory corruption.

It seems as though you're describing how the world ought to be, and maybe could be, for the writing of pure functions, procedures with provably no side effects, and the like. Well, OK.

However, imagine that you're writing a few bytes to disk. Maybe the device runs out of space halfway through the write, the filesystem was mounted readonly, or the filesystem doesn't like the filename you asked for due to length or case-insensitive overlap with an existing name, or the controller driver glitches out, or permissions are wrong, etc., etc. You cannot anticipate all of these and implement recovery, aside from informing the caller what went wrong, and giving an opportunity to somehow correct the problem elsewhere and retry. Well now you're in the business of not really recovering from the fault, but instead doing something the same or nearly the same as raising an exception and printing the stacktrace.

TBH I have misgivings about even writing up that last paragraph because the alternatives really strain credibility.


I don't think the parent was suggesting that there's anything wrong with what you wrote in that paragraph. At least IMO that counts as handling the error, vs. just doing nothing and assuming everything is ok. If you have to kick it back to the user and say "sorry, XYZ happened, and there's nothing I can do about it", that absolutely counts as handling the error. Not handling it would be just assuming the file got written properly to disk and then later having to deal with corrupt data.


You could also argue that relying on "total functions" is an antipattern, because it doesn't account for "undefined behavior / exceptions / crashes / memory corruption" and leads programmers to over-rely on the compiler. We can throw race conditions and deadlocks in there too, since very often it will be difficult for a compiler to detect those at compile time. A PL that gracefully handles programming errors as well as exceptions, memory corruption, etc., emitting a logged problem and continuing to chug along, will let the programmer decide how bad an error is, maintain high uptime, and make a business decision about whether or not it's worth fixing at the time.


I'd even go so far as to say: If you're not spending at least half your time on so-called 'error handling', you're probably doing something wrong, like using a language feature (like exceptions) to defer that technical debt to later -- and you'll regret it, if your project matures, I assure you.

If you spend more than half your time in error handling, you have an architectural design issue with your code. If you need to be that vigilant then it's too easy for a junior hire to make a mistake that makes your code unreliable. Design out pitfalls. Reduce the amount of code in your system where you need that level of vigilance, and you'll no longer need to spend 50+% of your time on error handling.


Designing out pitfalls is precisely what I’m talking about and arguing for — and that takes time (in an application of any sophistication), either because you’re using a language (like Rust) whose compiler nitpicks your code in ways that 99% other languages would just accept without complaint, or because you’re being equivalently paranoid in your design of every single core data type, interface, platform service, etc.

There may also be some disconnect here in the type of code we’re talking about. I’m talking about work on code that millions of people are relying on to be 100% rock solid. I don’t care if you’re junior or senior: when writing code at this standard of quality, caution and rigor are always part of the process (both by the author, and the review and testing process).

If you can write absolutely rock solid, 99.9999% bug-free code that handles every possible error case gracefully, while spending more than 50% of your total coding time typing new code (which apparently has no error handling) literally as fast as you can type, well... WOW!! Consider me impressed; it seems we all have a lot to learn from you. If so, I genuinely would love to learn more of this seemingly-magical process where you can write perfect code at maximum speed, while also not having to think about edge cases or other errors.

Anyways, back to reality:

The fact that so much of this caution associated with writing bug-free code is loaded onto human judgement right now is exactly why I’m such a strong advocate for languages like Rust and Zig that aim to move much of this cognitive burden into the compiler.

For example, let’s talk about designing out pitfalls: say I create an immutable data structure in C++ with a really efficient implementation (zero-cost copies, automatic memory sharing) that is virtually foolproof when accessed from multiple threads, used in computations, etc. No matter how foolproof I make this C++ class, I still can’t stop your “junior dev” from stomping over the end of another array into my class data, or using dangling pointers, etc. etc. etc.

We can enforce “safe” classes wherever possible, but that also runs up against the wall of reality when interoperating with other C++ code that has a different idea of what constitutes that “ideal C++ subset”.


_I don’t care if you’re junior or senior: when writing code at this standard of quality, caution and rigor are always part of the process_

There are (at least) three stages for a program: make it work, make it good, make it fast. Most programmers I know (aka juniors) are happy to get the first stage done, for a value of done (it worked on my machine, when I tested it with this particular sequence). After many, many years of programming (aka senior), I'm proud to say that I usually take care of the second stage, and sometimes even the third.

The whole point of the junior / senior distinction is that seniors have been burned more times by some things so they insist on processes like always having source control, at least trying to have automated tests and so on - processes that a lot of juniors find irrelevant to getting things to work in the first place.


What kind of "junior" are we talking about here?? The average comp sci student will insist on source control. Are you hiring high schoolers or what?


> If you spend more than half your time in error handling, you have an architectural design issue with your code.

Ha, I once wrote a lock_file(...) function in 30 minutes. Then spent the next two years getting it to work on various platforms, on network drives, dealing with re-entrance, and so many other corner-cases you couldn't believe. The initial 10 line function turned into a whole module.

I'm not sure that counts as error handling as much as handling things you don't expect.


There's no such thing as a language with no exceptions.

There are only languages where exceptions need to be hand-rolled in an ad-hoc fashion.

This ad-hoc approach does not work. It gives you the tiny benefit of not needing to learn a language feature. Meanwhile, ad-hoc exceptions mean that you lose all sense of modularity in your program. Real programs are composed of dozens of modules at various levels of abstraction, communicating together over multiple processes and threads. Ad-hoc exceptions in a real program in practice mean 'just crash the whole thing, yolo'.

That does not work for serious programs.


> There are only languages where exceptions need to be hand-rolled in an ad-hoc fashion. This ad-hoc approach does not work.

C has no exceptions. It seems to be quite successful.


Well no, every time you have a chain of functions in your call stack that all do

    if(f(blah) != E_OK) {
      return E_WHATEVER;
    }
(which is veeeery common in C) you are reimplementing exceptions by hand (and with less performance in the no-error case since you still pay for the branches, while a C++ code with exceptions would just be a sequence of function calls without branches in that case)


I wouldn't disagree with you. The OP said that programming languages w/o exceptions aren't successful. C doesn't use exceptions, it has its own way of dealing with similar issues, and yet it's quite successful.


>I'd even go so far as to say: If you're not spending at least half your time on so-called 'error handling', you're probably doing something wrong, like using a language feature (like exceptions) to defer that technical debt to later -- and you'll regret it, if your project matures, I assure you.

That is a big if. Most projects don't mature that much. And for those that do, regretting things after the project has matured is a "nice problem to have". E.g. even if Zuckerberg has regretted some bad early FB design (let's say regarding error handling), he's still a billionaire.

Second, you can handle or ignore errors with exceptions just as well as with any other mechanism (multiple return values, optionals, etc).


Most error handling has only one possible way of handling: your activity fails to complete.

That's why exceptions make sense. I don't care if I ran out of disk space, a file is corrupted, or whatever else; the result is the same.


As someone who has spent years writing motion control code, I agree wholeheartedly.

Thinking of code I've written to move an object from one physical position to another: in pretty much every case, the error handling and recovery paths are the bulk of the code.

Errors in motion systems are not rare events. Things get sticky, wires break, sensors get clogged with dirt, parts wear out and break off and throughout it all, this thing has to keep moving from A to B reliably or at least clearly indicate when it's failed unrecoverably before something worse happens.


I can follow your premise that not enough time and effort is put towards making sure our programs are correct and have few defects. I'm not sure I follow your jump to claiming that Rust and Zig solve that problem.

But I guess time will tell: when there are as many programs running in those languages as there are C/C++ programs, we'll see whether things are any better or not.


Back in the day I used to work on low level telecommunications code. Often there the happy path was treated as (almost) an afterthought and handling all the error paths was the focus of the work. It was possible to work that way using C or in the various assembly languages we used.


I think exceptions are kind of cursed by their name. To treat exceptions correctly you have to constantly keep in mind that they are misnamed. And you have to deal with type systems that do not treat them as precisely as they treat return values. And on top of that you have to coexist with people who impose all kinds of semantic limitations on their mental model of exceptions, such as that they have to be "exceptional." Exceptions have to be treated as happening all the time, and in strongly typed languages (such as Scala, my hammer) you have to keep in mind that the type system has this huge loophole built in that is very important and that the type system gives you very little help with.

(The most obvious example that utterly confounds semantic limitations on exceptions is opening a file. Programmers accustomed to working on a certain kind of system find it quite natural to regard a missing file as "exceptional", even fatal — for them a missing file means the install or even the very build is corrupted. Programmers who work on a different kind of software may regard missing files as completely routine, maybe a commonplace result of user error. If these two groups of programmers believe exceptions are "exceptional" they will never agree on the contract for opening a file. Another example is deciding whether something that happens 0.0001% of the time is exceptional. Some programmers will regard that as exceptional while others will consider it utterly routine and regard it as a failure of discipline to believe otherwise.)

(The logical consequence of insisting on "exceptionality" is that you need two sets of basic libraries for everything and may need to mass refactor your code from one to the other as your use cases change. This is a needless imposition that offers no recompense for its offensiveness to good taste and good sense.)

The great merit of exceptions is that they remove boilerplate and make it trivial (nothing is more trivial and readable than no code) to express "I am not handling this, someone else must" which makes it quite easy to e.g. signal "400 Bad Request" from deep within some code that parses a request entity.

Personally, I think that for now it is best to prefer exceptions for writing lots of useful code quickly and concisely and to prefer strongly-typed, expressively-typed return values for achieving a higher level of care and reliability. But I look forward to a better linguistic solution that combines the virtues of both these approaches — and I have to admit that in my ignorance of many important languages I may be overlooking an already existing solution. I am reminded that in the early 2000s I would have relegated all static typing to the "verbose boring highest reliability required" category and then type inference and other ergonomic advances converted me to statically typed languages such as Scala as the best way to pump out lots of valuable working code. I'm looking forward to a linguistic solution to this problem.


Rust's Result type is isomorphic with checked exceptions, FWIW, just a bit more explicit in its use.


It's also more flexible and composable, since checked exceptions aren't quite a proper part of the type system. Consider the case of a higher-order function that wants to derive its exception specification from the function it wraps.


Eh. Very easy to model as a second return type on the function type. If your function type is polymorphic, it's just another type argument.


You also need union types. But yes, you can make it work (the original lambda proposal for Java tried to do that). Of course, once you push it to the point where it does, then it is practically indistinguishable from having a single return value that is a discriminated union.



