
I'm not following you. I'm thinking of things like https://github.com/prometheus/prometheus/issues/482 where core elements of a language don't behave in a predictable manner (in this case, == has two distinct modes). That's a simple example; more structural issues can be compounded by workarounds and lead to a lot of complexity growing over time.


In order for DSLs not to evolve into a mess, they must be made of high-quality semantic components, which you simply mix together, altering the syntax a bit, in order to get a new DSL.

When you develop your DSLs this way the problems you're talking about are nearly impossible to run into.

Essentially, if you build DSLs on top of an elaborate hierarchy of DSLs, every single DSL adds only a tiny bit of functionality, which keeps it maintainable and protects it from evolving into a pile of messy layers.
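A minimal sketch of this layering idea (all names hypothetical, not the author's code): each "DSL" is a tiny desugaring pass over s-expression-style ASTs, and a new language is just a composition of such small passes.

```python
def desugar_let(ast):
    """Layer 1: rewrite ('let', name, value, body) into a lambda application."""
    if isinstance(ast, tuple):
        if ast[0] == 'let':
            _, name, value, body = ast
            return (('lambda', name, desugar_let(body)), desugar_let(value))
        return tuple(desugar_let(a) for a in ast)
    return ast

def desugar_incr(ast):
    """Layer 2: rewrite ('incr', e) into ('add', e, 1)."""
    if isinstance(ast, tuple):
        if ast[0] == 'incr':
            return ('add', desugar_incr(ast[1]), 1)
        return tuple(desugar_incr(a) for a in ast)
    return ast

def compile_dsl(ast):
    # The "new DSL" is just the composition of the small layers.
    return desugar_let(desugar_incr(ast))

prog = ('let', 'x', 2, ('incr', 'x'))
core = compile_dsl(prog)
# core == (('lambda', 'x', ('add', 'x', 1)), 2)
```

Each layer is small enough to inspect in isolation, which is the maintainability argument being made above.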

If you have any questions about technical details of this approach - feel free to ask.


> they must be made of high quality semantic components, which you simply mix together

> the problems you're talking about are nearly impossible to run into.

If you mean this in any non-fluffy sense, such a property would depend upon the set of semantic components having some incredibly strong properties.

It's not clear to me that those properties don't immediately conflict with the entire point of creating DSLs -- that the host language's semantic model isn't the one you want.

You can probably get trivial properties about the static semantics without a ton of machinery, and more dynamic properties might be possible to read off from very constrained examples. But it's entirely unclear to me why semantic models should easily combine without creating unintended behavior.

Could you share some concrete examples?


> such a property would depend upon the set of semantic components having some incredibly strong properties.

Why? You can have as many semantic building blocks as you like. My current tool belt contains, among others, a generic high-performance imperative backend, a generic functional backend, a generic typing engine (which easily covers various Hindley-Milner implementations as well as simple type-propagation forms), a generic lazy functional backend, and generic Prolog and Datalog - these are the low-level semantic building blocks, which can be mixed together in any proportion.

On top of them there is a large hierarchy of high level building blocks, which include, for example, generic optimisation components (e.g., IR-agnostic SSA-based optimisations), generic parsing frontends, data representation and transformation languages, and many more.

> But it's entirely unclear to me why semantic models should easily combine without creating unintended behavior.

Why would they? As long as the interop rules are clear, there is no problem in mixing different things together.

> Could you share some concrete examples?

Take a look at my github repo (linked in some other comments, do not want to keep spamming it).


> Why?

Because I care about soundness.

And especially soundness for combinations of under-specified semantic models, because eDSLs are useful exactly when the semantic models at hand aren't 30-60 years old with mature implementations.

> imperative backend, generic functional backend, generic typing engine (which easily include various Hindley-Milner implementations as well as simple type propagation forms), generic lazy functional... which can be mixed together in any proportion.

That's just so totally not true. There are a ridiculous number of ways to combine these things in ways that totally kill type soundness, confluence, etc.

The reason you didn't have a hard time combining them is that 1) they are really, truly, extraordinarily well-understood and well-defined languages; 2) we've known exactly how to combine these things for a really long time; and 3) you knew a priori the reasonable interaction points.

None of these three assumptions are true for most of the eDSLs engineers might want to write down in general.

In general, a framework that lets you get unsound bullcrap from two perfectly sound things isn't a solution to the (e)DSL problem.


> Because I care about soundness.

And what's the problem with soundness? I can even derive formal proofs for simple DSLs when I have to.

I have never had any issues with mixing the semantic properties of DSLs at any level of abstraction - and the higher the abstraction level, the easier it is to mix without caring at all about the underlying semantics.

For example - I've got a generic PEG frontend language. I can mix it into nearly any combination of execution semantics, because it's high-level enough, and it has an intermediate representation which can be trivially mapped onto pretty much anything at very little cost (because of the simplicity of the requirements). Why should I care about the soundness of the whole mix when I can easily prove the correctness of this very last step of translation, which is enough to ensure the rest?
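A toy illustration of this point (hypothetical, not the author's code): a tiny PEG-style combinator layer whose output is a plain tree IR, so the final mapping into any execution semantics is a trivial, separately checkable step.

```python
def lit(ch):
    """Match a single literal character."""
    def p(s, i):
        return (ch, i + 1) if s[i:i+1] == ch else None
    return p

def seq(*ps):
    """Match a sequence of sub-parsers, collecting their IR nodes."""
    def p(s, i):
        out = []
        for q in ps:
            r = q(s, i)
            if r is None:
                return None
            node, i = r
            out.append(node)
        return (tuple(out), i)
    return p

def alt(*ps):
    """PEG ordered choice: the first alternative that matches wins."""
    def p(s, i):
        for q in ps:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

digit = alt(*[lit(d) for d in "0123456789"])
# grammar: expr <- digit '+' digit / digit
expr = alt(seq(digit, lit('+'), digit), digit)

ir, _ = expr("1+2", 0)
# ir == ('1', '+', '2'); a backend then maps this plain IR
# into whatever execution semantics it likes.
```

The parsing layer never touches the execution layer; only the IR crosses the boundary, which is what makes the last translation step small enough to verify on its own.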

> There are a ridiculous number of ways to combine these things in ways that totally kill type soundness, confluence, etc.

Then do not combine these things in the unsound ways. As simple as that.

For example, this is how I'm using Prolog embedded in an eager somewhat-functional host: the host DSL is used to define transforms over ASTs, including simple flattening transforms that derive sequences of equations (i.e., Prolog terms). Then the Prolog code is used to ask questions about these systems of equations. This way many typical internal compiler tasks are dead simple yet reasonably efficient (and for some of the most important things I can even use Datalog instead, with all its cool optimisations). The only point where the Prolog world interacts with the rest of the system is via these lists of equations, so I do not have to reason about WAM interaction with a memory-managed eager functional semantics - everything is well compartmentalised.
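A rough sketch of that compartmentalised dataflow (all names hypothetical): the host side flattens an AST into a flat list of type equations, and a tiny unification-style solver stands in for the embedded Prolog, which only ever sees those equations.

```python
def flatten(ast):
    """Host-side pass: walk the AST, emit (lhs, rhs) type equations."""
    eqs, counter = [], [0]
    def fresh():
        counter[0] += 1
        return f"t{counter[0]}"
    def walk(e):
        t = fresh()
        if isinstance(e, int):
            eqs.append((t, 'int'))
        elif isinstance(e, tuple) and e[0] == 'add':
            a, b = walk(e[1]), walk(e[2])
            eqs.extend([(a, 'int'), (b, 'int'), (t, 'int')])
        return t
    return walk(ast), eqs

def solve(eqs):
    """'Prolog side': naive substitution solver over the equation list."""
    subst = {}
    def resolve(x):
        while x in subst:
            x = subst[x]
        return x
    for a, b in eqs:
        a, b = resolve(a), resolve(b)
        if a != b:
            if a.startswith('t'):
                subst[a] = b
            elif b.startswith('t'):
                subst[b] = a
            else:
                raise TypeError(f"cannot unify {a} with {b}")
    return resolve

root, eqs = flatten(('add', 1, ('add', 2, 3)))
ty = solve(eqs)(root)
# ty == 'int'; the two semantic worlds interact only
# through the flat list of equations.
```

Because the interface is just "valid equations in, answers out", neither side needs to know anything about the other's evaluation model.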

> None of these three assumptions are true for most of the eDSLs engineers might want to write down in general.

How so? Everything still boils down to one of the "fundamental" execution models. Higher-level semantics are added as sequences of well-understood, provable, trivial transforms all the way down to such fundamental blocks, for which, as you said, we have at least 30-60 years' worth of understanding.

The very high-level semantics are added independently of any fundamental blocks, with a transform to a specific execution model added at the very last moment - and it is always trivial enough.
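A toy rendering of "the execution model is chosen at the very last moment" (hypothetical names): the same high-level IR is lowered by two different trivial final transforms, one per execution model.

```python
IR = ('add', ('num', 2), ('num', 3))  # backend-agnostic tree

def to_python(ir):
    """Final transform into an eager functional model (a Python thunk)."""
    tag = ir[0]
    if tag == 'num':
        return lambda: ir[1]
    if tag == 'add':
        f, g = to_python(ir[1]), to_python(ir[2])
        return lambda: f() + g()

def to_stack(ir, out=None):
    """Final transform into a stack-machine model (instruction list)."""
    out = [] if out is None else out
    tag = ir[0]
    if tag == 'num':
        out.append(('push', ir[1]))
    elif tag == 'add':
        to_stack(ir[1], out)
        to_stack(ir[2], out)
        out.append(('add',))
    return out

thunk = to_python(IR)      # thunk() evaluates to 5
code = to_stack(IR)        # [('push', 2), ('push', 3), ('add',)]
```

Each of the two final transforms is a few lines and can be checked independently; swapping the execution model never touches the high-level IR.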

> a framework that lets you get unsound bullcrap from two perfectly sound things

If the source language is sound, each of the target languages is sound, and all the transforms in between are sound, then the result is sound and robust.

> isn't a solution to the (e)DSL problem

In my practice it is indeed a solution. Every eDSL I design is dead simple in implementation, and I very rarely have to add something new to my current toolbox - with problem domains ranging from 3D CAD applications to hardware design.


> And what's the problem with soundness?

You definitely cannot provide any guarantees for arbitrary DSLs implemented in your system, yet you claim you do.

> I can even derive formal proofs for simple DSLs when I have too.

Am I looking at the correct thing on GitHub? CombinatoryLogic? I can't find any code here that would suggest you're interfacing with a theorem prover, much less the incredible amount of machinery that must be involved in taking two correctness proofs and producing a correctness proof for their combination.

> Then do not combine these things in the unsound ways. As simple as that.

I think this pretty much sums up the contribution here. It's a framework that doesn't face any problems with compositions because you've chosen not to write down things that don't compose well.

> If the source language is sound, each of the target languages are sound, and all the transforms in between are sound, then the result is sound and robust.

To repeat myself once again, your framework doesn't force any of these to be true. It's just as easy to write down unsound things as to write down sound things. The fact that you choose to write down sound things is nice I guess, but it's not a demonstration of the capacity of the framework or methodology to preserve soundness. It's just a demonstration of your own capabilities... we hope.

That works fine and well when DSLs don't get interleaved in unexpected ways and are only developed by a small set of developers. But then, we've known how to design DSLs in those cases for a long time. There's no REAL composition going on -- just the sort of combination that's always happened.

The actually interesting question is how to support an open world of DSLs. Where composing DSL1 with DSL2 won't break some guarantee that the writer of DSL2 worked hard to achieve. And where even changes to DSLA, which DSL2 depends on, also won't break those guarantees. And where the authors of all three languages never even know each other exist. And not just "if you don't write down the wrong thing", but "because the framework prohibits it". You don't solve this problem. In fact, I can't even figure out how to state some specification of a DSL, much less that it's preserved under a transformation...

Again, your framework might be a great setting for doing DSL engineering. And I don't doubt that you designed a tool that's useful for you. But you're making some pretty extreme and apparently over-stated claims beyond "nice framework solving a few practical implementation problems".

(Which is a shame because your tool seems nice, but your comments come across (to me) a bit snake-oily, so I'm not sure how seriously I should take the tool / approach. I think if you re-stated your claims so that they're a more accurate representation of the problems you do (and don't) solve, you might get a better reception.)


> You can definitely not provide any guarantee for arbitrary DSLs implemented in your system, but claim you do.

No, I never claimed so. It's the responsibility of the DSL developer, although it is very easy, given the small size of any practical DSL implementation.

> I can't find any code here that would suggest you're interfacing with a theorem prover

I have not yet published a large chunk of work which contains a statically typed version of the host language with an ACL2-like inference engine. It's still quite experimental, but I did many mechanical proofs manually in the past (mostly in the hardware-related DSLs).

> much less the incredibly amount of machinery that must be involved in taking two correctness proofs and producing a correctness proof for their combination.

I still cannot understand your point. Why exactly would a combination of DSLs introduce any additional complexity?

I demonstrated in my examples that mixing DSLs is exactly the same thing as using them. There is a dataflow dependency between the various semantic realms, but it is really hard to break any constraints by merely feeding valid data in and getting valid data out.

So, yes, no existing language workbench enforces language soundness; it's up to the designer to ensure correctness. But can you name a single general-purpose language with enforced soundness, CompCert aside?

> get inter-leaved in unexpected ways

Mind providing any examples of this? I cannot think of any interesting case: due to the very nature of this approach, there is no difference between using the languages in a "normal" way and targeting them in code generation.

> guarantee that the writer of DSL2 worked hard to achieve

What is this guarantee worth if it can be broken by merely using the language?

I see that you're more on the bondage-and-discipline side of PL research. In this case we have to agree to disagree; my decades of Lisp experience force me to lean the other way. I'm all for formal proofs, and I do them a lot when really necessary (e.g., in hardware, where the cost of error is huge and verification capabilities are limited), but beyond that, a hacking-friendly, permissive language allows experimentation, while b&d languages inhibit innovative designs.

I only built a b&d, non-Turing-complete (total functional), strictly typed version of my language workbench after years of experimenting on a dynamic, unrestricted codebase.

So, my claims are:

1) It's dead simple to implement eDSLs using metaprogramming, if you've got the right tools (features for dealing with ASTs and their transforms; examples: the Nanopass framework, Racket in general)

2) If you have a hierarchy of ready-made DSLs, then with the above approach you can easily mix arbitrarily selected properties of all of your existing languages into a new DSL.

3) PEG and GLR are great. The others are inferior. All hail the lexerless parsing!

And, by the way, my framework is not any different from other metaprogramming-based language workbenches. You can achieve this ease of development and robustness with any other such framework.



