It makes me wonder, what is next? What new astonishing thing will happen in version control?
I think what's needed is an intelligent (as in AI) merge mechanism. Right now, if two people are adding two different features to a set of files, then merging those changes is error-prone and requires a lot of manual work.
If this ever gets perfected and automated, it will be a huge milestone.
This should be much simpler once we stop using those silly text files and start storing everything as a proper representation of the AST. Then again, at that point we can get rid of the silly text-diff-based systems and just store everything as versioned trees.
I actually wrote a very simple system like this for a hackathon several months ago. The idea was that we would take some basic Scheme code (boy did we aim high :), parse it, and commit the result. We would then diff the trees and keep track of the changes that way. Finally, we had a cute web front-end that pretty-printed the code from the AST and could show the diffs visually.
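To give a flavor of the parse-then-diff idea, here is a rough sketch in Python. It is not our hackathon code: the names `tokenize`, `read` and `diff` are made up for illustration, the reader only handles bare atoms and parentheses (no strings, quotes or comments), and the diff is a naive positional comparison.

```python
def tokenize(src):
    """Split s-expression source into parens and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":  # assumes balanced parentheses
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return tok  # an atom, kept as a plain string


def read(src):
    """Turn source text into a nested-list tree."""
    return parse(tokenize(src))


def diff(old, new, path=()):
    """Yield (path, old_subtree, new_subtree) for each position that differs."""
    if old == new:
        return
    same_shape = (
        isinstance(old, list) and isinstance(new, list) and len(old) == len(new)
    )
    if same_shape:
        # Same number of children: recurse to find the smallest changed subtrees.
        for i, (a, b) in enumerate(zip(old, new)):
            yield from diff(a, b, path + (i,))
    else:
        # Different shape: report the whole subtree as replaced.
        yield (path, old, new)
```

Because the tree is built from tokens, reindenting `(define (sq x) (* x x))` across several lines produces no diff at all, while changing the body reports only the subtree at path `(2,)`. A real tool would want an alignment-based tree diff rather than this position-by-position walk.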
We got the basics working, including simple diffs. One goal was to link the same variables between two versions; we did not manage to make that work, but had a very hacky approach that looked like it worked.
Doing any sort of merging with this data is nontrivial. We were planning to implement it, but unfortunately ran out of time. Still, we did have a cute demo of some commits and some diffs in the end--it actually worked a little, which is much more than I expected starting out.
However, despite not implementing merging, we did throw in some nice features. In particular, we were able to identify commits that did not change the function of the code (whitespace and comment changes only) and mark them. This was very easy yet still useful, and a good indicator of the sorts of things one could do with a system like that.
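The same check is easy to sketch for Python source using the standard `ast` module (this is a stand-in for what we did with Scheme, and `same_program` is just a name I picked). It relies on the parser dropping comments and on `ast.dump` leaving out line and column numbers by default.

```python
import ast


def same_program(old_src, new_src):
    """True if two versions parse to the same tree, ignoring layout and comments."""
    # ast.parse discards comments; ast.dump omits position attributes by default,
    # so only the structure of the code is compared.
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))
```

One caveat: docstrings are ordinary string nodes in the tree, so a docstring-only change still counts as a change here.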
After the hackathon, one of my friends found some papers about a system just like ours. I don't remember where they were from, but if you're interested you could look for them. (I think the phrase "semantic version control" is good for Googling; that's what we called our project.)
Overall I think that it's a neat domain but in hindsight maybe it was a little too much for 18 hours of coding :) We did have fun, and it was cool, so I have no regrets.
I think it's most accurate to say that code is a textual representation of an AST. Saying that it's just text is just like saying it's just a bunch of numbers--both technically true but missing the bigger picture.
One potential reason not to store code as text is that there are many equivalent programs that differ only in inconsequential text. A perfect example is trailing whitespace.
There are also some benefits of storing code as an AST. For one, it would make it trivial to identify commits that did not change the actual code--things like updated comments. This would help you filter out commits when looking for bugs. Another benefit would be better organized historical data: in a perfect system, you would be able to look at the progress of a function even if it got renamed part of the way through.
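As a toy illustration of following a function through a rename, one could fingerprint each function by its parameters and body while ignoring its name. This is only a sketch using Python's `ast` module; `fingerprints` and `renames` are hypothetical helpers, and the approach breaks down for recursive functions (the body mentions the old name) and for functions with identical bodies.

```python
import ast


def fingerprints(src):
    """Map a name-independent fingerprint of each top-level function to its name."""
    result = {}
    for node in ast.parse(src).body:
        if isinstance(node, ast.FunctionDef):
            # Parameters plus body statements, but not the function's own name.
            shape = ast.dump(node.args) + "".join(ast.dump(s) for s in node.body)
            result[shape] = node.name
    return result


def renames(old_src, new_src):
    """Return sorted (old_name, new_name) pairs for functions whose body is unchanged."""
    old, new = fingerprints(old_src), fingerprints(new_src)
    shared = old.keys() & new.keys()
    return sorted((old[k], new[k]) for k in shared if old[k] != new[k])
```

A function that was both renamed and edited in the same commit would not be matched; that case needs some notion of similarity rather than exact equality.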
But then you end up with a version control system that is not generic, but dependent on a particular language. The story of Smalltalk suggests that the added value might not be worth the coupling and complexity it requires.
You should be able to write a generic version control system like this where you can just plug the appropriate parser in and it would work for that language. For backup, you could have it still keep some files as text.
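Roughly what I have in mind, as a sketch: a registry that maps file extensions to parsers, with plain text as the fallback. The `PARSERS` table and `snapshot` function are invented for this example, and falling back to text when parsing fails is my own assumption about how the backup would work.

```python
import ast
import json
import os

# Hypothetical plug-in table: file extension -> function that turns source
# text into a comparable tree. Supporting a language means adding an entry.
PARSERS = {
    ".py": lambda text: ast.dump(ast.parse(text)),
    ".json": json.loads,
}


def snapshot(filename, text):
    """Return ("tree", parsed) when a parser is registered and succeeds, else ("text", text)."""
    parser = PARSERS.get(os.path.splitext(filename)[1])
    if parser is not None:
        try:
            return ("tree", parser(text))
        except (SyntaxError, ValueError):
            pass  # unparsable input: fall back to storing it as text
    return ("text", text)
```

The core of the system only ever sees `("tree", ...)` or `("text", ...)` values, so it stays generic while the language-specific part lives entirely in the table.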
Because you can always convert from AST to text, but not always from text to AST. Also, converting AST to text is easier and faster, and you only need to do it when the text is actually needed. Additionally, you'd never commit a syntax error. Why is it a bad idea?
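A quick sketch of both points using Python's `ast` module (requires Python 3.9+ for `ast.unparse`). The `commit` and `checkout` names and the list-backed store are hypothetical, just enough to show the shape of it.

```python
import ast


def commit(store, source):
    """Parse the source and store the tree; unparsable code raises SyntaxError."""
    store.append(ast.parse(source))  # nothing is stored if parsing fails


def checkout(store, index=-1):
    """Regenerate text from a stored tree, only when someone asks for it."""
    return ast.unparse(store[index])
```

Note that with the plain `ast` module the text you get back is normalized: comments and the original formatting are gone, so a real system would need a tree that preserves them.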