You don't really need to know assembly to understand those, though. Usually clas...

haberman · on May 10, 2016

If all you tell me about the machine is its memory layout, you haven't told me nearly enough to explain the oddities of C.

For example, I could imagine a machine with identical memory layout to C, but that supported a hugely parallel, variable-size data bus where an operation like "x = y" (string assignment by value) could copy an entire string in a single operation.

The reason C doesn't support this is because generally at the assembly language layer you can only operate on memory in register-sized chunks, and every load or store of a register's worth of memory takes time. So assignment of a string by value requires a loop, just like it does in C.
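As a minimal sketch of the loop being described, assuming a NUL-terminated source and a destination buffer that is already large enough (`copy_string` and `copy_string_demo` are hypothetical names, not library functions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a NUL-terminated string one byte at a time.
   This is roughly what a by-value "x = y" on strings would have to expand to. */
size_t copy_string(char *dst, const char *src)
{
    size_t n = 0;
    while ((dst[n] = src[n]) != '\0') {
        n++;                /* one load and one store per character */
    }
    return n;               /* characters copied, not counting the NUL */
}

/* Self-check: the loop yields the same bytes the source string holds. */
int copy_string_demo(void)
{
    char x[16];             /* assumed big enough for the source below */
    const char *y = "Einstein";
    size_t n = copy_string(x, y);
    assert(n == strlen(y));
    return strcmp(x, y) == 0;
}
```

The cost is proportional to the string length, whoever ends up writing the loop.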

Chathamization · on May 10, 2016

I'm not sure I understand your example. Knowing how arrays (or pointers) are handled in C is enough to understand why you can't have string assignment by value. How C treats things is all you really need to know if you want to use the language. This isn't just true for C. In my experience, people tend to have more difficulty with string assignment in Java than C, but you usually don't hear people say that they need to go to a lower level to really understand Java.

Understanding the why is interesting - like it is for any language. And like in many languages, the why tends to be complicated and somewhat arbitrary at times. If you're really interested in the why rather than the what, a book on the history of C will probably be more useful than learning assembly.

Unklejoe · on May 11, 2016

> Knowing how arrays (or pointers) are handled in C is enough to understand why you can't have string assignment by value.

True, but I think knowing how arrays (and pointers to arrays) work in C is one of the main hurdles for a lot of people who are just starting out.

This became especially apparent to me after recently helping my friend get familiar with C.

I can see how some people can get confused by this when you consider the fact that structures can be copied using a straight assignment, while arrays can't.

People naturally try to find similarities when learning something new, so it took a little while for him to _really_ get it. I think his mind kept trying to think of a structure essentially as an array of variables, when that's not really the case.

friendzis · on May 11, 2016

It is not easy, but so far the best way to get a newcomer's head around most of C's oddities (in this context) is to explain two things: 1) memory location and size, and 2) run time vs. compile time.

Then it becomes apparent why one cannot copy strings "by assignment" and can structs: it is in general impossible to know runtime size of string at compile-time. C strings have no structure known at compile time. Structures, on the other hand, are there to enforce structure on data.

There is a pretty neat real-world analogy here: a copy machine. A string copy must be done character by character, in the same way that a book or document folder has to be copied page by page. An engineering drawing, on the other hand, must be copied whole. It may contain references to other drawings, and you can still, with some struggle, extract individual parts, but it is copied as a whole. This analogy relies on drawings being single-page, but it nicely encapsulates the "strings are arrays are pointers" idea: a folder may be empty or may hold a single page, but it is impossible to know without trying.

jklowden · on May 11, 2016

    struct { char name[30]; } tgt, src = { "Einstein" };
    tgt = src;

Since there are no strings, it cannot be true that strings are pointers. They can be arrays, and array sizes are known at compile time. It's just a 40-year-old cop-out that we can't copy all types by assignment based on their `sizeof` size.

friendzis · on May 12, 2016

> <...> it is in general impossible to know runtime size of string at compile-time <...>

You are stepping on the same rake as beginners: generalizing from a specific case instead of applying the general case to specific circumstances. I have stressed the words "in general". You may treat it like a cop-out, or you can say that there are no special-case semantics here. Not sure if it was intentional, but your example is rather tricky, at least until you step out of the box and see that we are no longer dealing with strings/arrays/pointers here but with structs, which have somewhat different semantics.

> Since there are no strings

Yes, there is no explicit string type in the language, but somehow we do use strings in C. Semantics. We can semantically treat a particular block of memory as a string, a time series, a binary tree, etc. There is simply no special case (explicit language support) for strings.

> it cannot be true that strings are pointers. They can be arrays, and array sizes are known at compile time.

What about `malloc`? What about passing arrays between compilation units? I have covered this in SO[1]. Note that I never explicitly pass pointers, yet `sizeof()` thinks I do. Array sizes can, in some circumstances, be known at compile time in a specific program block, but not in general.
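A small sketch of the `sizeof` point, assuming nothing beyond standard C (the function names are hypothetical). A parameter written with array syntax is adjusted to a pointer, so the callee no longer sees the array size:

```c
#include <assert.h>
#include <stddef.h>

/* Despite the array syntax, the parameter `name` has type `char *`. */
size_t size_seen_by_callee(char name[30])
{
    char **p = &name;   /* only type-checks because name is really a char * */
    return sizeof *p;   /* same value as sizeof(name) here: the pointer size */
}

void sizeof_demo(void)
{
    char name[30] = "Einstein";

    /* In the block that declares the array, its size is known. */
    assert(sizeof name == 30);

    /* Across the call boundary only a pointer arrives. */
    assert(size_seen_by_callee(name) == sizeof(char *));
}
```

Writing `sizeof(name)` directly inside the callee gives the same pointer-sized result, and most compilers warn about it.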

I'd say there are roughly three types of languages (core language, no stdlib, etc.) in this context: 1) those that provide exceptions for common special cases, e.g. languages with `=` and `eq` (Perl?); 2) those that wrap all cases in an easy-to-use interface, e.g. languages with object identity (Python?); 3) those that provide only general-case syntax, i.e. C.

[1]: http://stackoverflow.com/questions/19589193/2d-array-and-fun...

pcwalton · on May 11, 2016

> For example, I could imagine a machine with identical memory layout to C, but that supported a hugely parallel, variable-size data bus where an operation like "x = y" (string assignment by value) could copy an entire string in a single operation.

You mean like x86 (at least as far as the ISA is concerned) [1]? :)

I mean, this isn't just me being pedantic and annoying: I think it goes to show that C's machine is quite different from a real machine.

[1]: http://x86.renejeschke.de/html/file_module_x86_id_279.html

nkurz · on May 11, 2016

I thought about bringing this up as well. I'm not sure what you mean by "C's machine is quite different from a real machine". Do you mean (as I was thinking) that even assembly language is frequently not close enough to the machine to be used to guide the programmer trying to write high efficiency code?

I'm constantly surprised by how poorly documented the actual operation of current processors is, and how few people seem to care. In one way, this means that the abstraction is working, and no longer does anyone need to look behind the curtain. In another way, like the move to teaching only higher level languages, it feels like something essential is being lost.

pcwalton · on May 11, 2016

What I mean is that C is defined in terms of a virtual machine with absolutely no restrictions on what happens if you step outside the boundaries of that machine's defined operations. That's in contrast to real machines, which typically have much less undefined behavior.

haberman · on May 11, 2016

I actually thought of that when I wrote my comment. But still, notice that (1) the instructions take as their input registers that point to the data (i.e. a char pointer), not some machine-level idea of a "string", and (2) the cost of these instructions is still O(n), even though you don't have to write the loop manually.

robmccoll · on May 11, 2016

In a C compiler implemented for that architecture, copying a value of memory line size could very well use an instruction that does it all at once if the registers are large enough or a DMA intrinsic is exposed. Really that's the point. C is like a near-asm language that standardizes across ISAs with the opportunity for the compiler, libraries, or programmer to do something more clever on a specific system. It requires dedication and patience, but in the end is generally a good middle ground for low level work.

ryao · on May 11, 2016

Copying a string by doing x = y is rare among languages. Most copy references like C does. An example of an outlier that does a string copy like you want would be C++ on its string class.

You can copy/initialize strings in C without ever writing a single loop by using strcpy. As for hardware, you need not have something so exotic. x86 for instance has a string copy instruction. The C strcpy function is often compiled to it.

ejanus · on May 12, 2016

Is strcpy not implemented with a loop?

Unklejoe · on May 11, 2016

To add to the oddities:

You can copy entire structures by value by doing "x = y".

This ends up being implemented as a memory copy loop, but I'm not sure what happens with the padding bytes. There's a chance they get copied as well, but I really don't know.

In the code base I work with, it's common to see arrays wrapped in a structure of which they are the only member. This makes it a little easier to copy them, though I'm not sure if that was the intended purpose.

Something like this would be defined in a header...

    typedef struct { char myArray[50]; } Test_Struct_T;
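
A sketch of how such a wrapper might be used, assuming the typedef above (`wrapped_array_copy_demo` is a hypothetical name for illustration):

```c
#include <assert.h>
#include <string.h>

typedef struct { char myArray[50]; } Test_Struct_T;

/* Returns 1 if plain assignment copied the wrapped array by value. */
int wrapped_array_copy_demo(void)
{
    Test_Struct_T src = { "Einstein" };  /* remaining elements are zeroed */
    Test_Struct_T dst;

    dst = src;                 /* copies all 50 elements of myArray */
    src.myArray[0] = 'X';      /* dst has its own copy, so it is unaffected */

    /* By contrast, with `char a[50], b[50];` the statement `a = b;`
       does not compile, because arrays are not assignable. */
    assert(sizeof dst.myArray == 50);
    return strcmp(dst.myArray, "Einstein") == 0;
}
```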

caf · on May 11, 2016

It's not specified what happens to the memory padding bytes on structure assignment.

Hence: you can't use memcmp() to reliably compare structures for equality.
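
One common workaround, sketched under the assumption of a made-up `struct record` that has padding after its first member on most ABIs, is to compare member by member so the padding bytes are never read:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical struct: most ABIs insert padding between tag and value. */
struct record {
    char tag;
    int  value;
};

/* Compare only the members that carry meaning. */
bool record_equal(const struct record *a, const struct record *b)
{
    return a->tag == b->tag && a->value == b->value;
}

bool record_equal_demo(void)
{
    struct record a = { 'x', 42 };
    struct record b;

    b.tag = 'x';               /* members set one by one; any padding in b */
    b.value = 42;              /* is left with unspecified contents */

    /* memcmp(&a, &b, sizeof a) could report a difference here even though
       every member matches, so it is not used. */
    assert(sizeof(struct record) >= sizeof(char) + sizeof(int));
    return record_equal(&a, &b);
}
```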

mpweiher · on May 11, 2016

> You can copy entire structures by value by doing "x = y".

That wasn't in the original language, though it's a pretty old extension.

gerbilly · on May 11, 2016

> That wasn't in the original language, though it's a pretty old extension.

That's why I don't recognize that feature. I guess we didn't have it back when I was doing a lot of C.

eddyb · on May 11, 2016

I believe C++14 adds the first sane C-family array type:

    template<typename T, size_t N> class array { T data[N]; };

Kristine1975 · on May 11, 2016

std::array is in C++11