So I’ve got this great programming language all worked out.  It’s called Deca, and it’s the systems-programming language I’ve always wanted.  It has bloody everything I like: a single-inheritance object system based on generic functions and multimethods, tuples-as-structures-as-arrays-as-objects, lexical scoping with Scheme-style LET statements, exceptions, a parametric module system for generic programming, local type inference with static typing, even an absolutely-most-definitely-guaranteed-safe alias/reference type that gives me processor-level mutexes for free and enables me to compile the language down to machine code with no runtime library or garbage collection.  Oh, and there’s an “unsafe” dialect for writing modules that need pointers and inline assembly.  Sounds fricking awesome, right?  Why, I should probably go ahead and implement the compiler ASAP so I can claim my world fame, right?  Actually, I still need to design a complete syntax for all this, but I could still write it in S-expressions until I work out a better syntax.

Well no.  Because I want to add first-class functions to this language.  Therein lies the rub, because any function passed around as a first-class value could, and often does, have a closure environment of some sort with it.  However, the signature of the function, which would normally be its whole and entire type in a functional language (Function of x, y, and z returning an a and carrying a closure pointer that may be null…), cannot possibly be its whole type in Deca, because Deca has to run without (language-supplied) garbage collection and therefore the size of every single data object passed into or out of a function must be known in such a way that the compiler can theoretically (that is, before optimizations like passing arguments in registers) allocate that object on the stack.  As I’ll explain below, this is a problem.

So far I’ve been using the idea from D (as in Digital Mars’ D) for compiling closure expressions: create an anonymous class to contain the variables the compiler infers as closed-over by the function definition, with a “call” or “operator()” method that (other than its closure-object argument, which the compiler automatically passes like a “this” pointer despite Deca being a multiple-dispatch language) has an identical signature to the function as the programmer defined it.  Calling an instance of such a class (ie: “f()” where f is of a function type) calls the correct (as determined by runtime dispatch) “operator()” method with the given arguments and the implicit instance of the closure-object.  To make all this more than mere syntactic sugar for anonymous classes, all “closure classes” are direct subtypes (meaning no closure class is a child of another closure class) of an abstract supertype (with no closed-over variables at all) that simply represents an old-fashioned, C-style function pointer with no closures involved.

By now you probably see the problem yourself.  If I have to define the type of an argument to or return value from a function and I define it as a function type, how much space should the compiler allocate on the stack to pass or return such an “object”?

For arguments I can simply use pass-by-reference, of course.  The compiler allocates space for a pointer, does its type-checking and uses old-fashioned RTTI to determine what the function actually got at runtime.  But I anticipate that programmers (not in the least myself) will actually want to pass a function by value at some point.  Actually, even old C has variadic functions that determine at compile-time how much memory is needed on the stack for their arguments, but I’m sketching this part out to show how much harder the hard case is in contrast.

What about return values of function types?  Or more generally, what happens when I declare a function as returning an instance of a certain class and then attempt to actually return an instance of one of its subclasses?  How do I avoid resorting to pointer semantics here?  C++ seems to handle it somehow (see the attached C++ source file to test the language’s behavior), so I guess I should start by disassembling my C++ binary.  If anyone has an idea that doesn’t involve disassembling a binary file originally written in the Shub-Niggurath of programming languages (I didn’t wear my Magen David when I did it, so my soul may or may not remain in my body), please feel free to leave it in a comment so I can steal it!

Happy hacking to all!  Here’s the source of the program I wrote to see how C++ handles this stuff.  It compiles under Linux with g++, and uses weird “flds/z/cw”, “fucompp”, “fnstsw”, “fistpl” and “sahf” x86 assembly instructions that I’ve never seen before.  TO THE INTEL BIBLE!

class Parent {
public:
virtual int foo(int bar) {
return bar + 1;
}
};

class Child: public Parent {
int quux;
public:
int foo(int bar) {
return bar + quux;
}
Child(int i) {
quux = i;
}
};

Parent baz(float x) {
if(x < 0.0)
return Child((int)x);
else
return Parent();
}

int main(int argc,char** argv) {
return 0;
}

Yeah, gorram Wordpress doesn’t let me attach .cpp files.  They don’t meet its security guidelines ;-).

EDIT: It turns out those funny instructions are floating-point instructions.  Made x an integer now to get rid of them.  The code still looks weird, but interpretable.

AGAIN EDIT: It appears the language feature I’m looking for is called return-type covariance.  So yeah, I need some way to return values of covariant type to my declared return type on the stack.