DotGNU Portable.NET FAQ

Rhys Weatherley, rweather@southern-storm.com.au.
Last Modified: $Date: 2001/08/14 02:04:24 $

Copyright © 2001 Southern Storm Software, Pty Ltd.
Permission to distribute unmodified copies of this work is hereby granted.

1. What is DotGNU Portable.NET?

DotGNU Portable.NET is a project under the DotGNU meta-project. Its goal is to build a suite of free software tools to build and execute Common Language Infrastructure (CLI) applications. The initial target platform is GNU/Linux, with other platforms to follow in the future.

DotGNU Portable.NET is built in accordance with the requirements of the GNU Project and FreeDevelopers.

DotGNU Portable.NET is focused on compatibility with the ECMA specifications for CLI. There are other projects under the DotGNU meta-project to build other necessary pieces of infrastructure, and to explore non-CLI approaches to virtual machine implementation.

2. Why not write the compiler tools in C#?

The main reason is the "chicken and egg" problem. We wouldn't be able to run the compiler until the runtime engine and the full C# system library are written, and they are still a work in progress.

While it is possible to bootstrap off Microsoft's engine and compiler, there is an open legal question in doing this. We want to avoid any "booby traps" that may exist in Microsoft licenses that prevent the free development of DotGNU Portable.NET. It is safer to avoid dependence upon Microsoft tools.

Writing the C# compiler in C means we are bootstrapping from gcc, and not Microsoft's compiler, which should avoid any legal problems.

The second reason for writing the compiler in C is security. Independent third parties can inspect the C# compiler source for security problems, and then compile the code with their (hopefully) trusted version of gcc to get a trusted C# compiler.

Writing the compiler in C# would introduce a tough trust problem: you must trust that the bootstrapped binary version of the compiler does not have any back doors. Inspecting the source code is not sufficient to perform a full security audit.

3. If the compiler was written in C#, wouldn't reuse be easier?

Theoretically, yes. Reuse is the stated reason why the Mono project is writing all of their tools in C#.

Should the Mono project succeed at this goal, then their components should be directly reusable by anyone running DotGNU Portable.NET.

However, nothing in DotGNU Portable.NET's cscc compiler would prevent its reuse in other compilers, as long as those compilers are themselves written in C.

Cscc is architected so that new languages can be easily added as plug-ins. The plug-in converts source code into IL assembly code, which cscc processes to produce the final executable. Plug-ins can either do this conversion their own way, or reuse the "codegen" facilities to do most of the hard work for them.

4. I've heard that you can compile C# to the JVM. Is that correct?

Yes. The cscc compiler is architected so that it can compile to either IL or JVM bytecode. Adding other bytecode formats would be quite easy.

5. What is treecc?

Treecc is a tool that we wrote to assist in the development of cscc. It complements flex and bison by providing support for abstract syntax tree creation and manipulation.

Treecc performs most of the housekeeping within the core of the compiler, allowing the programmer to concentrate on the specifics of language implementation. A fuller account of how treecc works can be found at its Web site, http://www.southern-storm.com.au/treecc.html.
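
To give a feel for the kind of housekeeping involved, here is a minimal hand-written sketch in C of a syntax tree for a toy expression language. It is not treecc input or output, and the names (Node, node_const, node_binary, node_eval, node_free) are hypothetical. It only shows the node allocation and per-kind dispatch code that a compiler writer otherwise maintains by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds for a toy expression language. */
typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;             /* used by NODE_CONST only */
    struct Node *left;     /* used by binary nodes only */
    struct Node *right;    /* used by binary nodes only */
} Node;

/* Allocate a constant leaf node. */
Node *node_const(int value)
{
    Node *n = malloc(sizeof(Node));
    assert(n != NULL);
    n->kind = NODE_CONST;
    n->value = value;
    n->left = NULL;
    n->right = NULL;
    return n;
}

/* Allocate a binary node; takes ownership of both children. */
Node *node_binary(NodeKind kind, Node *left, Node *right)
{
    Node *n;
    assert(kind == NODE_ADD || kind == NODE_MUL);
    assert(left != NULL && right != NULL);
    n = malloc(sizeof(Node));
    assert(n != NULL);
    n->kind = kind;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* One operation over the tree. Each new node kind needs a new case
   here, and in every other operation written in this style. */
int node_eval(const Node *n)
{
    switch (n->kind) {
    case NODE_CONST:
        return n->value;
    case NODE_ADD:
        return node_eval(n->left) + node_eval(n->right);
    case NODE_MUL:
        return node_eval(n->left) * node_eval(n->right);
    }
    assert(0 && "unknown node kind");
    return 0;
}

/* Release a tree and all of its children. */
void node_free(Node *n)
{
    if (n == NULL)
        return;
    node_free(n->left);
    node_free(n->right);
    free(n);
}
```

Building the tree for 2 + 3 * 4 with node_binary and node_const and passing it to node_eval yields 14. The point of the sketch is the bookkeeping around it: constructors, a switch per operation, and cleanup all have to be kept in step as the language grows.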

6. Using gcc as a compiler

A common question that arises is why we aren't using gcc to compile C# code to IL. Strategically, we would like to be able to reuse all of the good work that has gone into gcc. The DotGNU Project currently has an open request for someone to volunteer to modify gcc to generate IL bytecode.

However, it isn't quite as easy as it looks. The following sections provide a blow-by-blow account of why this is so hard.

6.1. Why don't you add C# to the list of languages gcc supports?

Because it won't solve the problem that we need to solve.

Initially we need a C# compiler that can generate IL bytecode for the .NET platform. Later, we may need a C# compiler that can generate native code as well, but that is optional.

Putting a C# parser on the front of gcc would give us a native compiler, but it won't give us an IL bytecode compiler.

6.2. So what? Add an IL bytecode backend to gcc, and you'll solve your problem, and also be able to compile C, C++, Fortran, etc, to .NET.

This is not as easy as it looks. Gcc is divided into a number of phases: parsing, semantic analysis, tree-to-RTL conversion, RTL handling (including optimization), and final native code generation.

The hard part is RTL (Register Transfer Language). This part of gcc is hard-wired to generate code for register-based CPUs such as i386, PPC, Sparc, etc. RTL is not designed for generating code for stack-based abstract machines such as IL.

Also, RTL loses a lot of the type and code structure information that IL needs in the final output file. By the time RTL gets the code, information about whether a value is an integer or an object reference is mostly lost. Information about the class structure of the code is lost. This information is critical for correct compilation of C# to IL.
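
The difference between the two machine models can be sketched with a toy code emitter in C. This is only an illustration under stated assumptions: the Expr type, the function names, and the mnemonics ("push", "load", "add", "mul") are made up for the example and are neither real IL opcodes nor real RTL. Stack code falls out of a post-order walk of the expression tree with no named storage, while the register form has to name every intermediate value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: op is '+' or '*', or 0 for a variable. */
typedef struct Expr {
    char op;
    char name;                  /* variable name when op == 0 */
    const struct Expr *lhs;
    const struct Expr *rhs;
} Expr;

/* Append one line of text to out, which holds at most cap bytes. */
static void append(char *out, size_t cap, const char *text)
{
    assert(strlen(out) + strlen(text) < cap);
    strcat(out, text);
}

/* Stack form: operands are pushed; an operator pops two values and
   pushes the result. No intermediate value is ever named. */
void emit_stack(const Expr *e, char *out, size_t cap)
{
    char line[32];
    if (e->op == 0) {
        snprintf(line, sizeof line, "push %c\n", e->name);
    } else {
        emit_stack(e->lhs, out, cap);
        emit_stack(e->rhs, out, cap);
        snprintf(line, sizeof line, "%s\n", e->op == '+' ? "add" : "mul");
    }
    append(out, cap, line);
}

/* Register form: every intermediate value gets a numbered
   pseudo-register. Returns the register holding the result. */
int emit_reg(const Expr *e, char *out, size_t cap, int *next_reg)
{
    char line[48];
    int r;
    if (e->op == 0) {
        r = (*next_reg)++;
        snprintf(line, sizeof line, "r%d = load %c\n", r, e->name);
    } else {
        int a = emit_reg(e->lhs, out, cap, next_reg);
        int b = emit_reg(e->rhs, out, cap, next_reg);
        r = (*next_reg)++;
        snprintf(line, sizeof line, "r%d = %s r%d, r%d\n",
                 r, e->op == '+' ? "add" : "mul", a, b);
    }
    append(out, cap, line);
    return r;
}
```

For the expression a + b * c, emit_stack produces "push a, push b, push c, mul, add", while emit_reg produces five lines that each assign to a fresh register, r0 through r4. The sketch does not model the loss of type and class information described above; it only shows why an intermediate form built around named registers is an awkward starting point for stack code.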

6.3. But hang on a second! Gcj, the Java back-end for gcc, does stack machines! Why not do something like that?

Err ... no it doesn't. The Java bytecode stuff in gcj is not organised as an RTL back-end.

When gcj compiles Java, it performs parsing and semantic analysis in the front-end, like the other supported languages. Then the parse tree is sent in one of two different directions.

If gcj is compiling to native, the parse tree is handed to the RTL core of the compiler, and it takes over.

If gcj is compiling to bytecode, the parse tree is handed to a completely separate code generator that knows about Java bytecode.

Because gcj does NOT implement a bytecode RTL back-end for gcc, it cannot compile C, C++, etc down to bytecode. Java bytecode is a special case that only works for the Java front-end.

6.4. But what about egcs-jvm? Doesn't it compile C to Java bytecode?

It's a hack. The code that it generates is horrible, and does not conform to the usual conventions that the JVM requires. If one compiled Java code using this back-end, it wouldn't work with normal Java code due to the differences in calling conventions and what-not.

The biggest problem that the author of egcs-jvm had was trying to work around the register machine assumptions in the code. The result wasn't pretty. He has said that it would be easier to throw the JVM away and invent a register-based abstract machine than try to make gcc generate efficient stack machine code.

6.5. Isn't there a gcc port to the Transputer, which is stack-based?

Yes there is, for an older version of gcc (2.7.2). The source can be found here.

It appears to compile the code to a pseudo-register machine, and then fixes up the code to be stack based afterwards. It takes advantage of some register stack features in gcc that egcs-jvm didn't use.

The Transputer is still a traditional CPU despite being stack-based. The gcc port puts pointer values into integer pseudo-registers, which would violate the security requirements of IL.

The i386 gcc port uses a regular set of registers for integer/pointer values, and a register stack for floating point values. The Transputer port uses two register stacks: one for integer/pointer values, and the other for floating point values. It may be possible to use three register stacks for IL: one for integer values, another for pointer values, and a third for floating point values.

However, this still may not give a useful result. This fixes the security problems for the pseudo-registers, but it doesn't fix the security problems for memory. RTL assumes that main memory is a flat, untyped, address space, where any kind of value can be stored in any word. Partitioning main memory into separate types may not be possible without a rewrite of RTL.

6.6. OK, so do something similar to gcj for C#. Use two code generators. That would work right?

Yes it would, except for one small catch.

Because there are so many people who don't understand how gcc works, they will assume that they can compile C and C++ to IL bytecode after we release the C# patches.

Then they will discover that this isn't the case and will get extremely angry that we didn't build what they thought we were building. *sigh*

No matter how we attack the problem, we will end up having to write an IL bytecode backend for RTL, which is extremely difficult because of the various assumptions in the code.

Realistically, someone with a great deal of gcc knowledge needs to go into the gcc core, rip RTL completely out, throw it away, and replace it with something that knows about both register machines and stack machines.

Alternatively, someone could create an STL (Stack Transfer Language) that passes all languages through a separate code generator that knows about stack machines. Then we can write STL back-ends for IL and JVM bytecode. Both gcj and DotGNU would benefit from this.

6.7. We're not buying it. It's not as hard as you think.

Fine. Prove us wrong. Download the gcc sources and have at it. The Transputer port may be a good place to start to get ideas, or it may not.

7. Other .NET Efforts

7.1. Mono

The Mono project that is run by Ximian has many of the same goals as DotGNU Portable.NET. See their Web site for further details.

7.2. Why not co-operate with Mono?

We have suggested dividing up the work to prevent duplication, but Mono seems determined to do things their own way.

Therefore, we will treat Mono like any other GPL-using project: if they have something useful that we can use, we will use it. Otherwise we will continue developing DotGNU Portable.NET as we see fit.

We are interested in co-operating with anyone interested in co-operating with us.

7.3. What other free software and open source .NET efforts are there?

We are not aware of any other projects that are tackling the entire .NET platform at present, but there are some that are tackling tools such as decompilers, IDEs, etc. Mono's FAQ contains an up-to-date list.

8. How can I help?

The biggest area that needs to be tackled is the C# library, pnetlib. Pick a class, any class, implement it, and send us the changes.

We could also use some assistance with documentation of the APIs within the current code base. Mostly this involves converting the contents of the ".h" files in the "include" directory into Texinfo-compatible documentation and examples.

Other than that, if you find an interesting problem to work on in the DotGNU Portable.NET codebase, then work on it for a bit, and send us the patches. If it is the kind of code we're looking for, we'll discuss further collaboration.

9. CVS, versions, patches, etc

9.1. How do I access the source via CVS?

All of the DotGNU Portable.NET code is available via CVS from Savannah, http://savannah.gnu.org/. See that site for details on accessing the repository via anonymous CVS, or via the Web interface.

The repository name for DotGNU Portable.NET is "dotgnu-pnet", and it contains three modules: "pnet", "pnetlib", and "treecc".

9.2. What is with the version numbers?

Versions 0.1.2 and prior used a version numbering scheme that Rhys Weatherley concocted out of thin air. After the move to the Savannah CVS repository, the following conventions were adopted:

The working CVS version will always end in an odd number, and the released version will always end in an even number. For example, "0.1.3" is the working version that will lead up to the "0.1.4" release version.

When a major component is judged to be complete, the version numbers will advance at the next-higher level. For example, the numbers jumped from "0.0.6" to "0.1.0" when the disassembler was completed. The primary maintainer, Rhys Weatherley, decides when a major version jump is to occur.

This convention will be adopted across all DotGNU Portable.NET components: "pnet", "pnetlib", and "treecc".

Starting from version "0.1.4" of DotGNU Portable.NET, tags will be added to the CVS tree whenever a release version is cut. The tag for version "0.1.4" will be "r_0_1_4". Working versions will never have a tag.
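
The numbering and tagging rules above can be sketched as a few C helpers. These functions are hypothetical and not part of the DotGNU Portable.NET sources. They assume a three-part version number and do not model the older scheme used for versions 0.1.2 and prior, nor the fact that tagging only begins with 0.1.4.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A version is a release when its last component is even;
   working CVS versions end in an odd number. */
int is_release_version(int major, int minor, int patch)
{
    (void)major;
    (void)minor;
    return patch % 2 == 0;
}

/* Write the CVS tag for a release version, such as "r_0_1_4", into
   buf. Returns 1 on success, or 0 if the version is a working
   version (which is never tagged) or buf is too small. */
int release_tag(int major, int minor, int patch, char *buf, size_t cap)
{
    int n;
    assert(major >= 0 && minor >= 0 && patch >= 0);
    if (!is_release_version(major, minor, patch))
        return 0;
    n = snprintf(buf, cap, "r_%d_%d_%d", major, minor, patch);
    return n > 0 && (size_t)n < cap;
}

/* Check whether an existing tag names the given release version. */
int tag_matches_version(const char *tag, int major, int minor, int patch)
{
    char expected[64];
    if (!release_tag(major, minor, patch, expected, sizeof expected))
        return 0;
    return strcmp(tag, expected) == 0;
}
```

With these helpers, release_tag(0, 1, 4, ...) fills the buffer with "r_0_1_4", and release_tag(0, 1, 3, ...) returns 0 because "0.1.3" is a working version.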

9.3. I have a patch. What should I do now?

The best way to submit the patch is through the patch manager on Savannah. That will allow us to track it.

Please include comments with your patch that explain what it is for. Also include your full e-mail address, and any information related to Copyright assignment (see below).

The maintainers will decide on a case by case basis whether to accept a patch. Submitting it does not guarantee inclusion.

9.4. Who owns the Copyright on patches?

The DotGNU Project is working on guidelines for explicit Copyright assignment. When they have been finalised, they will replace the guidelines below.

For small patches, Copyright will automatically revert to the primary maintainer for the files or directories being patched. If you don't want this to happen, then don't submit the patch.

For larger patches, you should explicitly assign the Copyright to the primary maintainer, or to FreeDevelopers. We prefer that you assign the Copyright to the maintainer of the files you are patching, to prevent dilution of the Copyright on those files. Should problems arise in the future, it is easier to replace an entire file than edit the contents of a single file.

To assign the Copyright, include a notice in the patch comments as to how you want the Copyright assigned. If you don't include such a comment, we will need to contact you via e-mail to get your permission.

The GNU Project has strict guidelines about Copyright assignment. The goal is to have a predictable Copyright on each GNU package, should legal action ever need to be taken to defend the GPL. If you don't agree to these guidelines, then don't submit the patch.

9.5. Coding conventions

The DotGNU Portable.NET code currently uses the following coding conventions:

If you are submitting patches to an existing file, then use the same conventions as currently exist in that file. If you are writing a completely new source file, then you may use your own coding conventions, but we would prefer consistency with the above.

10. Why isn't the C# library LGPL?

The C# library, "pnetlib", is distributed under a modified GPL license:

The source code for the library is distributed under the terms of the GNU General Public License, with the following exception: if you link this library against your own program, then you do not need to release the source code for that program. However, any changes that you make to the library itself, or to any native methods upon which the library relies, must be re-distributed in accordance with the terms of the GPL.

We call this the "GPL plus linking exception", which is also used by the GNU Classpath project.

An obvious licensing question that many people have is why use the GPL instead of the LGPL for this library? We aren't trying to restrict its use by commercial entities.

However, there is a small catch with the LGPL and native methods. A commercial entity could produce their own proprietary runtime engine that has "enhanced" native method support of some kind. Under the terms of the LGPL, they would be obligated to release the declaration of the native method in the C# system library. For example:

extern int enhanced_method(string arg1, int arg2);

But would they be obligated to release the source code to the native method's implementation under the terms of the LGPL? Because it is in a separate program (their runtime engine), it isn't strictly part of the library. The result would be a C# library that is useless without their proprietary native method implementation. This state of affairs is undesirable.

Under the terms of the GPL, we can require that the source code to native methods must also be available, or the library modification is disallowed.

This is why we have decided to use the GPL with the linking exception described above.

[Aside: by "native method" we mean any method that is implemented in something other than IL bytecode. This includes PInvoke functions and "internalcall" methods, among others.]