Copyright © 2001 Southern Storm Software, Pty Ltd.
Permission to distribute unmodified copies of this work is hereby granted.
DotGNU Portable.NET is built in accordance with the requirements of the GNU Project and FreeDevelopers.
DotGNU Portable.NET is focused on compatibility with the ECMA specifications for CLI. There are other projects under the DotGNU meta-project to build other necessary pieces of infrastructure, and to explore non-CLI approaches to virtual machine implementation.
Note: It isn't possible to compile pnetlib with DotGNU Portable.NET's
C# compiler yet. The compiler can perform a syntax check, but not
a full compile. A pre-compiled version of the library is distributed
with "pnet".
Treecc performs most of the housekeeping within the core of the compiler,
allowing the programmer to concentrate on the specifics of language
implementation. A fuller account of how treecc works can be found
at its Web site, http://www.southern-storm.com.au/treecc.html.
Other tools, such as Antlr, do have similar functionality, but we
found it more convenient to write our own tool. We needed something
that worked with C and which could perform a large amount of error-checking
on the abstract syntax tree definitions. No other tool provided the right
combination.
The purpose of this tool is not to compare DotGNU Portable.NET with
other systems. Rather, it is intended to identify areas of DotGNU
Portable.NET that may need further attention.
You may be tempted to run PNetMark against the Microsoft CLR. If you
do, you cannot tell the author of the benchmark, or anyone else for
that matter, what the results are. The following is an excerpt from
Microsoft's End User License Agreement (EULA) for their .NET Framework SDK:
While it is possible to bootstrap off Microsoft's engine and
compiler, there is an open legal question in doing this. We want
to avoid any "booby traps" that may exist in Microsoft licenses
that prevent the free development of DotGNU Portable.NET. It is
safer to avoid dependence upon Microsoft tools.
Writing the C# compiler in C means we are bootstrapping from gcc,
and not Microsoft's compiler, which should avoid any legal problems.
The second reason for writing the compiler in C is security.
Independent third parties can inspect the C# compiler source
for security problems, and then compile the code with their
(hopefully) trusted version of gcc to get a trusted C# compiler.
Writing the compiler in C# would introduce a tough trust problem:
you must trust that the bootstrapped binary version of the compiler
does not have any back doors. Inspecting the source code is not
sufficient to perform a full security audit.
Should the Mono project succeed at this goal, then their components
should be directly reusable by anyone running DotGNU Portable.NET.
However, nothing in DotGNU Portable.NET's cscc compiler would prevent
its reuse in other compilers, as long as those compilers are themselves
written in C.
Cscc is architected so that new languages can be easily added as
plug-ins. The plug-in converts source code into IL assembly code,
which cscc processes to produce the final executable. Plug-ins
can either do this conversion their own way, or reuse the "codegen"
facilities to do most of the hard work for them.
However, it isn't quite as easy as it looks. The following script of
a hypothetical discussion provides a blow-by-blow account of why this
is so hard. This script is based in part on e-mails we have exchanged
with users in the past.
Why don't you add C# to the list of languages gcc supports?
Because it won't solve the problem that we need to solve.
Initially we need a C# compiler that can generate IL bytecode for
the .NET platform. Later, we may need a C# compiler that can
generate native code as well, but that is optional.
Putting a C# parser on the front of gcc would give us a native
compiler, but it won't give us an IL bytecode compiler.
So what? Add an IL bytecode backend to gcc, and you'll solve your
problem, and also be able to compile C, C++, Fortran, etc., to .NET.
This is not as easy as it looks. Gcc is divided into a number of
phases: parsing, semantic analysis, tree-to-RTL conversion, RTL
handling (including optimization), and final native code generation.
The hard part is RTL (Register Transfer Language). This part of
gcc is hard-wired to generate code for register-based CPUs such
as i386, PPC, Sparc, etc. RTL is not designed for generating code
for stack-based abstract machines such as IL.
Also, RTL loses a lot of the type and code structure information
that IL needs in the final output file. By the time RTL gets the
code, information about whether a value is an integer or an object
reference is mostly lost. Information about the class structure
of the code is lost. This information is critical for correct
compilation of C# to IL.
But hang on a second! Gcj, the Java back-end for gcc, does stack
machines! Why not do something like that?
Err... no, it doesn't. The Java bytecode stuff in gcj is not
organised as an RTL back-end.
When gcj compiles Java, it performs parsing and semantic analysis
in the front-end, like the other supported languages. Then the
parse tree is sent in one of two different directions.
If gcj is compiling to native, the parse tree is handed to the RTL
core of the compiler, and it takes over.
If gcj is compiling to bytecode, the parse tree is handed to a
completely separate code generator that knows about Java bytecode.
Because gcj does NOT implement a bytecode RTL back-end for gcc, it
cannot compile C, C++, etc. down to bytecode. Java bytecode is a
special case that only works for the Java front-end.
But what about egcs-jvm? Doesn't it compile C to
Java bytecode?
It's a hack. The code that it generates is horrible, and does not
conform to the usual conventions that the JVM requires. If one
compiled Java code using this back-end, it wouldn't work with
normal Java code due to the differences in calling conventions
and what-not.
The biggest problem that the author of egcs-jvm had was
trying to work around the register machine assumptions in the code.
The result wasn't pretty. He has said that it would be easier to
throw the JVM away and invent a register-based abstract machine
than try to make gcc generate efficient stack machine code.
Isn't there a gcc port to the Transputer, which is stack-based?
Yes there is, for an older version of gcc (2.7.2). The source can
be found here.
It appears to compile the code to a pseudo-register machine, and then
fixes up the code to be stack based afterwards. It takes advantage of
some register stack features in gcc that egcs-jvm didn't use.
The Transputer is still a traditional CPU despite being stack-based.
The gcc port puts pointer values into integer pseudo-registers, which would
violate the security requirements of IL.
The i386 gcc port uses a regular set of registers for integer/pointer
values, and a register stack for floating point values. The Transputer
port uses two register stacks: one for integer/pointer values, and
the other for floating point values. It may be possible to use three
register stacks for IL: one for integer values, another for pointer values,
and a third for floating point values.
However, this still may not give a useful result. This fixes the security
problems for the pseudo-registers, but it doesn't fix the security problems
for memory. RTL assumes that main memory is a flat, untyped, address space,
where any kind of value can be stored in any word. Partitioning main memory
into separate types may not be possible without a rewrite of RTL.
OK, so do something similar to gcj for C#. Use two code generators.
That would work, right?
Yes it would, except for one small catch.
Because there are so many people who don't understand how gcc works,
they will assume that they can compile C and C++ to IL bytecode after
we release the C# patches.
Then they will discover that this isn't the case and will get
extremely angry that we didn't build what they thought we were
building. *sigh*
No matter how we attack the problem, we will end up having to
write an IL bytecode backend for RTL, which is extremely difficult
because of the various assumptions in the code.
Realistically, someone with a great deal of gcc knowledge needs to
go into the gcc core, rip RTL completely out, throw it away, and
replace it with something that knows about both register machines
and stack machines.
Alternatively, someone could create an STL (Stack Transfer Language),
that passes all languages through a separate code generator that
knows about stack machines. Then we can write STL back-ends for
IL and JVM bytecode. Both gcj and DotGNU would benefit from this.
We're not buying it. It's not as hard as you think.
Fine. Prove us wrong. Download the gcc sources and have at it.
The Transputer port may be a good place to start to get ideas,
or it may not.
An obvious licensing question that many people have is why use the GPL
instead of the LGPL for this library? We aren't trying to restrict its
use by commercial entities.
However, there is a small catch with the LGPL and native methods.
A commercial entity could produce their own proprietary runtime engine
that has "enhanced" native method support of some kind. Under the
terms of the LGPL, they would be obligated to release the declaration
of the native method in the C# system library, that is, the "extern"
method declaration with no body.
Under the terms of the GPL, we can require that the source code to
native methods must also be available, or the library modification
is disallowed.
This is why we have decided to use the GPL with the linking exception
described above.
[Aside: by "native method" we mean any method that is implemented in
something other than IL bytecode. This includes PInvoke functions and
"internalcall" methods, among others.]
For small patches, Copyright will automatically revert to the primary
maintainer for the files or directories being patched. If you don't
want this to happen, then don't submit the patch.
For larger patches, you should explicitly assign the Copyright
to the primary maintainer, or to FreeDevelopers. We prefer that
you assign the Copyright to the maintainer of the files you are
patching, to prevent dilution of the Copyright on those files.
Should problems arise in the future, it is easier to replace
an entire file than edit the contents of a single file.
To assign the Copyright, include a notice in the patch comments as to
how you want the Copyright assigned. If you don't include such a
comment, we will need to contact you via e-mail to get your
permission.
The GNU Project has strict
guidelines about Copyright assignment. The goal is to have
a predictable Copyright on each GNU package, should legal action
ever need to be taken to defend the GPL. If you don't agree
to these guidelines, then don't submit the patch.
We could also use some assistance with documentation of the APIs
within the current code base. Mostly this involves converting
the contents of the ".h" files in the "include" directory into
Texinfo-compatible documentation and examples.
Other than that, if you find an interesting problem to work on
in the DotGNU Portable.NET codebase, then work on it for a bit, and
send us the patches. If it is the kind of code we're looking for,
we'll discuss further collaboration.
The repository name for DotGNU Portable.NET is "dotgnu-pnet".
The working CVS version will always end in an odd number, and the
released version will always end in an even number. For example,
"0.1.3" is the working version that will lead up to the "0.1.4"
release version.
When a major component is judged to be complete, the version numbers
will advance at the next-higher level. For example, the numbers
jumped from "0.0.6" to "0.1.0" when the disassembler was completed.
The primary maintainer, Rhys Weatherley, decides when a major
version jump is to occur.
This convention will be adopted across all DotGNU Portable.NET components:
"pnet", "pnetlib", and "treecc".
Starting from version "0.1.4" of DotGNU Portable.NET, tags will be added
to the CVS tree whenever a release version is cut. The tag for version
"0.1.4" will be "r_0_1_4". Working versions will never have a tag.
Please include comments with your patch that explain what it is
for. Also include your full e-mail address, and any information
related to Copyright assignment (see "Who owns the Copyright on patches?").
The maintainers will decide on a case by case basis whether to
accept a patch. Submitting it does not guarantee inclusion.
Microsoft's .NET Framework SDK contains a lot more classes in its
base class libraries. Because we wish to be (more or less) compatible
with Microsoft's .NET offerings, we have to implement more than ECMA
specifies.
We generally follow the ECMA specifications to the letter, and only
deviate from them where they are missing information, or the information
conflicts with Microsoft's actual implementation.
More recently, we have been discussing common standards and design
methodologies with Mono to ensure that our two systems successfully
interoperate. This is more at the design level than at the code
level, but it is still useful.
We may be able to use some of Mono's upper-level C# libraries,
and hence co-operate more on that basis. For various technical
reasons, Mono's lower-level C# library "corlib" will not work
with DotGNU Portable.NET's runtime engine, and so we still need
"pnetlib".
But generally, we will treat Mono like any other GPL-using project:
if they have something useful that we can use, we will use it.
Otherwise we will continue developing DotGNU Portable.NET
as we see fit.
1.2. What is pnet?
The bulk of DotGNU Portable.NET is made up of the runtime engine,
the C# compiler, and a host of useful development tools. This
package is generally referred to as "pnet".
1.2. What is pnetlib?
The C# system library was split off from the main source distribution
during the early phases of development. The main reason for this was
to enable other free software .NET efforts to reuse the code.
The pre-compiled library is distributed as "samples/mscorlib.dll".
If you wish to modify the library, you will need to use Microsoft's
C# compiler.
1.3. What is treecc?
Treecc is a tool that we wrote to assist in the development of cscc.
It complements flex and bison by providing support for abstract syntax
tree creation and manipulation.
1.4. What is PNetMark?
PNetMark is a benchmarking tool for Common Language Runtime (CLR)
environments. It is loosely based on the techniques used by the
CaffeineMark to benchmark Java.
The "README" file within the PNetMark distribution contains
additional information on running the benchmark. It also contains
a description as to why you should never believe what benchmarks
tell you, especially when comparing different systems.
6. Performance or Benchmark Testing. You may not disclose the
results of any benchmark test of either the Server Software or
Client Software to any third party without Microsoft's prior
written approval.
Thus, you can run the benchmark if you like, but you must keep the
results to yourself. If you don't like this, then you will have to
take it up with Microsoft's lawyers.
2. C# compiler questions
2.1. Why not write the compiler tools in C#?
The main reason is the "chicken and egg" problem. We wouldn't be
able to run the compiler until the runtime engine and the
full C# system library are written, and they are still a work
in progress.
2.2. If the compiler were written in C#, wouldn't reuse be easier?
Theoretically, yes. Reuse is the stated reason the
Mono project is writing all of
their tools in C#.
2.3. I've heard that you can compile C# to the JVM. Is that correct?
Yes. The cscc compiler is architected so that it can compile to either
IL or JVM bytecode. Adding other bytecode formats would be quite easy.
2.4. Why don't you use gcc as the basis for your C# compiler?
A common question that arises is why we aren't using gcc to compile
C# code to IL. Strategically, we would like to be able to reuse all
of the good work that has gone into gcc. The DotGNU Project currently
has an open request for someone to volunteer to modify gcc to generate
IL bytecode.
3. Copyright issues
3.1. Why isn't the C# library LGPL?
The C# library, "pnetlib", is distributed
under a modified GPL license:
The source code for the library is distributed under the terms of the
GNU General Public License, with the following exception: if you link
this library against your own program, then you do not need to release
the source code for that program. However, any changes that you make
to the library itself, or to any native methods upon which the library
relies, must be re-distributed in accordance with the terms of the GPL.
We call this the "GPL plus linking exception", which is also used by
the GNU Classpath project.
But would they be obligated to release the source code to the native
method's implementation under the terms of the LGPL? Because it is in
a separate program (their runtime engine), it isn't strictly part of
the library. The result would be a C# library that is useless without
their proprietary native method implementation. This state of affairs
is undesirable. An example of such a native method declaration:
    extern int enhanced_method(string arg1, int arg2);
3.2. Who owns the Copyright on patches?
The DotGNU Project is working
on guidelines for explicit Copyright assignment. When they have
been finalised, they will replace the guidelines below.
4. How can I help?
The biggest area that needs to be tackled is the C# library,
pnetlib.
Pick a class, any class, implement it, and send us the changes.
See the question on "Standards" for information on obtaining
the ECMA class library documentation.
5. CVS, versions, patches, etc
5.1. How do I access the source via CVS?
All of the DotGNU Portable.NET code is available via CVS from
Savannah, http://savannah.gnu.org/.
The main project Web page is at
http://savannah.gnu.org/projects/dotgnu-pnet/, and the CVS instructions are at
http://savannah.gnu.org/cvs/?group_id=353.
The "dotgnu-pnet" repository contains three modules: "pnet",
"pnetlib", and "treecc".
5.2. What is with the version numbers?
Versions 0.1.2 and prior used a version numbering scheme that Rhys
Weatherley concocted out of thin air. After the move to the Savannah
CVS repository, the following conventions were adopted for "pnet",
"pnetlib", and "treecc":
5.3. I have a patch. What should I do now?
The best way to submit the patch is through the patch manager on
Savannah. That will allow us to track it.
5.4. Coding conventions
The DotGNU Portable.NET code currently uses the following
coding conventions:
if(condition)
{
    ...
}
If you are submitting patches to an existing file, then use the
same conventions as currently exist in that file. If you are
writing a completely new source file, then you may use your own
coding conventions, but we would prefer consistency with the above.
6. Standards
6.1. Where are the ECMA standards?
The latest versions of the ECMA standards for the Common Language
Infrastructure (CLI) and the C# language can be found at Microsoft's
MSDN Web site:
http://msdn.microsoft.com/net/ecma/
If you wish to contribute to the C# library, you will need the
following file:
http://msdn.microsoft.com/net/ecma/All.xml
Use the "csdoc2html" program to convert this XML file into HTML,
so that you can view its contents more easily.
6.2. Why do you have more classes than ECMA specifies?
ECMA specifies the bare minimum necessary to get a Common Language
Runtime (CLR) to work. However, this bare minimum is not very useful
for realistic C# applications.
7. Other .NET Efforts
7.1. Mono
The Mono project that is
run by Ximian has many of
the same goals as DotGNU Portable.NET. See their Web site
for further details.
7.2. Why not co-operate with Mono?
We have suggested dividing up the work to prevent duplication,
but Mono seems determined to do things their own way.
7.3. OCL
Intel have written a C# class library, which they call the Open CLI
Library (OCL). A unique feature of this library is that its interfaces
have been automatically generated from the ECMA specifications, whereas
DotGNU Portable.NET and Mono have copied the interfaces by hand.
The library can be obtained at the following site:
http://ocl.sourceforge.net/
7.4. What other free software and open source .NET efforts are there?
We are not aware of any other projects that are tackling the
entire .NET platform at present, but there are some that are tackling
tools such as decompilers, IDEs, etc. Mono's FAQ contains an
up-to-date list.