wraptypes is a general utility for creating ctypes wrappers from C header files. The front-end is tools/wraptypes/wrap.py; for usage, run:
python tools/wraptypes/wrap.py -h
There are three components to wraptypes:
The front-end wrap.py provides a simple subclass of CtypesParser, CtypesWrapper, which writes the ctypes declarations found to a file in a format that can be imported as a module.
The parsers are built upon a modified version of PLY, a Python implementation of lex and yacc. The modified source is included in the wraptypes directory. The modifications are:
The first time the parsers are run (or after they are modified), PLY creates pptab.py and parsetab.py in the current directory. These are the generated state machines, which can take a few seconds to generate. The file parser.out is created if debugging is enabled, and contains the parser description (of the last parser that was generated), which is essential for debugging.
The grammar and parser are defined in preprocessor.py.
There is only one lexer state. Each token has a type, which is a string (e.g. 'CHARACTER_CONSTANT'), and a value. Token values read directly from the source file are only ever strings. Tokens written to the output list sometimes have tuple values (for example, a PP_DEFINE token on output).
Two lexer classes are defined: PreprocessorLexer, which reads a stack of files (actually strings) as input, and TokenListLexer, which reads from a list of already-parsed tokens (used for parsing expressions).
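As a rough illustration of the second lexer, the sketch below replays a list of existing tokens through a token() method that returns None when the list is exhausted, which is the contract a PLY parser expects from its lexer. ListLexer and the use of plain (type, value) tuples as tokens are assumptions made for this example; they are not the actual TokenListLexer implementation.

```python
class ListLexer:
    """Hypothetical stand-in for a lexer that replays existing tokens."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def token(self):
        # PLY-style contract: return the next token, or None when exhausted.
        if self.pos >= len(self.tokens):
            return None
        token = self.tokens[self.pos]
        self.pos += 1
        return token


# Tokens are modelled as (type, value) tuples purely for illustration.
lexer = ListLexer([('IDENTIFIER', 'FOO'), ('CHARACTER_CONSTANT', "'a'")])
replayed = []
while True:
    tok = lexer.token()
    if tok is None:
        break
    replayed.append(tok)
```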
The preprocessing entry-point is the PreprocessorParser class. This creates a PreprocessorLexer and its grammar during construction. The system include path includes the GCC search path by default but can be modified by altering the include_path and framework_path lists. The system_headers dict lets you supply header files that do not exist on the search path. For example, by setting:
system_headers['stdlib.h'] = '''#ifndef STDLIB_H
#define STDLIB_H
/* ... */
#endif
'''
you can insert your own custom header in place of the one on the filesystem. This is useful when parsing headers from network locations.
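One way to picture this override is a lookup that consults the in-memory dict before touching the filesystem. The sketch below assumes a hypothetical resolve_header helper; it is not part of wraptypes and only illustrates the order of precedence described above.

```python
import os


def resolve_header(name, system_headers, include_path):
    """Return (origin, text) for a header, preferring in-memory overrides.

    Hypothetical helper: the real lookup lives inside the preprocessor.
    """
    if name in system_headers:
        return ('<system_headers>', system_headers[name])
    for directory in include_path:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            with open(path) as f:
                return (path, f.read())
    return (None, None)


system_headers = {
    'stdlib.h': '#ifndef STDLIB_H\n#define STDLIB_H\n/* ... */\n#endif\n',
}
# The override wins even if /usr/include/stdlib.h exists on this machine.
origin, text = resolve_header('stdlib.h', system_headers, ['/usr/include'])
```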
Parsing begins when parse is called. Specify one or both of a filename and a string of data. If the debug kwarg is True, syntax errors dump the parser state instead of just the line number where they occurred.
The production rules specify the actions; these are implemented in PreprocessorGrammar. The actions call methods on PreprocessorParser, such as:
The parser has a stack of ExecutionState objects, which specify whether the tokens currently being parsed are ignored (tokens are ignored in an #if that evaluates to 0). This is a little more complicated than a boolean flag: the parser must also ignore #elif conditions that can have no effect. The enable_declaratives and enable_elif_conditionals methods return True if the top-most ExecutionState allows declaratives and #elif conditionals to be parsed, respectively. The execution state stack is modified with the condition_* methods.
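The following sketch shows why a plain boolean is not enough: each state also has to remember whether the enclosing block is active and whether an earlier branch of the same #if chain was already taken. The class layout, the attribute names, and the specific condition_if, condition_elif, condition_else and condition_endif method names are assumptions for illustration; only enable_declaratives and enable_elif_conditionals are named in the text above.

```python
class ExecutionState:
    def __init__(self, parent_enabled, enabled):
        self.parent_enabled = parent_enabled        # is the enclosing block active?
        self.enabled = parent_enabled and enabled   # are tokens in this branch kept?
        self.taken = self.enabled                   # has a branch of this chain run?


class ConditionStack:
    """Hypothetical stand-in for the parser's execution state stack."""

    def __init__(self):
        self.stack = [ExecutionState(True, True)]

    def enable_declaratives(self):
        return self.stack[-1].enabled

    def enable_elif_conditionals(self):
        # An #elif only matters if the parent is active and no branch ran yet.
        top = self.stack[-1]
        return top.parent_enabled and not top.taken

    def condition_if(self, result):
        self.stack.append(ExecutionState(self.stack[-1].enabled, bool(result)))

    def condition_elif(self, result):
        top = self.stack[-1]
        top.enabled = top.parent_enabled and not top.taken and bool(result)
        top.taken = top.taken or top.enabled

    def condition_else(self):
        top = self.stack[-1]
        top.enabled = top.parent_enabled and not top.taken
        top.taken = True

    def condition_endif(self):
        self.stack.pop()


state = ConditionStack()
trace = []
state.condition_if(0)
trace.append((state.enable_declaratives(), state.enable_elif_conditionals()))
state.condition_elif(1)
trace.append((state.enable_declaratives(), state.enable_elif_conditionals()))
state.condition_else()
trace.append((state.enable_declaratives(), state.enable_elif_conditionals()))
state.condition_endif()
trace.append((state.enable_declaratives(), state.enable_elif_conditionals()))
```

After the #elif branch is taken, later #elif conditions in the same chain no longer need to be evaluated, which is the "can have no effect" case mentioned above.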
PreprocessorParser has a PreprocessorNamespace which keeps track of the currently defined macros. You can create and specify your own namespace, or use one that is created by default. The default namespace includes GCC platform macros needed for parsing system headers, and some of the STDC macros.
Macros are expanded when tokens are written to the output list, and when conditional expressions are parsed. PreprocessorNamespace.apply_macros(tokens) takes care of this, replacing function parameters, variable arguments, macro objects and (mostly) avoiding infinite recursion. It does not yet handle the # and ## operators, which are needed to parse the Windows system headers.
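A minimal sketch of the recursion guard, restricted to object-like macros and using plain strings as tokens. The expand_object_macros function is hypothetical and much simpler than PreprocessorNamespace.apply_macros, which also handles function parameters and variable arguments.

```python
def expand_object_macros(tokens, objects, _expanding=frozenset()):
    """Expand object-like macros in a list of string tokens.

    `objects` maps a macro name to its replacement token list.
    """
    output = []
    for token in tokens:
        if token in objects and token not in _expanding:
            # Rescan the replacement, but never re-expand a macro that is
            # already being expanded; this is what stops infinite recursion.
            output.extend(
                expand_object_macros(objects[token], objects,
                                     _expanding | {token}))
        else:
            output.append(token)
    return output


# Two mutually recursive macros: expansion must still terminate.
objects = {'A': ['B', '+', '1'], 'B': ['A', '*', '2']}
expanded = expand_object_macros(['A'], objects)
```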
The process for evaluating a conditional (#if or #elif) is:
Because pyglet requires special knowledge of the preprocessor declaratives that were encountered in the source, these are encoded as pseudo-tokens within the output token list. For example, after a #ifndef is evaluated, it is written to the token list as a PP_IFNDEF token.
#define is handled specially. After applying it to the namespace, it is parsed as an expression immediately. This is allowed (and often expected) to fail. If it does not fail, a PP_DEFINE_CONSTANT token is created, and the value is the result of evaluating the expression. Otherwise, a PP_DEFINE token is created, and the value is the string concatenation of the tokens defined. Special handling of parseable expressions makes it simple to later parse constants defined as, for example:
#define RED_SHIFT 8
#define RED_MASK (0x0f << RED_SHIFT)
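The try-then-fall-back behaviour can be sketched as follows. This example borrows Python's own ast module as a stand-in for the C expression parser, so it only accepts expressions whose syntax Python and C share; classify_define, the constants dict, and the (name, value) tuple shape are assumptions for illustration, not the preprocessor's actual code.

```python
import ast
import operator

_BINARY = {
    ast.LShift: operator.lshift, ast.RShift: operator.rshift,
    ast.BitOr: operator.or_, ast.BitAnd: operator.and_,
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
}


def _evaluate(node, constants):
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, constants)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name) and node.id in constants:
        return constants[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left, constants),
                                      _evaluate(node.right, constants))
    raise ValueError('not a constant expression')


def classify_define(name, body, constants):
    """Return a (token_type, value) pair for one #define (hypothetical)."""
    try:
        value = _evaluate(ast.parse(body.strip(), mode='eval'), constants)
    except (SyntaxError, ValueError):
        # Failure is expected for many macros: keep the raw text instead.
        return ('PP_DEFINE', (name, body))
    constants[name] = value
    return ('PP_DEFINE_CONSTANT', (name, value))


constants = {}
shift = classify_define('RED_SHIFT', '8', constants)
mask = classify_define('RED_MASK', '(0x0f << RED_SHIFT)', constants)
other = classify_define('API', '__declspec(dllimport)', constants)
```

Here the second definition evaluates because the first one was already recorded as a constant, while the third falls back to a plain PP_DEFINE.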
The preprocessor can be tested/debugged by running preprocessor.py stand-alone with a header file as the sole argument. The resulting token list will be written to stdout.
The lexer for CParser, CLexer, takes as input a list of tokens output from the preprocessor. The special preprocessor tokens such as PP_DEFINE are intercepted here and handled immediately; hence they can appear anywhere in the source header file without causing problems with the parser. At this point IDENTIFIER tokens which are found to be the name of a defined type (the set of defined types is updated continuously during parsing) are converted to TYPE_NAME tokens.
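The retagging step can be pictured as a lazy filter over the token stream that consults a set of type names which the parser keeps growing. In this sketch, c_tokens and the (type, value) tuples are hypothetical; the point is only that the set is checked at the moment each token is read, so a typedef seen earlier affects identifiers that follow it.

```python
def c_tokens(tokens, type_names):
    """Yield (type, value) tokens, retagging identifiers that name a type.

    `type_names` is consulted lazily, so additions made while the stream
    is being consumed take effect for all later tokens.
    """
    for kind, value in tokens:
        if kind == 'IDENTIFIER' and value in type_names:
            kind = 'TYPE_NAME'
        yield (kind, value)


type_names = set()
stream = c_tokens([('IDENTIFIER', 'GLuint'), ('IDENTIFIER', 'GLuint')],
                  type_names)
first = next(stream)        # GLuint is not yet known as a type
type_names.add('GLuint')    # e.g. the parser has just seen a typedef
second = next(stream)       # the same spelling is now a type name
```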
The entry-point to parsing C source is the CParser class. This creates a preprocessor in its constructor, and defines some default types such as wchar_t and __int64_t. These can be disabled with kwargs.
Preprocessing can be quite time-consuming, especially on OS X where thousands of #include declaratives are processed when Carbon is parsed. To minimise the time required to parse similar (or the same, while debugging) header files, the token list from preprocessing is cached and reused where possible.
This is handled by CPreprocessorParser, which overrides push_file to check with CParser if the desired file is cached. The cache is checked against the file’s modification timestamp as well as a “memento” that describes the currently defined tokens. This is intended to avoid using a cached file that would otherwise be parsed differently due to the defined macros. It is by no means perfect; for example, it won’t pick up on a macro that has been defined differently. It seems to work well enough for the header files pyglet requires.
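A simplified sketch of such a cache check is shown below. It assumes the memento is just the set of currently defined macro names, which matches the stated limitation that a macro defined differently is not detected; the HeaderCache class and its method names are hypothetical and do not reflect the real cache format.

```python
class HeaderCache:
    """Hypothetical cache keyed by filename, validated by mtime and memento."""

    def __init__(self):
        self.entries = {}

    def store(self, filename, mtime, defined_names, tokens):
        memento = frozenset(defined_names)   # names only, not definitions
        self.entries[filename] = (mtime, memento, list(tokens))

    def lookup(self, filename, mtime, defined_names):
        entry = self.entries.get(filename)
        if entry is None:
            return None
        cached_mtime, memento, tokens = entry
        if cached_mtime != mtime or memento != frozenset(defined_names):
            return None                      # stale: file or macro set changed
        return tokens


cache = HeaderCache()
cache.store('gl.h', 100, {'__APPLE__'}, ['tok1', 'tok2'])
hit = cache.lookup('gl.h', 100, {'__APPLE__'})
stale_file = cache.lookup('gl.h', 101, {'__APPLE__'})
stale_macros = cache.lookup('gl.h', 100, {'__APPLE__', 'GL_EXT'})
```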
The header cache is saved and loaded automatically in the working directory as .header.cache. The cache should be deleted if you make changes to the preprocessor, or are experiencing cache errors (these are usually accompanied by a “what-the?” exclamation from the user).
The actions in the grammar construct parts of a “C object model” and call methods on CParser. The C object model is not at all complete, containing only what pyglet (and any other ctypes-wrapping application) requires. The classes in the object model are:
The ExpressionNode object hierarchy is similar to that used in the preprocessor, but more fully-featured, and using a different EvaluationContext which can evaluate identifiers and the sizeof operator (currently it actually just returns 0 for both).
Methods are called on CParser as declarations and preprocessor declaratives are parsed. They are mostly self-explanatory. For example:
These methods should be overridden by a subclass to provide functionality. The DebugCParser does this and prints out the arguments to each handle_ method.
The CParser can be tested in isolation by running it stand-alone with the filename of a header as the sole argument. A DebugCParser will be constructed and used to parse the header.
CtypesParser is implemented in ctypesparser.py. It is a subclass of CParser and implements the handle_ methods to provide a more ctypes-friendly interpretation of the declarations.
To use, subclass and override the methods:
Types are represented by instances of CtypesType. This is more easily manipulated than a “real” ctypes type. There are subclasses for CtypesPointer, CtypesArray, CtypesFunction, and so on; see the module for details.
Each CtypesType class implements the visit method, which can be used, Visitor pattern style, to traverse the type hierarchy. Call the visit method of any type with an implementation of CtypesTypeVisitor: all pointers, array bases, function parameters and return types are traversed automatically (struct members are not, however).
This is useful when writing the contents of a struct or enum. Before writing a type declaration for a struct type (which would consist only of the struct’s tag), visit the type and handle the visit_struct method on the visitor to print out the struct’s members first. Similarly for enums.
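The idea can be sketched with simplified stand-in classes. CtypesStruct, CtypesSimple, StructCollector and all attribute names below are assumptions for this example; only CtypesType, CtypesPointer, CtypesTypeVisitor, visit and visit_struct are named in the text, and the real classes in ctypesparser.py will differ in detail.

```python
class CtypesTypeVisitor:
    def visit_struct(self, struct):
        pass


class CtypesType:
    def visit(self, visitor):
        pass


class CtypesSimple(CtypesType):
    def __init__(self, name):
        self.name = name


class CtypesPointer(CtypesType):
    def __init__(self, destination):
        self.destination = destination

    def visit(self, visitor):
        # Pointers traverse into the type they point at.
        self.destination.visit(visitor)


class CtypesStruct(CtypesType):
    def __init__(self, tag, members):
        self.tag = tag
        self.members = members   # list of (name, CtypesType) pairs

    def visit(self, visitor):
        # Members are not traversed automatically; the visitor decides.
        visitor.visit_struct(self)


class StructCollector(CtypesTypeVisitor):
    """Record each struct's members once, before the type is written."""

    def __init__(self):
        self.lines = []
        self.seen = set()

    def visit_struct(self, struct):
        if struct.tag in self.seen:
            return
        self.seen.add(struct.tag)
        fields = ', '.join(name for name, _ in struct.members)
        self.lines.append('struct_%s members: %s' % (struct.tag, fields))


point = CtypesStruct('point', [('x', CtypesSimple('c_int')),
                               ('y', CtypesSimple('c_int'))])
collector = StructCollector()
CtypesPointer(point).visit(collector)   # reaches the struct via the pointer
CtypesPointer(point).visit(collector)   # second visit adds nothing
```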
ctypesparser.py cannot be run stand-alone. wrap.py provides a straightforward implementation that writes a module of ctypes wrappers. It can filter the output based on the originating filename. See the module docstring for usage and extension details.