Small, Fast S-Expression Library
sexp.h File Reference

API for a small, fast and portable s-expression parser library.

#include <stddef.h>
#include <stdio.h>
#include "faststack.h"
#include "cstring.h"
#include "sexp_memory.h"
#include "sexp_errors.h"
#include "sexp_ops.h"


Data Structures

struct  elt
 
struct  parser_event_handlers
 
struct  pcont
 
struct  sexp_iowrap
 

Typedefs

typedef struct elt sexp_t
 
typedef struct parser_event_handlers parser_event_handlers_t
 
typedef struct pcont pcont_t
 
typedef struct sexp_iowrap sexp_iowrap_t
 

Enumerations

enum  elt_t { SEXP_VALUE , SEXP_LIST }
 
enum  atom_t { SEXP_BASIC , SEXP_SQUOTE , SEXP_DQUOTE , SEXP_BINARY }
 
enum  parsermode_t { PARSER_NORMAL , PARSER_INLINE_BINARY , PARSER_EVENTS_ONLY }
 

Functions

sexp_errcode_t set_parser_buffer_params (size_t ss, size_t gs)

sexp_t * sexp_t_allocate (void)

void sexp_t_deallocate (sexp_t *s)

void sexp_cleanup (void)

int print_sexp (char *loc, size_t size, const sexp_t *e)

int print_sexp_cstr (CSTRING **s, const sexp_t *e, size_t ss)

sexp_t * new_sexp_list (sexp_t *l)

sexp_t * new_sexp_binary_atom (char *data, size_t binlength)

sexp_t * new_sexp_atom (const char *buf, size_t bs, atom_t aty)

pcont_t * init_continuation (char *str)

void destroy_continuation (pcont_t *pc)

sexp_iowrap_t * init_iowrap (int fd)

void destroy_iowrap (sexp_iowrap_t *iow)

sexp_t * read_one_sexp (sexp_iowrap_t *iow)

sexp_t * parse_sexp (char *s, size_t len)

sexp_t * iparse_sexp (char *s, size_t len, pcont_t *cc)

pcont_t * cparse_sexp (char *s, size_t len, pcont_t *pc)

void destroy_sexp (sexp_t *s)

void reset_sexp_errno ()

void print_pcont (pcont_t *pc, char *buf, size_t buflen)


Variables

sexp_errcode_t sexp_errno
 

Detailed Description

API for a small, fast and portable s-expression parser library.

Typedef Documentation

◆ parser_event_handlers_t

typedef struct parser_event_handlers parser_event_handlers_t

Instead of parsing a full string and walking a potentially huge sexp_t structure, some users prefer an XML SAX-style parser in which events are triggered as parts of the s-expression are encountered. This structure contains a set of function pointers that the parser calls when it reaches the start or end of an expression and when it finishes reading an atom or a block of binary data. NOTE: The parser_event_handlers struct that is a field in the continuation data structure is NOT freed by destroy_continuation, since callback structs are ALWAYS malloc'd by the user, not the library.
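To make the event-driven idea concrete, here is a minimal sketch in C of a SAX-style walker that fires callbacks instead of building a tree. It does not use the library: toy_handlers, toy_walk and the handler functions are hypothetical names, the callback set only loosely mirrors the start/end/atom events described above, and the void *ctx argument is an assumption of this sketch. The toy tokenizer understands only parentheses and unquoted atoms.

```c
#include <stddef.h>

/* Hypothetical callback table; loosely modelled on the events described for
 * parser_event_handlers_t, not its actual field list. */
typedef struct {
    void (*start_sexpr)(void *ctx);                               /* saw '(' */
    void (*end_sexpr)(void *ctx);                                 /* saw ')' */
    void (*characters)(void *ctx, const char *data, size_t len);  /* atom */
} toy_handlers;

static int toy_is_space(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Walk s[0..len) and fire events; no tree is built. Only parentheses and
 * unquoted, whitespace-separated atoms are recognised. */
static void toy_walk(const char *s, size_t len, const toy_handlers *h, void *ctx)
{
    size_t i = 0;
    while (i < len) {
        char c = s[i];
        if (c == '(') {
            if (h->start_sexpr) h->start_sexpr(ctx);
            i++;
        } else if (c == ')') {
            if (h->end_sexpr) h->end_sexpr(ctx);
            i++;
        } else if (toy_is_space(c)) {
            i++;
        } else {
            size_t start = i;   /* scan one atom */
            while (i < len && s[i] != '(' && s[i] != ')' && !toy_is_space(s[i]))
                i++;
            if (h->characters) h->characters(ctx, s + start, i - start);
        }
    }
}

/* Example handlers: count atoms and track nesting depth. */
typedef struct { int depth, max_depth, atoms; } toy_stats;

static void on_start(void *ctx)
{
    toy_stats *st = ctx;
    if (++st->depth > st->max_depth) st->max_depth = st->depth;
}

static void on_end(void *ctx) { ((toy_stats *)ctx)->depth--; }

static void on_chars(void *ctx, const char *data, size_t len)
{
    (void)data; (void)len;   /* a real handler would copy or inspect the atom */
    ((toy_stats *)ctx)->atoms++;
}
```

Calling toy_walk on "(a (b c) d)" with these handlers counts four atoms and a maximum nesting depth of two, without allocating any element structures.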

◆ pcont_t

typedef struct pcont pcont_t

A continuation is used by the parser to save and restore state between invocations, which supports partial parsing of strings. For example, if we pass the string "(foo bar)(goo car)" to the parser, we want to retrieve each s-expression one at a time; returning all of them at once would be difficult without knowing in advance how many there are (and would require more memory management than we want). With a continuation-based parser, we can call it with this string and have it return a continuation once it has parsed the first s-expression. After processing that s-expression (accessible through the last_sexpr field of the continuation), we call the parser again with the same string and continuation, and it picks up where it left off.

We use continuations instead of a stateful parser so that multiple strings can be parsed concurrently simply by maintaining a set of continuations. Calling the continuation-based parser (cparse_sexp) directly requires manipulating continuations by hand. This is not recommended unless you are willing to deal with potential errors and to learn exactly how the continuation relates to the internals of the parser. A simpler approach is to use either parse_sexp, which simply returns an s-expression without exposing the continuation, or iparse_sexp, which iteratively pops one s-expression at a time from a string containing one or more s-expressions. Refer to the documentation for each parsing function for further details on behavior and usage.
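The save-and-resume idea can be illustrated with a deliberately simplified sketch, assuming the continuation only needs to remember an offset into the string. toy_cont, toy_cont_init and toy_next are hypothetical names; the real pcont_t keeps much more parser state (so that it can resume in the middle of an expression), which this toy does not attempt.

```c
#include <stddef.h>

/* Hypothetical continuation: only remembers where the previous call stopped. */
typedef struct {
    const char *str;  /* string being parsed */
    size_t len;       /* its length */
    size_t pos;       /* offset at which the next call resumes */
} toy_cont;

static void toy_cont_init(toy_cont *cc, const char *str, size_t len)
{
    cc->str = str;
    cc->len = len;
    cc->pos = 0;
}

/* Return the next top-level "( ... )" expression as [*start, *start + *n).
 * Returns 1 if one was found, 0 at end of input or if only an incomplete
 * expression remains. The saved offset advances past complete expressions only. */
static int toy_next(toy_cont *cc, const char **start, size_t *n)
{
    size_t i = cc->pos;
    while (i < cc->len && cc->str[i] != '(') i++;   /* skip to the next '(' */
    if (i == cc->len) return 0;

    size_t begin = i;
    int depth = 0;
    for (; i < cc->len; i++) {
        if (cc->str[i] == '(') {
            depth++;
        } else if (cc->str[i] == ')' && --depth == 0) {
            *start = cc->str + begin;
            *n = i - begin + 1;
            cc->pos = i + 1;   /* save state for the next call */
            return 1;
        }
    }
    return 0;   /* unbalanced tail: leave cc->pos unchanged */
}
```

Repeated calls on "(foo bar)(goo car)" yield "(foo bar)", then "(goo car)", then 0. Because each call only reads and updates its own toy_cont, several strings can be walked at once by keeping one continuation per string.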

◆ sexp_t

typedef struct elt sexp_t

An s-expression is represented as a linked structure of elements, where each element is either an atom or list. An atom corresponds to a string, while a list corresponds to an s-expression. The following grammar represents our definition of an s-expression:

sexpr  ::= ( sx )
sx     ::= atom sxtail | sexpr sxtail | 'sexpr sxtail | 'atom sxtail | NULL
sxtail ::= sx | NULL
atom   ::= quoted | value
quoted ::= "ws_string"
value  ::= nws_string


An atom is either a quoted string, i.e. a string (possibly containing whitespace) surrounded by double quotes, or a string without whitespace that does not require surrounding quotes. An element representing an atom has type SEXP_VALUE and stores its data in the val field. An element of type SEXP_LIST represents an s-expression corresponding to sexpr in the grammar, and holds a pointer to the head of that s-expression. Details regarding these fields and their values are given with the fields themselves. Note that a single quote can appear directly before an s-expression or atom, similar to its use in LISP.
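As a rough illustration of the linked representation described above, the sketch below counts the atoms in a hand-built structure. toy_elt and its fields are hypothetical stand-ins, not the actual layout of struct elt: a value element carries text, a list element points to the head of its nested list, and every element links to its next sibling.

```c
#include <stddef.h>

/* Hypothetical element kinds, mirroring the value/list split of elt_t. */
typedef enum { TOY_VALUE, TOY_LIST } toy_elt_type;

/* Hypothetical element; not the real field layout of struct elt. */
typedef struct toy_elt {
    toy_elt_type ty;
    const char *val;        /* atom text when ty == TOY_VALUE */
    struct toy_elt *list;   /* head of the nested list when ty == TOY_LIST */
    struct toy_elt *next;   /* next sibling in the enclosing list */
} toy_elt;

/* Count atoms reachable from e, following list heads and sibling links. */
static size_t toy_count_atoms(const toy_elt *e)
{
    size_t n = 0;
    for (; e != NULL; e = e->next)
        n += (e->ty == TOY_VALUE) ? 1 : toy_count_atoms(e->list);
    return n;
}
```

For (a (b c)), the top-level list element points to a, whose next link is a list element pointing to b, which links to c; toy_count_atoms on the top element returns 3.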

Enumeration Type Documentation

◆ atom_t

enum atom_t

For an element that represents a value, the value can be interpreted as a more specific type. A basic value is a simple string with no whitespace (and therefore no quotes required). A double-quoted value, or dquote, contains characters (such as whitespace) that require quotation marks around the string. A single-quoted value, or squote, represents an element that is prefaced with a single tick mark. This can be either an atom or an s-expression; the parser does not attempt to parse the element following the tick mark, but simply stores it as text. This is similar to the meaning of a tick mark in the Scheme and LISP family of programming languages. Finally, binary allows raw binary data to be stored within an atom. If the binary type is used, the data is stored in bindata with its length in binlength. Otherwise, the data is stored in the val field, with val_used and val_allocated tracking the size of the value string and the total memory allocated for it.

Enumerator
SEXP_BASIC 

Basic, unquoted value.

SEXP_SQUOTE 

Single quote (tick-mark) value - contains a string representing a non-parsed portion of the s-expression.

SEXP_DQUOTE 

Double-quoted string. Similar to a basic value, but potentially containing white-space.

SEXP_BINARY 

Binary data. This is used when the inline binary parser mode (PARSER_INLINE_BINARY) is active, which supports inlining blobs of binary data inside an expression.

◆ elt_t

enum elt_t

An element in an s-expression can be one of two types: a value represents an atom with an associated text value, while a list represents an s-expression, and the element contains a pointer to the head element of the associated s-expression.

Enumerator
SEXP_VALUE 

An atom of some type. See atom type (aty) field of element structure for details as to which atom type this is.

SEXP_LIST 

A list. This means the element points to an element representing the head of a list.

◆ parsermode_t

Parser mode flag used by the continuation to toggle special parser behaviour.

Enumerator
PARSER_NORMAL 

Normal (LISP-style) s-expression parser behaviour.

PARSER_INLINE_BINARY 

Treat atoms beginning with #b# as inlined binary data. Everything else is treated the same as in PARSER_NORMAL mode.

PARSER_EVENTS_ONLY 

If the event_handlers field in the continuation is non-null, the handlers specified in the parser_event_handlers_t struct are called as appropriate, but the parser does not allocate a structure composed of sexp_t structs. If event_handlers is null and this mode is selected, the parser walks the string without doing anything productive, so the user is better off not calling the parser at all.

Function Documentation

◆ destroy_continuation()

void destroy_continuation ( pcont_t *  pc)

Destroy a continuation. This cleans up what the continuation contains, and then the continuation itself.

◆ destroy_sexp()

void destroy_sexp ( sexp_t *  s)

Given a sexp_t structure, free the memory it uses (and recursively free the memory used by all sexp_t structures that it references). Note that this calls the deallocation routine for sexp_t elements, so the memory is not actually freed but stored in a cache of pre-allocated elements. This optimization speeds up the parser by eliminating wasteful free and re-malloc calls. Note: In inlined binary mode, this also frees the data pointed to by the bindata field. If you need that data after the lifetime of the s-expression, make a copy before cleaning up the s-expression.

◆ init_continuation()

pcont_t* init_continuation ( char *  str)

Create an initial continuation for parsing the given string.

◆ new_sexp_atom()

sexp_t* new_sexp_atom ( const char *  buf,
size_t  bs,
atom_t  aty 
)

Allocate a new sexp_t element representing a value. The caller must specify the precise type of the atom. This used to default to SEXP_BASIC, but that default could lead to errors for callers who did not expect it. Because the atom type is passed explicitly, the caller is responsible for ensuring that the data in the buffer is valid for the requested atom type. For performance reasons, such checks are left to the caller when desired, rather than being performed by the library when they are not wanted.
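Since new_sexp_atom leaves validation to the caller, a caller might choose the atom type with a check like the one below. This is a minimal sketch under an assumed rule (an empty buffer, or any whitespace, parenthesis or double quote, calls for the double-quoted form); toy_classify and toy_atom_kind are hypothetical and not part of the library.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical text kinds, mirroring SEXP_BASIC and SEXP_DQUOTE. */
typedef enum { TOY_BASIC, TOY_DQUOTE } toy_atom_kind;

/* Decide whether buf[0..bs) can be written bare or needs double quotes.
 * The rule is an assumption for illustration, not the library's own check. */
static toy_atom_kind toy_classify(const char *buf, size_t bs)
{
    if (bs == 0)
        return TOY_DQUOTE;   /* an empty atom is only visible as "" */
    for (size_t i = 0; i < bs; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (isspace(c) || c == '(' || c == ')' || c == '"')
            return TOY_DQUOTE;
    }
    return TOY_BASIC;
}
```

The result would then be mapped to SEXP_BASIC or SEXP_DQUOTE when passing the aty argument.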

◆ new_sexp_binary_atom()

sexp_t* new_sexp_binary_atom ( char *  data,
size_t  binlength 
)

Allocate a new sexp_t element representing a raw binary atom. This element will contain a pointer to the raw binary data provided, as well as the binary data length. The character atom fields will be NULL and the corresponding val length and allocation size will be set to zero since this element is carrying a binary pointer only.

◆ new_sexp_list()

sexp_t* new_sexp_list ( sexp_t *  l)

Allocate a new sexp_t element representing a list.

◆ print_pcont()

void print_pcont ( pcont_t *  pc,
char *  buf,
size_t  buflen 
)

Print the contents of the parser continuation stack to a buffer. This is useful when an expression has been partially parsed and the caller realizes that something is wrong with it: with this routine, the caller can reconstruct the expression parsed so far and use it for error reporting. It works with fixed-size buffers allocated by the caller; there is currently no CSTRING-based version.

◆ print_sexp()

int print_sexp ( char *  loc,
size_t  size,
const sexp_t *  e 
)

Print a sexp_t struct as a string in LISP style. If the buffer is large enough and the conversion succeeds, the return value is the length of the string in the buffer. If the buffer is too small, or some other error occurs, the return value is -1 and the contents of the buffer should not be assumed to contain any useful information. When the return value is -1, check sexp_errno for details on the error that occurred.
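The return-value contract described above (string length on success, -1 when the buffer is too small) can be sketched with a toy printer. toy_elt, toy_put and toy_print are hypothetical; this sketch ignores quoting and does not set any error variable, whereas print_sexp reports details through sexp_errno.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOY_VALUE, TOY_LIST } toy_elt_type;

/* Hypothetical element; not the real layout of struct elt. */
typedef struct toy_elt {
    toy_elt_type ty;
    const char *val;        /* atom text */
    struct toy_elt *list;   /* head of nested list */
    struct toy_elt *next;   /* next sibling */
} toy_elt;

/* Append len bytes at loc + *used; fail if the text plus NUL would not fit. */
static int toy_put(char *loc, size_t size, size_t *used, const char *s, size_t len)
{
    if (*used + len + 1 > size) return -1;
    memcpy(loc + *used, s, len);
    *used += len;
    loc[*used] = '\0';
    return 0;
}

/* Print one element; lists print their children separated by spaces. */
static int toy_print_elt(char *loc, size_t size, size_t *used, const toy_elt *e)
{
    if (e->ty == TOY_VALUE)
        return toy_put(loc, size, used, e->val, strlen(e->val));
    if (toy_put(loc, size, used, "(", 1) < 0) return -1;
    for (const toy_elt *c = e->list; c != NULL; c = c->next) {
        if (toy_print_elt(loc, size, used, c) < 0) return -1;
        if (c->next != NULL && toy_put(loc, size, used, " ", 1) < 0) return -1;
    }
    return toy_put(loc, size, used, ")", 1);
}

/* Returns the printed length, or -1 if the buffer is too small. */
static int toy_print(char *loc, size_t size, const toy_elt *e)
{
    size_t used = 0;
    if (size == 0) return -1;
    loc[0] = '\0';
    if (toy_print_elt(loc, size, &used, e) < 0) return -1;
    return (int)used;
}
```

Printing (a (b c)) needs 9 characters plus the terminating NUL, so a 9-byte buffer fails with -1 while a 10-byte buffer succeeds.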

◆ print_sexp_cstr()

int print_sexp_cstr ( CSTRING **  s,
const sexp_t *  e,
size_t  ss 
)

Print a sexp_t structure to a buffer, growing it as necessary instead of relying on a fixed-size buffer as print_sexp does. The most important argument to tune for performance is ss, the initial buffer size. The growsize used by the CSTRING routines should also be considered for tuning via the sgrowsize() function. This routine no longer requires the caller to specify the growsize; it uses the current setting without changing it.
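A growable buffer driven by a start size and a growsize might look like the minimal sketch below. toy_str and its functions are hypothetical stand-ins for CSTRING, and the growth rule (grow by toy_growsize bytes, or by more if a single append needs it) is an assumption, not necessarily what cstring.h does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string standing in for CSTRING. */
typedef struct {
    char *base;
    size_t len;   /* bytes used, excluding the NUL */
    size_t cap;   /* bytes allocated */
} toy_str;

static size_t toy_growsize = 16;   /* plays the role of the sgrowsize() setting */

/* Allocate the initial buffer of ss bytes (the "start size"). */
static int toy_str_init(toy_str *s, size_t ss)
{
    if (ss == 0) ss = 1;
    s->base = malloc(ss);
    if (s->base == NULL) return -1;
    s->base[0] = '\0';
    s->len = 0;
    s->cap = ss;
    return 0;
}

/* Append text, growing by at least toy_growsize bytes when space runs out. */
static int toy_str_append(toy_str *s, const char *text)
{
    size_t n = strlen(text);
    if (s->len + n + 1 > s->cap) {
        size_t need = s->len + n + 1;
        size_t newcap = s->cap + toy_growsize;
        if (newcap < need) newcap = need;
        char *p = realloc(s->base, newcap);
        if (p == NULL) return -1;   /* old buffer is still valid */
        s->base = p;
        s->cap = newcap;
    }
    memcpy(s->base + s->len, text, n + 1);   /* copies the NUL too */
    s->len += n;
    return 0;
}

static void toy_str_free(toy_str *s)
{
    free(s->base);
    s->base = NULL;
    s->len = s->cap = 0;
}
```

A start size close to the typical output length avoids reallocations entirely, while a much larger one wastes memory per call; the growsize trades the number of realloc calls against slack space.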

◆ reset_sexp_errno()

void reset_sexp_errno ( )

Reset the value of sexp_errno to SEXP_ERR_OK.

◆ sexp_cleanup()

void sexp_cleanup ( void  )

Release ALL of the memory that the library keeps cached between calls. If you don't call this, the caches persist for the lifetime of the program using the library. If an error condition sets sexp_errno, consider calling this to clean up any memory that may be left lingering.

◆ sexp_t_allocate()

sexp_t* sexp_t_allocate ( void  )

Return an allocated sexp_t. This may be a previously allocated element taken from the stack, or a new one if none is available. Use this instead of calling malloc manually if you want to avoid excessive mallocs. Note: Mallocing your own elements is fine; you can even use sexp_t_deallocate to put them into the pool. If the stack has not been initialized yet, this call initializes it.

◆ sexp_t_deallocate()

void sexp_t_deallocate ( sexp_t *  s)

Given a malloc'd sexp_t element, put it back onto the stack of already-allocated elements. This allocates the stack if one has not been allocated already.
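The caching behaviour of sexp_t_allocate, sexp_t_deallocate, destroy_sexp and sexp_cleanup can be pictured as a free-list pool. The sketch below illustrates that general technique under stated assumptions and is not the library's implementation: toy_elt, toy_pool and the toy_* functions are hypothetical, and the free list reuses each cached element's next pointer as its link.

```c
#include <stdlib.h>

/* Hypothetical element; only the links matter for this sketch. */
typedef struct toy_elt {
    struct toy_elt *next;   /* sibling link, reused as the free-list link */
    struct toy_elt *list;   /* head of nested list when is_list != 0 */
    int is_list;
} toy_elt;

static toy_elt *toy_pool = NULL;   /* stack of recycled elements */
static size_t toy_pool_size = 0;

/* Pop a cached element if one exists, otherwise fall back to malloc. */
static toy_elt *toy_allocate(void)
{
    toy_elt *e = toy_pool;
    if (e != NULL) {
        toy_pool = e->next;
        toy_pool_size--;
    } else {
        e = malloc(sizeof *e);
        if (e == NULL) return NULL;
    }
    e->next = NULL;
    e->list = NULL;
    e->is_list = 0;
    return e;
}

/* Push an element onto the cache instead of calling free(). */
static void toy_deallocate(toy_elt *e)
{
    e->next = toy_pool;
    toy_pool = e;
    toy_pool_size++;
}

/* Recycle e, its nested lists and its siblings, in the spirit of destroy_sexp. */
static void toy_destroy(toy_elt *e)
{
    while (e != NULL) {
        toy_elt *next = e->next;   /* read before toy_deallocate overwrites it */
        if (e->is_list) toy_destroy(e->list);
        toy_deallocate(e);
        e = next;
    }
}

/* Really free every cached element, analogous to sexp_cleanup(). */
static void toy_cleanup(void)
{
    while (toy_pool != NULL) {
        toy_elt *e = toy_pool;
        toy_pool = e->next;
        free(e);
    }
    toy_pool_size = 0;
}
```

After building a list of two atoms and destroying it, all three elements sit in the pool; the next toy_allocate call reuses one of them without touching malloc.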

Variable Documentation

◆ sexp_errno

extern sexp_errcode_t sexp_errno

Global value indicating the most recent error condition encountered. This value can be reset to SEXP_ERR_OK by calling reset_sexp_errno().
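The global error-code pattern can be sketched as follows. toy_errcode_t, toy_errno, toy_reset_errno and toy_copy are hypothetical; the library's real error codes are defined in sexp_errors.h. In this sketch a successful call does not clear the code, so resetting it before a call makes the subsequent check unambiguous.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; not the values from sexp_errors.h. */
typedef enum { TOY_ERR_OK = 0, TOY_ERR_BUFFER_FULL } toy_errcode_t;

/* Global "most recent error", in the style of sexp_errno. */
static toy_errcode_t toy_errno = TOY_ERR_OK;

static void toy_reset_errno(void) { toy_errno = TOY_ERR_OK; }

/* Reports failure through both the return value and toy_errno. */
static int toy_copy(char *dst, size_t size, const char *src)
{
    size_t n = strlen(src);
    if (n + 1 > size) {
        toy_errno = TOY_ERR_BUFFER_FULL;   /* record why the call failed */
        return -1;
    }
    memcpy(dst, src, n + 1);
    return (int)n;
}
```

The return value signals that something failed; the global code says why.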