sexp.h File Reference

API for a small, fast and portable s-expression parser library.

#include <stddef.h>
#include <stdio.h>
#include "faststack.h"
#include "cstring.h"
#include "sexp_memory.h"
#include "sexp_errors.h"
#include "sexp_ops.h"

Data Structures

struct  elt
struct  parser_event_handlers
struct  pcont
struct  sexp_iowrap


Typedefs

typedef struct elt sexp_t
typedef struct parser_event_handlers parser_event_handlers_t
typedef struct pcont pcont_t
typedef struct sexp_iowrap sexp_iowrap_t


Enumerations

enum  elt_t { SEXP_VALUE, SEXP_LIST }


Functions

sexp_errcode_t set_parser_buffer_params (size_t ss, size_t gs)
sexp_t * sexp_t_allocate (void)
void sexp_t_deallocate (sexp_t *s)
void sexp_cleanup (void)
int print_sexp (char *loc, size_t size, const sexp_t *e)
int print_sexp_cstr (CSTRING **s, const sexp_t *e, size_t ss)
sexp_t * new_sexp_list (sexp_t *l)
sexp_t * new_sexp_atom (const char *buf, size_t bs, atom_t aty)
pcont_t * init_continuation (char *str)
void destroy_continuation (pcont_t *pc)
sexp_iowrap_t * init_iowrap (int fd)
void destroy_iowrap (sexp_iowrap_t *iow)
sexp_t * read_one_sexp (sexp_iowrap_t *iow)
sexp_t * parse_sexp (char *s, size_t len)
sexp_t * iparse_sexp (char *s, size_t len, pcont_t *cc)
pcont_t * cparse_sexp (char *s, size_t len, pcont_t *pc)
void destroy_sexp (sexp_t *s)
void reset_sexp_errno ()


Variables

sexp_errcode_t sexp_errno

Detailed Description

API for a small, fast and portable s-expression parser library.

Typedef Documentation

typedef struct parser_event_handlers parser_event_handlers_t

Some users prefer an XML SAX-style parser, in which events are triggered as parts of the s-expression are encountered, over parsing a full string and then walking a potentially huge sexp_t structure. This structure contains a set of function pointers that the parser calls when it reaches the start or end of an expression and when it finishes reading an atom or binary data. NOTE: The parser_event_handlers struct referenced by the continuation data structure is NOT freed by destroy_continuation, because callback structs are ALWAYS allocated by the user, not the library.
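The event-driven idea can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the `handlers` table, its field names, and `scan_events` are stand-ins invented for this example and are not the library's parser_event_handlers_t or its parser. Quoted strings and binary data are ignored. The point is only that callbacks fire as the input is scanned and no tree is ever built.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback table; the library's real field names may differ. */
typedef struct {
    void (*start_expr)(void *user);
    void (*end_expr)(void *user);
    void (*atom)(const char *text, size_t len, void *user);
} handlers;

/* Walk s[0..len) and fire callbacks without building any tree. */
void scan_events(const char *s, size_t len, const handlers *h, void *user)
{
    size_t i = 0;

    assert(s != NULL && h != NULL);
    while (i < len) {
        unsigned char ch = (unsigned char)s[i];
        if (ch == '(') {
            if (h->start_expr) h->start_expr(user);
            i++;
        } else if (ch == ')') {
            if (h->end_expr) h->end_expr(user);
            i++;
        } else if (isspace(ch)) {
            i++;
        } else {
            size_t begin = i;   /* unquoted atom: run to whitespace or paren */
            while (i < len && s[i] != '(' && s[i] != ')' &&
                   !isspace((unsigned char)s[i]))
                i++;
            if (h->atom) h->atom(s + begin, i - begin, user);
        }
    }
}

/* Example handlers that only count events. */
typedef struct { int opens, closes, atoms; } counts;

void on_start(void *u) { ((counts *)u)->opens++; }
void on_end(void *u)   { ((counts *)u)->closes++; }
void on_atom(const char *t, size_t n, void *u)
{
    (void)t;
    (void)n;
    ((counts *)u)->atoms++;
}
```

In this sketch the caller owns the `handlers` table and the `counts` object, which mirrors the note above that the library never frees user-supplied callback structs.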

typedef struct pcont pcont_t

A continuation is used by the parser to save and restore state between invocations, which supports partial parsing of strings. For example, if we pass the string "(foo bar)(goo car)" to the parser, we want to retrieve the s-expressions one at a time; returning them all at once would be difficult without knowing in advance how many there are, and would require more memory management than we want. With a continuation-based parser, we can call it with this string and have it return a continuation once it has parsed the first s-expression. After processing that s-expression (accessible through the last_sexp field of the continuation), we can call the parser again with the same string and continuation, and it picks up where it left off.

We use continuations instead of a stateful parser so that multiple strings can be parsed concurrently simply by maintaining a set of continuations. Calling the continuation-based parser directly requires manipulating continuations by hand. This is not recommended unless you are willing to deal with potential errors and to learn exactly how the continuation relates to the internals of the parser. A simpler approach is to use either the parse_sexp function, which returns an s-expression without exposing the continuation, or the iparse_sexp function, which iteratively pops one s-expression at a time from a string containing one or more s-expressions. Refer to the documentation for each parsing function for further details on behavior and usage.
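The resume-where-we-left-off behaviour can be sketched in a few lines. This is a minimal, self-contained illustration, not the library's parser: the `cont` type and `next_expr` function are hypothetical, the saved state is reduced to a single offset, and quoted strings and escapes are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical continuation: just remembers where the last call stopped. */
typedef struct {
    size_t pos;   /* offset of the next unread character */
} cont;

/*
 * Find the next balanced top-level "( ... )" in s[0..len), starting at
 * c->pos. On success store its offset and length, advance c->pos past it,
 * and return 1. Return 0 when no complete expression remains.
 */
int next_expr(const char *s, size_t len, cont *c, size_t *start, size_t *n)
{
    assert(s != NULL && c != NULL && start != NULL && n != NULL);

    size_t i = c->pos;
    size_t depth = 0;

    while (i < len && s[i] != '(')      /* skip to the opening paren */
        i++;
    *start = i;

    for (; i < len; i++) {
        if (s[i] == '(') {
            depth++;
        } else if (s[i] == ')' && --depth == 0) {
            *n = i - *start + 1;
            c->pos = i + 1;             /* resume here on the next call */
            return 1;
        }
    }
    return 0;                           /* input exhausted or unbalanced */
}
```

Because all saved state lives in the `cont` value rather than in globals, several strings can be scanned in an interleaved fashion by keeping one `cont` per string, which is the design argument made above.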

typedef struct elt sexp_t

An s-expression is represented as a linked structure of elements, where each element is either an atom or list. An atom corresponds to a string, while a list corresponds to an s-expression. The following grammar represents our definition of an s-expression:

 sexpr  ::= ( sx )
 sx     ::= atom sxtail | sexpr sxtail | 'sexpr sxtail | 'atom sxtail | NULL
 sxtail ::= sx | NULL
 atom   ::= quoted | value
 quoted ::= "ws_string"
 value  ::= nws_string

An atom can be either a quoted string, which is a string (possibly containing whitespace) surrounded by double quotes, or a non-whitespace string that does not require surrounding quotes. An element representing an atom has a type of value and stores its data in the val field. An element of type list represents an s-expression corresponding to sexpr in the grammar, and has a pointer to the head of the appropriate s-expression. Details regarding these fields and their values are given with the fields themselves. Notice that a single quote can appear directly before an s-expression or atom, similar to its use in LISP.
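To make the representation concrete, here is a minimal sketch of such a linked structure and a recursive walk over it. The `node` type and its `list` and `next` field names are simplified stand-ins invented for this example, not the library's actual elt definition; only the value/list distinction and the val field come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an s-expression element. */
typedef enum { NODE_VALUE, NODE_LIST } node_type;

typedef struct node {
    node_type    ty;    /* value (atom) or list */
    const char  *val;   /* atom text; unused for lists */
    struct node *list;  /* head of the nested expression; unused for atoms */
    struct node *next;  /* next sibling in the enclosing list */
} node;

/* Count the atoms reachable from n, following siblings and nested lists. */
size_t count_atoms(const node *n)
{
    size_t total = 0;

    for (; n != NULL; n = n->next) {
        if (n->ty == NODE_VALUE) {
            assert(n->val != NULL);
            total++;
        } else {
            total += count_atoms(n->list);
        }
    }
    return total;
}
```

For the expression (foo (bar baz)), the top element is a list whose head is the atom foo; foo's sibling is another list element whose head is bar, followed by baz.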

Enumeration Type Documentation

enum atom_t

For an element that represents a value, the value can be interpreted as a more specific type. A basic value is a simple string with no whitespace (and therefore no quotes required). A double quote value, or dquote, is one that contains characters (such as whitespace) that require quotation marks to delimit the string. A single quote value, or squote, represents an element that is prefaced with a single tick mark. This can be either an atom or an s-expression, and the result is that the parser does not attempt to parse the element following the tick mark; it is simply stored as text. This is similar to the meaning of a tick mark in the Scheme or LISP family of programming languages. Finally, binary allows raw binary data to be stored within an atom. Note that if the binary type is used, the data is stored in bindata with the length in binlength. Otherwise, the data is stored in the val field, with val_used and val_allocated tracking the size of the value string and the total memory allocated for it.

SEXP_BASIC  Basic, unquoted value.
SEXP_SQUOTE  Single quote (tick-mark) value - contains a string representing a non-parsed portion of the s-expression.
SEXP_DQUOTE  Double-quoted string. Similar to a basic value, but potentially containing white-space.
SEXP_BINARY  Binary data. This is used when the specialized parser is active and supports inlining of binary blobs of data inside an expression.

enum elt_t

An element in an s-expression can be one of two types. A value represents an atom with an associated text value. A list represents an s-expression, and the element contains a pointer to the head element of the associated s-expression.

SEXP_VALUE  An atom of some type. See atom type (aty) field of element structure for details as to which atom type this is.
SEXP_LIST  A list. This means the element points to an element representing the head of a list.

enum parsermode_t

Parser mode flag used by the continuation to toggle special parser behaviour.

PARSER_NORMAL  Normal (LISP-style) s-expression parser behaviour.
PARSER_INLINE_BINARY  Treat atoms beginning with #b# as inlined binary data. Everything else is treated the same as in PARSER_NORMAL mode.
PARSER_EVENTS_ONLY  If the event_handlers field in the continuation contains a non-null value, the handlers specified in the parser_event_handlers_t struct will be called as appropriate, but the parser will not allocate a structure composed of sexp_t structs. Note that if event_handlers is set to null and this mode is selected, the user would be better off not calling anything in the first place, as they are telling the parser to walk the string but do nothing productive in the process.

Function Documentation

void destroy_continuation (pcont_t *pc)

Destroy a continuation. This involves cleaning up what it contains, and cleaning up the continuation itself.

void destroy_sexp (sexp_t *s)

Given a sexp_t structure, free the memory it uses and, recursively, the memory used by all sexp_t structures that it references. Note that this calls the deallocation routine for sexp_t elements, so the elements themselves are not returned to the system but are stored in a cache of pre-allocated elements. This optimization speeds up the parser by eliminating wasteful free and re-malloc calls. Note: In inlined binary mode, this will free the data pointed to by the bindata field. If you need the data beyond the lifetime of the s-expression, make a copy before cleaning up the s-expression.

pcont_t* init_continuation (char *str)

Create an initial continuation for parsing the given string.

sexp_t* new_sexp_atom (const char *buf, size_t bs, atom_t aty)

Allocate a new sexp_t element representing a value. The user must specify the precise type of the atom. This used to default to SEXP_BASIC, which could lead to errors if the user did not expect that assumption. Because the atom type is passed explicitly, the caller is responsible for ensuring that the data in the buffer is valid for the requested atom type. For performance reasons, the library does not perform such checks; they are left to the caller.

sexp_t* new_sexp_list (sexp_t *l)

Allocate a new sexp_t element representing a list.

int print_sexp (char *loc, size_t size, const sexp_t *e)

Print a sexp_t struct as a string in the LISP style. If the buffer is large enough and the conversion is successful, the return value is the length of the string contained in the buffer. If the buffer was too small, or some other error occurred, the return value is -1 and the contents of the buffer should not be assumed to contain any useful information. When the return value is -1, the caller should check sexp_errno for details on what error may have occurred.
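The length-or-minus-one convention for a fixed-size buffer can be sketched as follows. This is a self-contained illustration under stated assumptions, not the library's implementation: `node`, `put`, `emit`, and `print_node` are hypothetical names, only unquoted atoms and nested lists are handled, and no error code is recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified element type used only for this sketch. */
typedef struct node {
    const char  *val;   /* non-NULL for an atom */
    struct node *list;  /* head of the nested list when val is NULL */
    struct node *next;  /* next sibling */
} node;

/* Append n bytes at *used; return 0 on success, -1 if they do not fit.
 * One byte is always kept in reserve for the terminating NUL. */
int put(char *buf, size_t size, size_t *used, const char *src, size_t n)
{
    if (*used >= size || n > size - *used - 1)
        return -1;
    memcpy(buf + *used, src, n);
    *used += n;
    return 0;
}

/* Recursively write e in LISP style, stopping at the first overflow. */
int emit(char *buf, size_t size, size_t *used, const node *e)
{
    if (e->val != NULL)
        return put(buf, size, used, e->val, strlen(e->val));
    if (put(buf, size, used, "(", 1) < 0)
        return -1;
    for (const node *c = e->list; c != NULL; c = c->next) {
        if (c != e->list && put(buf, size, used, " ", 1) < 0)
            return -1;
        if (emit(buf, size, used, c) < 0)
            return -1;
    }
    return put(buf, size, used, ")", 1);
}

/* Returns the string length on success, -1 if the buffer is too small. */
int print_node(char *buf, size_t size, const node *e)
{
    size_t used = 0;

    assert(buf != NULL && e != NULL);
    if (emit(buf, size, &used, e) < 0)
        return -1;
    buf[used] = '\0';
    return (int)used;
}
```

As with print_sexp, a caller of this sketch should treat the buffer contents as undefined whenever -1 is returned, since a partial expression may already have been written.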

int print_sexp_cstr (CSTRING **s, const sexp_t *e, size_t ss)

Print a sexp_t structure to a buffer, growing it as necessary instead of relying on a fixed-size buffer as print_sexp does. The important argument to tune for performance is ss, the buffer start size. The growsize used by the CSTRING routines should also be considered for tuning via the sgrowsize() function. This routine no longer requires the user to specify the growsize; it uses the current setting without changing it.
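The interplay between a start size and a grow size can be sketched with a small growable buffer. This is a hypothetical illustration only: `gstr`, `gstr_init`, and `gstr_append` are invented for this example and do not reflect the actual CSTRING type or its routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string, loosely modelled on the idea of CSTRING. */
typedef struct {
    char  *base;   /* heap storage, always NUL-terminated */
    size_t len;    /* bytes of text currently in use */
    size_t cap;    /* bytes allocated */
    size_t grow;   /* extra slack added whenever the buffer must grow */
} gstr;

/* Create a buffer with the given start size; returns 0, or -1 on failure. */
int gstr_init(gstr *g, size_t start, size_t grow)
{
    assert(g != NULL);
    g->cap = start > 0 ? start : 1;
    g->base = malloc(g->cap);
    if (g->base == NULL)
        return -1;
    g->base[0] = '\0';
    g->len = 0;
    g->grow = grow;
    return 0;
}

/* Append n bytes, reallocating when the current capacity is too small. */
int gstr_append(gstr *g, const char *src, size_t n)
{
    size_t need = g->len + n + 1;           /* +1 for the NUL */

    if (need > g->cap) {
        size_t cap = need + g->grow;
        char *p = realloc(g->base, cap);
        if (p == NULL)
            return -1;                      /* old buffer is still valid */
        g->base = p;
        g->cap = cap;
    }
    memcpy(g->base + g->len, src, n);
    g->len += n;
    g->base[g->len] = '\0';
    return 0;
}
```

A start size close to the typical output length avoids reallocation entirely, while a larger grow size trades memory for fewer realloc calls on long outputs; that is the tuning trade-off described above.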

void reset_sexp_errno ()

Reset the value of sexp_errno to SEXP_ERR_OK.

void sexp_cleanup (void)

Release ALL of the memory that the library retains between calls. If you don't call this, the caches persist for the lifetime of the library user. After an error condition that sets sexp_errno, the user might consider calling this to clean up any memory that may be left lingering.

sexp_t* sexp_t_allocate (void)

Return an allocated sexp_t. This structure may be an already allocated one taken from the stack, or a new one if none are available. Use this instead of calling malloc directly if you want to avoid excessive mallocs. Note: Mallocing your own expressions is fine; you can even use sexp_t_deallocate to deallocate them and put them in the pool. If the stack has not been initialized yet, this function initializes it.

void sexp_t_deallocate (sexp_t *s)

Given a malloc'd sexp_t element, put it back into the already-allocated element stack. This function will allocate a stack if one has not been allocated already.
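The caching scheme behind the allocate, deallocate, and cleanup routines can be sketched as a simple free list. This is an illustration of the idea only: `item` and the three functions are hypothetical, and the sketch uses an intrusive linked list rather than the separate stack structure the library refers to.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element; only the link used by the cache matters here. */
typedef struct item {
    struct item *next;
    int payload;
} item;

static item *cache = NULL;   /* stack of released, reusable items */

/* Reuse a cached item when one exists, otherwise fall back to malloc. */
item *item_allocate(void)
{
    item *it = cache;

    if (it != NULL)
        cache = it->next;
    else
        it = malloc(sizeof *it);
    if (it != NULL) {
        it->next = NULL;
        it->payload = 0;
    }
    return it;
}

/* Push an item onto the cache instead of freeing it. */
void item_deallocate(item *it)
{
    assert(it != NULL);
    it->next = cache;
    cache = it;
}

/* Release everything held by the cache back to the system. */
void item_cleanup(void)
{
    while (cache != NULL) {
        item *it = cache;
        cache = it->next;
        free(it);
    }
}
```

In this sketch, an item that has been deallocated is handed back by the very next allocation, so a parse-and-destroy loop settles into reusing the same memory; only the cleanup call actually returns it to the system.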

Variable Documentation

sexp_errcode_t sexp_errno

Global value indicating the most recent error condition encountered. This value can be reset to SEXP_ERR_OK by calling reset_sexp_errno().

Generated on Thu Oct 25 01:19:37 2007 for Small, Fast S-Expression Library by  doxygen 1.4.6