Beginnings of a PDF parser in Hammer
====================================
BUILDING
Simply call 'make' in the top-level directory.
$ make
The environment variables CC, CFLAGS, and LDFLAGS can be used in the usual
way to control the compiler to use, compiler flags, and linker flags,
respectively.
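As an illustration, the variables can be set in the environment before
calling make. This is only a sketch: HAMMER_PREFIX is a hypothetical name
introduced here, /usr/local is an assumed install location, and whether the
Makefile appends to or replaces these values has not been verified.

```shell
# HAMMER_PREFIX is a hypothetical helper variable, not something the
# Makefile knows about; /usr/local is an assumed install location.
HAMMER_PREFIX=${HAMMER_PREFIX:-/usr/local}

# Keep any flags that are already set and append the Hammer search paths.
export CC="${CC:-cc}"
export CFLAGS="${CFLAGS:+$CFLAGS }-I$HAMMER_PREFIX/include"
export LDFLAGS="${LDFLAGS:+$LDFLAGS }-L$HAMMER_PREFIX/lib"

# Build only when run from the top-level directory.
if [ -f Makefile ]; then
    make
fi
```

The same values can also be passed for a single invocation by prefixing
them to the make command instead of exporting them.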
This program uses the Hammer parser combinator library. It needs a recent
version, which can be obtained from:
https://gitlab.special-circumstanc.es/hammer/hammer/
See the file README.md in that repository for build/install instructions.
It is recommended to install Hammer as a system library. See also the
TROUBLESHOOTING section below.
USAGE
./pdf [-qsv] [-x txtfile] input.pdf
The 'pdf' utility attempts to parse and validate the given PDF file. If
successful, it prints the resulting AST to stdout using a JSON format.
It exits 0 on success, 1 if the input file was found to be invalid, and >1
if an error occurs.
The options are as follows:
-q Query/quiet mode. Do not print to stdout and suppress any messages
about parse errors. Just indicate success/failure via the exit status.
-s Strict mode. Treat most format violations as parse errors.
-v Verbose mode. Show additional informational messages.
-x txtfile
Extract the text content of the input document and write it as plain
text to 'txtfile'.
TROUBLESHOOTING
<hammer/hammer.h> or libhammer.so not found:
If Hammer is not installed as a system library, or is installed in a
nonstandard location, cc and ld will fail to locate its headers and
library. The quick fix is to create symlinks called 'hammer' and 'lib'
pointing to Hammer's source and build output directories, respectively:
$ ln -s ../hammer/src hammer
$ ln -s ../hammer/build/opt/src lib
$ make
Likewise, when running 'pdf' directly, ld.so will fail to locate
libhammer.so. The quick fix is to point LD_LIBRARY_PATH to the 'lib' dir:
$ export LD_LIBRARY_PATH=$PWD/lib
$ ./pdf <filename>
EXIT STATUS
On valid input, the program returns with exit code 0. An exit code of 1
indicates that the parser identified the input as invalid and otherwise
executed normally. Exit codes >1 indicate abnormal termination, i.e.
program failure with indeterminate result.
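For scripting, the three outcomes can be told apart from the shell. The
following is a minimal sketch and not part of the distribution; the
function name describe_status and the file name input.pdf are
placeholders chosen for illustration.

```shell
# describe_status CODE
# Map an exit status of 'pdf' to the meaning documented above.
# (Hypothetical helper, not shipped with the program.)
describe_status() {
    case "$1" in
        0) echo "valid" ;;
        1) echo "invalid" ;;
        *) echo "error (exit $1)" ;;
    esac
}

# Example: check a file in quiet mode (-q) if the binary and file exist.
if [ -x ./pdf ] && [ -f input.pdf ]; then
    status=0
    ./pdf -q input.pdf || status=$?
    describe_status "$status"
fi
```

Treating every status other than 0 and 1 as an error keeps abnormal
terminations from being mistaken for a verdict on the input.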
EVALUATING TEST RESULTS
A suite of example files is provided in the test/ directory. To run the
test suite:
$ make test
For every file in the test/valid/ and test/invalid/ subdirectories, the pdf
parser is invoked.
For the valid samples, a message of the following form is displayed on a
successful parse (exit code 0):
OK: test/valid/<filename>
Non-fatal messages may be displayed above it, but the presence of "OK"
indicates that the test passed. On any nonzero exit, i.e. if either the
file is deemed invalid or the program encountered an unexpected error,
error messages are displayed above a line of the following form, which
includes the exact exit code:
FAIL (exit <n>): test/valid/<filename>
For the invalid samples, messages about parse errors are suppressed and an
"OK" is displayed if and only if pdf exits with 1 ("invalid input"). An
exit code of 0 or abnormal termination will produce the "FAIL" message with
any program output appearing above it.
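The pass/fail rules above can be sketched as a pair of shell functions.
This is an illustration of the described behavior, not the actual 'test'
target from the Makefile: the names check_valid, check_invalid, and PDF are
hypothetical, and the use of -q for the invalid samples is an assumption
based on the option description.

```shell
# Command under test; PDF is a hypothetical override, defaulting to ./pdf.
PDF=${PDF:-./pdf}

# A valid sample passes only if the parser exits 0.
check_valid() {
    status=0
    "$PDF" "$1" >/dev/null || status=$?   # discard the AST, keep messages
    if [ "$status" -eq 0 ]; then
        echo "OK: $1"
    else
        echo "FAIL (exit $status): $1"
    fi
}

# An invalid sample passes only if the parser exits 1 ("invalid input").
check_invalid() {
    status=0
    "$PDF" -q "$1" || status=$?           # -q suppresses parse error output
    if [ "$status" -eq 1 ]; then
        echo "OK: $1"
    else
        echo "FAIL (exit $status): $1"
    fi
}

# Run over the sample directories, skipping patterns that match nothing.
for f in test/valid/*; do
    [ -e "$f" ] || continue
    check_valid "$f"
done
for f in test/invalid/*; do
    [ -e "$f" ] || continue
    check_invalid "$f"
done
```

Note that for the invalid samples an exit code of 0 counts as a failure
just like abnormal termination does, since the parser accepted input it
should have rejected.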
COPYRIGHT
Various authors. Released under the terms of the ISC license.
See LICENSE for the full copyright and licensing notice.