Tree
- Tree:
589cab55ed965a77867b7952962b46896612794f
- Date:
- Message:
- Fix segfault when `decode_stream` fails in xrefs

  In instigator-crashes/aux-xrefs-segfault an invalid flate-encoded stream was producing this behavior:

      inflate: invalid distance too far back (-3)
      parse error in stream (XRef)
      ../instigator-crashes/aux-xrefs-segfault: error parsing xref section at position 249939 (0x3d053)

      Program received signal SIGSEGV, Segmentation fault.
      0x000055555555d91f in lookup_xref (aux=0x7fffffffdf60, nr=4, gen=0) at pdf.c:1249
      1249        HCountedArray *subs = H_INDEX_SEQ(aux->xrefs[i], 0);

  What was happening was that `act_ks_value`, indirectly invoked by `parse_xrefs`, invoked `decode_stream`, which produced the "inflate:" message and returned NULL. So `act_ks_value` produced the "parse error in stream" message and returned an HParseResult wrapping that NULL pointer. Higher up the stack, `act_xrstm` packs this NULL pointer into element 0 of a new `h_sequence`. `parse_xrefs` was happily storing this `h_sequence` into `aux->xrefs[0]`, then blithely continuing to the next loop iteration, at which point it would report "error parsing xref section" and return to main().

  However, this did not abort parsing the file! main() went on to attempt to parse the PDF file as a whole, and the first time the resulting parse called `lookup_xref`, that lookup iterated over the xref sections in the file, checking whether the object number belonged to any of them. The line of code above then segfaulted while attempting to assert that the NULL was actually a valid `h_sequence` pointer.

  So this patch simply prevents `parse_xrefs` from treating the failed xref section as valid. The result is that, as before, the parse exits shortly because it can't follow any xrefs, but now without segfaulting!
      inflate: invalid distance too far back (-3)
      parse error in stream (XRef)
      ../instigator-crashes/aux-xrefs-segfault: error parsing xref section at position 255242 (0x3e50a)
      VIOLATION[1]@433 (0x1b1): Missing endobj token (severity=1)
      ../instigator-crashes/aux-xrefs-segfault: no parse
      VIOLATION[1]@433 (0x1b1): Missing endobj token (severity=1)
      ../instigator-crashes/aux-xrefs-segfault: error after position 433 (0x1b1)
      [Inferior 1 (process 626584) exited with code 01]
README
Beginnings of a PDF parser in Hammer
====================================

- Currently needs a custom Hammer branch. You'll need to build against this:

  https://gitlab.special-circumstanc.es/pesco/hammer/tree/pdf

  For detailed build instructions, see README.md in that repository.

- Help the default Makefile find Hammer:

    $ ln -s ../hammer/src hammer            # needed for building pdf, include files
    $ ln -s ../hammer/build/opt/src lib     # needed for running pdf, to locate libhammer.so

- Notes for the 2020-04-27 release:

  The release branch has been tested to build with the `2020-04-27_RELEASE` branch located at

  https://gitlab.special-circumstanc.es/pesco/hammer/tree/2020-04-27_RELEASE

- Build:

    $ pushd ../hammer; scons; popd    # build Hammer
    $ make pdf

- Usage:

    $ export LD_LIBRARY_PATH=./lib    # see the Troubleshooting section below to check whether this is needed
    $ ldd ./pdf | grep libhammer      # verify that libhammer.so was found
    $ ./pdf <filename>

    # place some test files in the t/ directory...
    $ make test

- Troubleshooting:

  libhammer.so not found:

  If Hammer is not installed as a system library, ld may fail to locate libhammer.so. The quick fix is to alter LD_LIBRARY_PATH before running pdf:

    $ export LD_LIBRARY_PATH=./lib
    $ make test

  The second solution is to run "scons install" when building Hammer, which installs it in ld's usual search path:

    $ pushd ../hammer; scons install; popd
    # ... update the ldconfig cache if needed
    $ make pdf
    $ make test

- Evaluating test results:

  For every file in the t/ directory, the pdf parser is executed. On a successful parse, a message of the following form is displayed:

    OK: t/<filename>

  In case of a non-fatal parse error, error messages may be displayed, but the presence of "OK" indicates that pdf exited successfully.

  On a failed test run, only parse error messages are displayed.

- Copyright:

  - pesco 2019,2020
  - pompolic 2020
  - Paul Vines 2020
  - David Bryant (modified lzw-ab code)

  See LICENSE and lzw-ab-license.txt for the full copyright and licensing notice.