Commit Briefs

1d12105938 Sven M. Hallberg

fix wrong indentation in act_viol (leakcheck)


2e272c3132 Sven M. Hallberg

free parse result ifdef LEAKCHECK

This covers the main parse result and a possible "error parse", but not the calls to h_parse() in filters and parse_obj().


7d36d3a94d Sven M. Hallberg

free the parse result from p_startxref


53d6518a00 Sven M. Hallberg

free parse result in act_viol


6a516036da Sven M. Hallberg

light style pass over act_viol


68108a4aa0 Sven M. Hallberg

statically allocate global lzw decoder context

Avoids the use of malloc(). Also factors out table initialization to a function lzw_init_table().
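
The shape of this change can be sketched as follows. This is a minimal illustration only, not the actual decoder: the struct layout, the names sketch_lzw_context, sketch_ctx and sketch_init_table, and the constants are all assumptions made for the example.

```c
#include <stdint.h>

#define SKETCH_TABLE_SIZE 4096   /* assumption: codes are at most 12 bits wide */
#define SKETCH_FIRST_FREE 258    /* assumption: codes 256 and 257 are reserved */

/* Hypothetical context layout; the real decoder's fields are not shown here. */
struct sketch_lzw_context {
    uint8_t literal[SKETCH_TABLE_SIZE];  /* byte value for each literal code */
    int next_code;                       /* next table slot to hand out */
};

/* Static storage duration: no malloc(), so no allocation failure and no free(). */
struct sketch_lzw_context sketch_ctx;

/* Table setup factored into its own function so it can be rerun on a reset. */
void sketch_init_table(struct sketch_lzw_context *ctx)
{
    for (int i = 0; i < 256; i++)
        ctx->literal[i] = (uint8_t)i;
    ctx->next_code = SKETCH_FIRST_FREE;
}
```

Calling sketch_init_table(&sketch_ctx) once before decoding, and again whenever the table must be cleared, is all the setup the context needs.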


1afde767c4 Sven M. Hallberg

print an error message if /Root not found

If we are actually processing page content, that is.


f0c8a4732e Sven M. Hallberg

correctly look for /Root in the last trailer section

A mistake snuck into commit 76e546ce, taking the last element of the xrefs array as the "last" trailer section. But the array is filled in reverse order by following the chain of startxref and /Prev pointers, so the (logical) last/latest section is xrefs[0].
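
Why the latest section ends up at index 0 can be seen in a small model of the chain walk. This is a sketch under assumptions: sections are represented by plain indices, prev[] stands in for the /Prev entries, and follow_chain and NO_PREV are hypothetical names.

```c
#include <stddef.h>

#define NO_PREV ((size_t)-1)   /* hypothetical marker: section has no /Prev */

/*
 * Walks the chain beginning at `start` (the section startxref points to)
 * and records each visited section in `order`. Returns the number stored.
 * The `max` limit also bounds the walk if the chain is cyclic.
 */
size_t follow_chain(const size_t *prev, size_t start, size_t *order, size_t max)
{
    size_t n = 0;

    for (size_t cur = start; cur != NO_PREV && n < max; cur = prev[cur])
        order[n++] = cur;   /* newest first: order[0] is the latest section */

    return n;
}
```

With three sections where 2 is the newest and points back to 1, which points back to 0, the walk stores 2, 1, 0. The logically last section is therefore the first array element, and the original (oldest) section is the final one.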


06ed0943b4 Sven M. Hallberg

fix format specifier for printing HBytes

Since HBytes is a length/pointer pair and not a null-terminated string, we must pass the length as an argument to printf. The correct format specifier for that is "%.*s" (string with "precision" = length), not "%*s" (string with minimum field width).
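
The difference is easy to demonstrate with a stand-in type. The struct below is an assumption that merely mimics a length/pointer pair, and format_bytes is a hypothetical helper, not a function from the code base.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a length/pointer pair such as HBytes; field names are assumed. */
struct sketch_bytes {
    const uint8_t *token;
    size_t len;
};

/*
 * Formats at most b.len bytes of b.token into buf and returns the result of
 * snprintf(). With "%.*s" the int argument is a precision, i.e. an upper
 * bound on the bytes read, so the data need not be null-terminated.
 * With "%*s" it would only be a minimum field width and printf would keep
 * reading until it found a null byte.
 */
int format_bytes(char *buf, size_t bufsz, struct sketch_bytes b)
{
    return snprintf(buf, bufsz, "%.*s", (int)b.len, (const char *)b.token);
}
```

Note the cast to int: the precision argument of printf-style functions has type int, not size_t.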


11e873cc86 Sven M. Hallberg

add missing printf argument

Forgotten in b3dda3fe when adding the input file name to error messages.


656f5a3f4d Sven M. Hallberg

remove stale comment

Finished reviewing past modifications to parse_xrefs(). NB: All code attributed to Sumit Ray has been removed from this function.


a1014f81d8 Sven M. Hallberg

improve handling of parse errors in xref stream data

Improve on the bugfix in commit a5abf1e2:

- Reinstate the assert for 'res->ast != NULL'. If it fails, there is a bug in the parser, not an error in the input file.
- Provide a distinct error message for the case where p_xref fails on a cross-reference stream because of invalid data.
- Only skip storing the invalid section. Try to follow the /Prev entry in the stream dictionary to find more sections.


512de3c2ea Sven M. Hallberg

remove a comment

I cannot tell what this refers to. The (nonexistent) else case of the if statement above it is simply the case of the object number in question not falling within this subsection. Anyway, the function lookup_xref() is a low-level utility used during parsing, not a place to produce error messages.


c8be9e8432 Sven M. Hallberg

comments regarding act_ks_value

HParseResult was introduced in 6b54ebfa (generally parse stream objects) to hold the result of parsing the stream data, including the application of any filters. This is produced in act_ks_value(). That parse errors in stream data are thus detectable is significant for xref stream processing, so we should not just return the bare data on error.


e619663961 Sven M. Hallberg

adjust comments


b3dda3fe55 Sven M. Hallberg

don't emulate VIOL in error messages

While it might seem like a good idea to "grade" errors by severity, we are not *really* in any place to do so accurately. Our tasks are (a) to decide, internally, whether to print a message or silently ignore a malformation, and (b) to ultimately judge the file valid or invalid as a whole. Note that the latter part, as stated before, is not the responsibility of parse_xrefs().

Reinstate the input file name in these error messages. That information is useful when running the program on multiple files from a script, as we have been doing.

While we're at it, fix style (line lengths).


9ff8c465fb Sven M. Hallberg

add test cases for out-of-bounds xref pointers

Both currently fail because the parser proper does not validate these offsets.


9196b5c2b8 Sven M. Hallberg

drop use of h_seek in parse_xrefs

Now that we are validating the offset ourselves, we no longer need h_seek() to do our bounds checking. But add a defensive assert just in case.


dd3c8e62ac Sven M. Hallberg

bounds-check /Prev pointers

Mirrors the check for startxref. I considered unifying the two into one test at the start of the loop, but then we would lose the information whether we got the offset from startxref or a /Prev.


aa40560780 Sven M. Hallberg

report location of invalid startxref

This is useful information, especially in hex, when looking into the file. The invalid value itself, on the other hand, is not so useful.


550c070d23 Sven M. Hallberg

adjust error message

The correct and standard format specifier for values of type size_t is %zu. There is no need to point out the valid bounds. Match style with the other messages.
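
For illustration, a size_t value is formatted like this. The helper name and the message text are made up for the example and do not reproduce the program's actual message.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes a short message containing a size_t value.
 * "%zu" matches size_t on every platform; "%lu" or "%u" only happen to
 * work where size_t has the same width as that type.
 */
int format_offset(char *buf, size_t bufsz, size_t offset)
{
    return snprintf(buf, bufsz, "offset %zu", offset);
}
```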


431c7db3b7 Sven M. Hallberg

remove useless/erroneous condition

The offset can never be negative (size_t is unsigned). And this treated offset = 0 as out of bounds, which is nonsense. In fact, offset == size is also not invalid (it is the end of file).
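
The resulting check reduces to a single comparison. The sketch below assumes a hypothetical helper offset_in_bounds; the actual code may express the test inline.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: is `offset` a usable position in an input of `size`
 * bytes? There is no "offset < 0" test because size_t is unsigned. Offset 0
 * is the start of the file and offset == size is the end of the file, so
 * both are accepted.
 */
bool offset_in_bounds(size_t offset, size_t size)
{
    return offset <= size;
}
```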


9883a54368 Sven M. Hallberg

revert parse_xrefs to its original signature

Passing the aux struct by reference may look cleaner, but it was deliberate to keep parse_xrefs() independent of that struct, since the latter is conceptually part of the parser's interface and the former is not. Also, this way parse_xrefs() has a proper return value that signals success or failure. Plus, no ugly indirection or temporary variable is needed to access sz.


81dc4dbad2 Sven M. Hallberg

move parse_xrefs back next to main

Move parse_xrefs() back to its proper place as a helper to main(), along with the definition of the global variable 'infile', which belongs with the rest of the command line arguments. It had been moved in fbbe953f when the content processing code was confusedly hooked into the function. Also remove the marker comments about "Start/End xref parsing": the code between them is not exclusively concerned with xrefs, and the sheer size of the comments clashes with the rest of the coding style.


91473d5f1f Sven M. Hallberg

restore single exit point in parse_xrefs

It turns out that this function was in fact meant to always assign a result (NULL/0 on failure), accomplished by having a single exit point. This was changed in 517b81ad for no reason. Reverting.

I'm guessing the goto was considered disagreeable, so I'll explain the rationale. The function accumulates its result in the *local* variables xrefs and n. This mainly makes the code nicer to read than writing to the output directly.

Having a single exit point, a property that is easy to verify, ensures that no update to the local variables can get lost, i.e. they serve as de-facto aliases for the outputs.
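
The pattern can be shown on a toy function. This is only a sketch of the single-exit idiom, not the code of parse_xrefs(): collect_even, its parameters, and its failure rule (a negative input value) are invented for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical example: collects the even values of in[0..n_in) into a newly
 * allocated array. Any negative input counts as a failure. The outputs are
 * always assigned, with NULL/0 on failure.
 */
void collect_even(const int *in, size_t n_in, int **out, size_t *n_out)
{
    int *xs = NULL;   /* local accumulators, playing the role of xrefs and n */
    size_t n = 0;

    for (size_t i = 0; i < n_in; i++) {
        if (in[i] < 0) {                 /* failure: discard partial result */
            free(xs);
            xs = NULL;
            n = 0;
            goto end;
        }
        if (in[i] % 2 != 0)
            continue;

        int *tmp = realloc(xs, (n + 1) * sizeof *xs);
        if (tmp == NULL) {               /* out of memory: same failure path */
            free(xs);
            xs = NULL;
            n = 0;
            goto end;
        }
        xs = tmp;
        xs[n++] = in[i];
    }

end:
    /* Single exit point: every path ends here, so the locals always reach
       the caller and act as aliases for the outputs. */
    *out = xs;
    *n_out = n;
}
```

An early return anywhere in the loop would skip the two assignments at the end and leave the caller's variables untouched, which is exactly the kind of lost update the single exit point rules out.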