Commit Briefs

1a2e9f3d5f Sven M. Hallberg

comment formatting (failing-tests)


ca2104da15 Sven M. Hallberg

use structure assignment in act_ks_value

This is a drive-by revert of a useless change in everyone's favorite commit, 6e5955c4 ("Most of the code folded in"). The line in question is a structure assignment, which is in C99 and behaves exactly as one would expect.
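For readers unsure what "structure assignment" buys us, here is a minimal sketch. The type `struct token` and the helper `token_copy` are hypothetical and are not taken from act_ks_value; the point is only that a single `=` copies every member.

```c
#include <stddef.h>

/* Hypothetical type for illustration; not a type from this project. */
struct token {
    int         type;
    size_t      len;
    const char *bytes;
};

/* One structure assignment copies all members of *src into *dst.
 * It is equivalent to assigning type, len and bytes one by one. */
void token_copy(struct token *dst, const struct token *src)
{
    *dst = *src;
}
```

Note that the copy is shallow: pointer members such as `bytes` end up pointing at the same storage in both structures.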


6de503e15b Sven M. Hallberg

disable loop.pdf test case for now

While we generate an error message in parse_xrefs() for this case, parse_xrefs() is not the right place to cause the parse error. That should be a semantic validation in the parser proper. Other checks in parse_xrefs() can probably be moved completely out of the function in that vein, too. But we will leave doing that properly for another time. For now, let's put a comment to that effect in parse_xrefs() and disable the failing test case by masking its .pdf extension.


ca863228bd Sven M. Hallberg

drop an XXX for later


85bb11a7c0 Sven M. Hallberg

fix xref format in loop.pdf


ea6eb406de Sven M. Hallberg

add "invalid" test for missing newline before %%EOF


93dd72780c Sven M. Hallberg

add missing newline before %%EOF


e45fb9d923 Sven M. Hallberg

fix /Length off-by-one


bc43d55e83 Sven M. Hallberg

run tests in strict mode


72c389d3e5 Sven M. Hallberg

Merge branch 'selfref' into ostream


f962b66b48 Sven M. Hallberg

remove unused global parser variables

Not everything is needed outside of init_parser(). Also clear out some gratuitous whitespace.


da562f47c8 Sven M. Hallberg

support cyclic references in resolve_item()

Duplicates the previous commit. Note that we must pull the INVALID pointer out of resolve() so that resolve_item() can use the same one. Otherwise, the two functions could confuse each other's INVALID pointers for valid objects.


71bbc7963e Sven M. Hallberg

remove resolve_item and friends

Finally. Remove these now-redundant functions.


b5568c0ce4 Sven M. Hallberg

fix resolve() for cyclic objects

It is not entirely clear whether the spec allows cyclic object definitions such as the following: 1 0 obj 1 0 R endobj There is an open errata issue about this topic, but no consensus has emerged so far. Most implementations will accept this, however, so for the time being I'm guessing we should, too. We will treat an object that is defined (directly or indirectly) as itself as equivalent to the null object. The implementation strategy is to give ourselves a distinct invalid pointer beside NULL and use it to mark the memoization entry for a given cross-reference (ent->obj) as INVALID while we recursively try to resolve it. If we eventually hit an INVALID entry, we terminate the process and return NULL. The INVALID entry will internally stay in the memoization slot, but should never be returned by resolve(). This commit contains the implementation for resolve(). We'll do its unfortunate copy-paste sibling resolve_item() in the next one.
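The strategy described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the project's actual resolve(): the types `Obj` and `XrefEntry`, the `def` field, and the array-indexed table are hypothetical stand-ins. Only the names resolve(), INVALID and ent->obj come from the commit message.

```c
#include <stddef.h>

/* Hypothetical object model: either a direct value or a reference
 * to the cross-reference table slot numbered 'target'. */
typedef struct Obj {
    int    is_ref;   /* nonzero: indirect reference */
    size_t target;   /* table index, if is_ref */
    int    value;    /* payload, if direct */
} Obj;

typedef struct XrefEntry {
    const Obj *def;  /* the object as defined in the file */
    const Obj *obj;  /* memoized result; NULL means "not resolved yet" */
} XrefEntry;

/* A distinct invalid pointer beside NULL, shared at file scope. */
static const Obj invalid_marker = { 0, 0, 0 };
#define INVALID (&invalid_marker)

const Obj *resolve(XrefEntry *table, size_t n, const Obj *o)
{
    if (o == NULL || !o->is_ref)
        return o;                    /* direct object: nothing to do */
    if (o->target >= n)
        return NULL;                 /* dangling reference: null object */

    XrefEntry *ent = &table[o->target];
    if (ent->obj == INVALID)
        return NULL;                 /* hit our own marker: a cycle */
    if (ent->obj != NULL)
        return ent->obj;             /* already memoized */

    ent->obj = INVALID;              /* mark the slot while we recurse */
    const Obj *r = resolve(table, n, ent->def);
    if (r != NULL)
        ent->obj = r;                /* otherwise the marker stays put */
    return r;                        /* never INVALID itself */
}
```

Keeping the marker at file scope is what would let a second function share it, which is the concern raised for resolve_item().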


db58f4ce10 Sven M. Hallberg

remove p_cstream (and kcontentstream)

We can cover the single-stream case by doing what the multi-stream case does: Get the stream object, validate that its value type is TT_BYTES, and run p_textstream parser over those bytes from parse_pagenode. No need for a special version of p_objdef, kstream, or resolve for that matter.


8993cef512 Sven M. Hallberg

add a test case for cyclically defined objects

This commit contains a failing test case. The test file contains a stream with a /Type entry that is an indirect reference to an object defined as itself: 8 0 obj 8 0 R endobj The implementation of resolve() does not properly detect cycles and runs into an infinite loop.


57347e732f Sven M. Hallberg

remove misplaced TT_ObjStm case from kcontentstream

Similar to kbyteostream, kcontentstream is a specialized version of kstream that replaces the generic switch on /Type for the data parser. This version uses either p_textstream (the parser for content streams) or p_objstm__m. I think the latter case must have been the result of some confusion. Object streams are something completely different from content streams and cannot appear in place of one. That leaves kcontentstream identical to kbyteostream except that it uses p_textstream (parsing the stream data directly) instead of p_bytes (leaving the stream data to be parsed in parse_pagenode)...


5010a4dc44 Sven M. Hallberg

support indirect references in streams' /Type entry

I can't find the spec saying this has to be a direct object, so we have to call resolve() on the value.


a091668f69 Sven M. Hallberg

eliminate a bunch of useless gotos

I don't know what the purpose of any of these was. It looks like they were put in as a matter of course, just in case it would later turn out that some final cleanup code was needed. Do not write cruft code just in case.
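To illustrate the kind of pattern meant here, a hedged before/after sketch. Both functions are hypothetical and do not come from the code base; they only show a label that guards no cleanup next to the equivalent direct returns.

```c
#include <stddef.h>

/* Before: every exit funnels through a label, but there is
 * nothing to clean up there. (Hypothetical example function.) */
int digit_value_goto(const char *s, int *out)
{
    int ok = 0;

    if (s == NULL || out == NULL)
        goto end;
    if (s[0] < '0' || s[0] > '9')
        goto end;
    *out = s[0] - '0';
    ok = 1;
end:
    return ok;
}

/* After: same behavior, returning directly. */
int digit_value(const char *s, int *out)
{
    if (s == NULL || out == NULL)
        return 0;
    if (s[0] < '0' || s[0] > '9')
        return 0;
    *out = s[0] - '0';
    return 1;
}
```

A single exit label does earn its keep once there are resources to release on every path; the objection is to adding it before any exist.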


982ac17f53 Sven M. Hallberg

remove weird line endings/continuations

What is this?


20412cdab5 Sven M. Hallberg

use regular resolve (-> p_objdef) to get content stream fragments

If we inspect p_byteostm, we see that it is nothing but a specialized form of p_objdef that replaces the object parser with byteostream, which in turn is a specialized form of the stream parser that replaces the switch on /Type (in kstream/kbyteostream) with always using p_bytes, thus returning the stream data (after filters) as raw TT_BYTES. But kstream also handles an unrecognized or unspecified /Type with p_bytes. So, since a content stream should have none of the recognized types, we can just use p_objdef and thus resolve() here and eliminate that whole branch of copy-paste. The only downside is that we're now allowing any object to appear where a stream should be (from the content parser's point of view). All we are missing, though, is a proper token type for stream objects and a simple check in place of that XXX...


70f1f0b8d4 Sven M. Hallberg

fix some indentation

Come on.


e0350ca3bb Sven M. Hallberg

use -X for text extraction including font diagnostics

This removes the original use of outfn2/stream2 (corresponding to Xfile in main, i.e. the argument to -X) and reuses it for being verbose about fonts. This leaves -x as showing only the text and fixes our tests. It looks to me like the original outfn2 path is a less refined version of the outfn version, though it could be the other way around. Sumit told me that one of the two could go away at some point. Unfortunately, I cannot find the original message. We can readjust this as needed.


4c8632507f Sven M. Hallberg

remove unneeded indirect parser

h_indirect() is for when you need to refer to a parser before its definition.


684a3c7075 Sven M. Hallberg

swap the size/nmemb arguments to fwrite in text_extract

This might be severely pedantic nit-picking, but we're not writing one element of size nchars, we're writing nchars elements of size 1, hmpf. Yes, nmemb is also a size_t. ;) NB: The main purpose of these arguments is to let fwrite check for overflow when multiplying them. So yes, technically the order almost certainly doesn't matter when one of them is 1. It does affect the return value (which is not checked here) in that it will report how many bytes were written in one case or just 0/1 in the other.
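The difference in the return value can be seen in a small sketch. The two wrappers below are hypothetical names for illustration and are not the code in text_extract; fwrite itself is the standard C library function, which returns the number of elements successfully written.

```c
#include <stdio.h>

/* nchars elements of size 1: on success the return value is nchars,
 * i.e. the number of bytes written. */
size_t write_chars(FILE *f, const char *buf, size_t nchars)
{
    return fwrite(buf, 1, nchars, f);
}

/* One element of size nchars: the return value is only 0 or 1,
 * so a short write cannot be told apart from writing nothing. */
size_t write_block(FILE *f, const char *buf, size_t nchars)
{
    return fwrite(buf, nchars, 1, f);
}
```

Both calls put the same bytes into the stream; only what the caller can learn from the result differs.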