Commits
- Commit:
1a2e9f3d5fb113a99ed000d3ef4af8cb931b7a87
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
comment formatting
- Commit:
ca2104da15f02288a34f02aa175312a2a4c779ef
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
use structure assignment in act_ks_value
This is a drive-by revert of a useless change in everyone's favorite commit,
6e5955c4 ("Most of the code folded in"). The line in question is a structure
assignment, which is in C99 and behaves exactly as one would expect.
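For illustration, a minimal sketch of what structure assignment does. The type and function names below are hypothetical and are not taken from act_ks_value or the rest of the code base; the point is only that plain assignment copies every member, embedded arrays included, just as a memcpy of the whole structure would.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record type, for illustration only. */
struct sample_value {
    int  kind;
    char name[8];
};

/* Copy with plain structure assignment: all members are copied,
 * including the embedded array. */
struct sample_value copy_by_assignment(const struct sample_value *src)
{
    struct sample_value dst;

    dst = *src;
    return dst;
}

/* The more verbose spelling that structure assignment makes unnecessary. */
struct sample_value copy_by_memcpy(const struct sample_value *src)
{
    struct sample_value dst;

    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

Both functions produce a copy with the same member values; the assignment form is shorter and lets the compiler check the types.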
- Commit:
6de503e15b1fd11418af24d8c6ade3348a315e8d
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
disable loop.pdf test case for now
While we generate an error message in parse_xrefs() for this case, parse_xrefs
is not the right place to cause the parse error. That should be a semantic
validation in the parser proper. Other checks in parse_xrefs can probably be
moved completely out of the function in that vein, too.
But we will leave doing that properly for another time. For now, let's put a
comment to the effect in parse_xrefs and disable the failing test case by
masking its .pdf extension.
- Commit:
ca863228bdda41111c7964d337736b9fc7a7d62f
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
drop an XXX for later
- Commit:
85bb11a7c042ec9c92c36f2b01c067eef624e572
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
fix xref format in loop.pdf
- Commit:
ea6eb406debc0c4d47501a2d000a2679262df1d4
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
add "invalid" test for missing newline before %%EOF
- Commit:
93dd72780cde8009df2a5dfc10976093650d1eec
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
add missing newline before %%EOF
- Commit:
e45fb9d92379b92b782ecbbb29dbf9a4a670ad9b
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
fix /Length off-by-one
- Commit:
bc43d55e833052b8ec44371218233aa18b7e721a
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
run tests in strict mode
- Commit:
72c389d3e539e6d1b8586dab2e48de3c05cbb457
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
Merge branch 'selfref' into ostream
- Commit:
f962b66b483de1dea4a80f6ed68c9f7d77123065
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
remove unused global parser variables
Not everything is needed outside of init_parser().
Also clear out some gratuitous whitespace.
- Commit:
da562f47c891cc2678b984fafdc886764783bf4b
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
support cyclic references in resolve_item()
Duplicates the previous commit.
Note that we must pull the INVALID pointer out of resolve() so that
resolve_item() can use the same one. Otherwise, the two functions could
confuse each other's INVALID pointers for valid objects.
- Commit:
71bbc7963e1b73ae1b02ecc93aa3371e569b7958
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
remove resolve_item and friends
Finally. Remove these now-redundant functions.
- Commit:
b5568c0ce46c37f080fe97f45bfb67e9744e8fde
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
fix resolve() for cyclic objects
It is not entirely clear whether the spec allows cyclic object
definitions such as the following:
    obj 1 0
    1 0 R
    endobj
There is an open errata issue about this topic, but no consensus has
emerged so far. Most implementations will accept this, however, so for
the time being I'm guessing we should, too.
We will treat an object that is defined (directly or indirectly) as
itself as equivalent to the null object.
The implementation strategy is to give ourselves a distinct invalid
pointer beside NULL and use it to mark the memoization entry for a given
cross-reference (ent->obj) as INVALID while we recursively try to
resolve it. If we eventually hit an INVALID object, we terminate the
process and return NULL. The INVALID entry will internally stay in the
memoization slot, but should never be returned by resolve().
This commit contains the implementation for resolve(). We'll do its
unfortunate copy-paste sibling resolve_item() in the next one.
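The strategy described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual resolve(): the Object and Entry types, the table/index interface, and the invalid_marker variable are hypothetical stand-ins. Only the idea of a sentinel pointer distinct from NULL and the ent->obj memoization slot come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified object: either a direct value or a
 * reference to another table entry. */
typedef struct Object {
    int    is_ref;   /* nonzero if this object is only a reference */
    size_t target;   /* index of the referenced entry (if is_ref) */
    int    value;    /* payload of a direct object */
} Object;

/* Hypothetical cross-reference entry with a memoization slot. */
typedef struct Entry {
    const Object *def;   /* the parsed definition */
    const Object *obj;   /* memoized result; NULL means "not resolved yet" */
} Entry;

/* A distinct invalid pointer beside NULL: the address of a private
 * dummy object that can never be a real result. */
static const Object invalid_marker = {0, 0, 0};
#define INVALID (&invalid_marker)

/* Resolve entry i, following references. An object defined (directly or
 * indirectly) as itself resolves to NULL, i.e. is treated as null. */
const Object *resolve(Entry *table, size_t n, size_t i)
{
    Entry *ent;
    const Object *res;

    if (i >= n)
        return NULL;
    ent = &table[i];

    if (ent->obj == INVALID)    /* already on the current path: a cycle */
        return NULL;
    if (ent->obj != NULL)       /* memoized from an earlier call */
        return ent->obj;
    if (ent->def == NULL)
        return NULL;

    if (!ent->def->is_ref) {    /* direct object: memoize and return */
        ent->obj = ent->def;
        return ent->obj;
    }

    ent->obj = INVALID;         /* mark while we recurse */
    res = resolve(table, n, ent->def->target);
    if (res != NULL)
        ent->obj = res;         /* otherwise INVALID stays in the slot */
    return res;
}
```

In this sketch, INVALID is only ever compared against and stored in the slot; callers see either a real object or NULL.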
- Commit:
db58f4ce1094a2cbcba547a18458fb2431492595
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
remove p_cstream (and kcontentstream)
We can cover the single-stream case by doing what the multi-stream case
does: Get the stream object, validate that its value type is TT_BYTES,
and run the p_textstream parser over those bytes from parse_pagenode. No
need for a special version of p_objdef, kstream, or resolve for that
matter.
- Commit:
8993cef512f4545c0e407a733bae4fa4225f3989
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
add a test case for cyclically defined objects
This commit contains a failing test case. It contains a stream with a
/Type entry that is an indirect reference to an object defined as
itself:
    obj 8 0
    8 0 R
    endobj
The implementation of resolve() does not properly detect cycles and runs
into an infinite loop.
- Commit:
57347e732fd186b907726dd9a27fc2011390e145
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
remove misplaced TT_ObjStm case from kcontentstream
Similar to kbyteostream, kcontentstream is a specialized version of
kstream that replaces the generic switch on /Type for the data parser.
This version uses either p_textstream (the parser for content streams)
or p_objstm__m. I think the latter case must have been the result of
some confusion. Object streams are something completely different than
content streams and cannot appear in place of one.
That leaves kcontentstream identical to kbyteostream except that it uses
p_textstream (parsing the stream data directly) instead of p_bytes
(leaving the stream data to be parsed in parse_pagenode)...
- Commit:
5010a4dc4459f9d3cc94dcf635285b317085b578
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
support indirect references in streams' /Type entry
I can't find the spec saying this has to be a direct object, so we have
to call resolve() on the value.
- Commit:
a091668f694a14e7da6415d8056c770d72993198
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
eliminate a bunch of useless gotos
I don't know what the purpose of any of these was; it looks like they
were put in as a matter of course just in case it would later turn out
that some final cleanup code was needed. Do not write cruft code just in
case.
- Commit:
982ac17f53024b2288b07cd95e5c4c451d04ab19
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
remove weird line endings/continuations
What is this?
- Commit:
20412cdab5e17485a4a49f22a7d4369c6558d46d
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
use regular resolve (-> p_objdef) to get content stream fragments
If we inspect p_byteostm, we see that it is nothing but a specialized form of
p_objdef that replaces the object parser with byteostream which in turn is a
specialized form of the stream parser that replaces the switch on /Type (in
kstream/kbyteostream) with always using p_bytes, thus returning the stream data
(after filters) as raw TT_BYTES.
But kstream also treats an unrecognized or unspecified /Type with
p_bytes. So, since a content stream should have none of the types
recognized, we can just use p_objdef and thus resolve() here and
eliminate that whole branch of copy paste.
The only downside is that we're now allowing any object to appear where
a stream should be (from the content parser's point of view). All we are
missing though, is a proper token type for stream objects and a simple
check in place of that XXX...
- Commit:
70f1f0b8d4bf42dcfa38c9e9b61c8e5d2b4ade0a
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
fix some indentation
Come on.
- Commit:
e0350ca3bb91fb9ff9a91e048ab68ce059cd7f66
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
use -X for text extraction including font diagnostics
This removes the original use of outfn2/stream2 (corresponding to Xfile
in main, i.e. the argument to -X) and reuses it for being verbose about
fonts. This leaves -x as showing only the text and fixes our tests.
It looks to me like the original outfn2 path is a less refined version
of the outfn version, though it could be the other way around. Sumit
told me that one of the two could go away at some point. Unfortunately,
I cannot find the original message. We can readjust this as needed.
- Commit:
4c8632507fb28092f7540248e372e0d1620fc1c5
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
remove unneeded indirect parser
h_indirect() is for when you need to refer to a parser before its
definition.
- Commit:
684a3c707589742d0776267e54d2bde5015cb62c
- From:
- Sven M. Hallberg <pesco@khjk.org>
- Date:
swap the size/nmemb arguments to fwrite in text_extract
This might be severely pedantic nit-picking, but we're not writing one
element of size nchars, we're writing nchars elements of size 1, hmpf.
Yes, nmemb is also a size_t. ;)
NB: The main purpose of these arguments is to let fwrite check for
overflow when multiplying them. So yes, technically the order almost
certainly doesn't matter when one of them is 1. It does affect the
return value (which is not checked here) in that it will report how many
bytes were written in one case or just 0/1 in the other.
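To make the difference in return value concrete, here is a small sketch. The two wrapper functions are hypothetical and exist only for this illustration; fwrite itself is the standard C library function, which returns the number of elements (not bytes) successfully written.

```c
#include <assert.h>
#include <stdio.h>

/* Write nchars elements of size 1: on success the return value is the
 * number of bytes written. */
size_t write_as_bytes(FILE *f, const char *buf, size_t nchars)
{
    return fwrite(buf, 1, nchars, f);
}

/* Write one element of size nchars: the return value is 1 on success
 * and 0 otherwise, no matter how many bytes made it out. */
size_t write_as_one_element(FILE *f, const char *buf, size_t nchars)
{
    return fwrite(buf, nchars, 1, f);
}
```

With the first form, a caller that does check the result can detect a short write and learn how far it got; with the second, it only learns whether the whole buffer was written.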