KHJK gotweb

Commits

Commit:: 1a2e9f3d5fb113a99ed000d3ef4af8cb931b7a87
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Sep 16 10:32:24 2022 UTC

comment formatting

Commit:: ca2104da15f02288a34f02aa175312a2a4c779ef
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Thu Sep 15 11:34:38 2022 UTC

use structure assignment in act_ks_value

This is a drive-by revert of a useless change in everyone's favorite commit, 6e5955c4 ("Most of the code folded in"). The line in question is a structure assignment, which is in C99 and behaves exactly as one would expect.
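As a minimal sketch of what structure assignment does in C: the struct and function names below are hypothetical and not taken from the repository, and the memcpy variant is only one conceivable roundabout alternative, not necessarily what the reverted change used.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct for illustration; not from the repository. */
struct kstate {
    int  generation;
    char name[8];
};

/* Plain structure assignment: every member, including the embedded
 * array, is copied by value. */
static struct kstate copy_by_assignment(const struct kstate *src)
{
    struct kstate dst;

    dst = *src;
    return dst;
}

/* A more roundabout way to get the same result. */
static struct kstate copy_by_memcpy(const struct kstate *src)
{
    struct kstate dst;

    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

Both functions return an independent copy; modifying the source afterwards does not affect it.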

Commit:: 6de503e15b1fd11418af24d8c6ade3348a315e8d
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Thu Sep 15 08:17:06 2022 UTC

disable loop.pdf test case for now

While we generate an error message in parse_xrefs() for this case, parse_xrefs() is not the right place to cause the parse error. That should be a semantic validation in the parser proper. Other checks in parse_xrefs() can probably be moved completely out of the function in that vein, too. But we will leave doing that properly for another time.

For now, let's put a comment to that effect in parse_xrefs() and disable the failing test case by masking its .pdf extension.

Commit:: ca863228bdda41111c7964d337736b9fc7a7d62f
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Thu Sep 15 08:07:02 2022 UTC

drop an XXX for later

Commit:: 85bb11a7c042ec9c92c36f2b01c067eef624e572
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Wed Sep 14 15:38:58 2022 UTC

fix xref format in loop.pdf

Commit:: ea6eb406debc0c4d47501a2d000a2679262df1d4
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Wed Sep 14 11:42:39 2022 UTC

add "invalid" test for missing newline before %%EOF

Commit:: 93dd72780cde8009df2a5dfc10976093650d1eec
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Wed Sep 14 11:42:01 2022 UTC

add missing newline before %%EOF

Commit:: e45fb9d92379b92b782ecbbb29dbf9a4a670ad9b
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Wed Sep 14 11:41:39 2022 UTC

fix /Length off-by-one

Commit:: bc43d55e833052b8ec44371218233aa18b7e721a
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Wed Sep 14 11:41:05 2022 UTC

run tests in strict mode

Commit:: 72c389d3e539e6d1b8586dab2e48de3c05cbb457
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 17:48:03 2022 UTC

Merge branch 'selfref' into ostream

Commit:: f962b66b483de1dea4a80f6ed68c9f7d77123065
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 17:44:35 2022 UTC

remove unused global parser variables

Not everything is needed outside of init_parser(). Also clear out some gratuitous whitespace.

Commit:: da562f47c891cc2678b984fafdc886764783bf4b
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Tue Aug 9 19:27:42 2022 UTC

support cyclic references in resolve_item()

Duplicates the previous commit. Note that we must pull the INVALID pointer out of resolve() so that resolve_item() can use the same one. Otherwise, the two functions could confuse each other's INVALID pointers for valid objects.

Commit:: 71bbc7963e1b73ae1b02ecc93aa3371e569b7958
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 17:43:30 2022 UTC

remove resolve_item and friends

Finally. Remove these now-redundant functions.

Commit:: b5568c0ce46c37f080fe97f45bfb67e9744e8fde
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Tue Aug 9 19:16:56 2022 UTC

fix resolve() for cyclic objects

It is not entirely clear whether the spec allows cyclic object definitions such as the following:

    obj 1 0 1 0 R endobj

There is an open errata issue about this topic, but no consensus has emerged so far. Most implementations will accept this, however, so for the time being I'm guessing we should, too. We will treat an object that is defined (directly or indirectly) as itself as equivalent to the null object.

The implementation strategy is to give ourselves a distinct invalid pointer beside NULL and use it to mark the memoization entry for a given cross-reference (ent->obj) as INVALID while we recursively try to resolve it. If we eventually hit an INVALID object, we terminate the process and return NULL. The INVALID entry will internally stay in the memoization slot, but should never be returned by resolve().

This commit contains the implementation for resolve(). We'll do its unfortunate copy-paste sibling resolve_item() in the next one.
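The strategy described in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the object model, the table layout, and every name except resolve, INVALID and the obj memoization slot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified object model. */
enum kind { K_INT, K_REF };

struct object {
    enum kind kind;
    int       value;    /* payload for K_INT */
    size_t    target;   /* referenced table index for K_REF */
};

struct xref_entry {
    const struct object *def;   /* object as defined in the file */
    const struct object *obj;   /* memoization slot, NULL = unresolved */
};

/* A distinct pointer beside NULL that never denotes a real object. */
static const struct object invalid_marker = { K_INT, 0, 0 };
#define INVALID (&invalid_marker)

/* Follow references until a non-reference object is found. Objects
 * defined (directly or indirectly) as themselves resolve to NULL. */
static const struct object *
resolve(struct xref_entry *tab, size_t n, size_t i)
{
    struct xref_entry *ent;
    const struct object *o;

    if (i >= n)
        return NULL;            /* no such entry */
    ent = &tab[i];
    if (ent->obj == INVALID)
        return NULL;            /* cycle: never hand out INVALID */
    if (ent->obj != NULL)
        return ent->obj;        /* already memoized */

    ent->obj = INVALID;         /* mark as "being resolved" */
    o = ent->def;
    if (o != NULL && o->kind == K_REF)
        o = resolve(tab, n, o->target);
    if (o != NULL)
        ent->obj = o;           /* success: replace the marker */
    return o;                   /* on failure the marker stays put */
}
```

Because the marker stays in the slot after a failed resolution, repeated lookups of a cyclic entry return NULL immediately without recursing again.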

Commit:: db58f4ce1094a2cbcba547a18458fb2431492595
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 17:42:21 2022 UTC

remove p_cstream (and kcontentstream)

We can cover the single-stream case by doing what the multi-stream case does: Get the stream object, validate that its value type is TT_BYTES, and run p_textstream parser over those bytes from parse_pagenode. No need for a special version of p_objdef, kstream, or resolve for that matter.

Commit:: 8993cef512f4545c0e407a733bae4fa4225f3989
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Tue Aug 9 19:02:14 2022 UTC

add a test case for cyclically defined objects

This commit contains a failing test case. It contains a stream with a /Type entry that is an indirect reference to an object defined as itself:

    obj 8 0 8 0 R endobj

The implementation of resolve() does not properly detect cycles and runs into an infinite loop.

Commit:: 57347e732fd186b907726dd9a27fc2011390e145
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 17:02:59 2022 UTC

remove misplaced TT_ObjStm case from kcontentstream

Similar to kbyteostream, kcontentstream is a specialized version of kstream that replaces the generic switch on /Type for the data parser. This version uses either p_textstream (the parser for content streams) or p_objstm__m. I think the latter case must have been the result of some confusion. Object streams are something completely different than content streams and cannot appear in place of one.

That leaves kcontentstream identical to kbyteostream except that it uses p_textstream (parsing the stream data directly) instead of p_bytes (leaving the stream data to be parsed in parse_pagenode)...

Commit:: 5010a4dc4459f9d3cc94dcf635285b317085b578
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Tue Aug 9 19:00:20 2022 UTC

support indirect references in streams' /Type entry

I can't find the spec saying this has to be a direct object, so we have to call resolve() on the value.

Commit:: a091668f694a14e7da6415d8056c770d72993198
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 16:49:12 2022 UTC

eliminate a bunch of useless gotos

I don't know what the purpose of any of these was. It looks like they were put in as a matter of course, just in case it would later turn out that some final cleanup code was needed. Do not write cruft code just in case.
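To illustrate the pattern in question, here is a hedged sketch with two hypothetical functions (neither is from the repository). The first jumps to a label that does nothing but return; the second does the same work with plain returns.

```c
#include <assert.h>
#include <stddef.h>

/* "Just in case" style: the label guards no cleanup code at all. */
static int first_digit_goto(const char *s)
{
    int result = -1;

    if (s == NULL)
        goto end;
    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            result = *s - '0';
            goto end;
        }
    }
end:
    return result;
}

/* Same behavior, returning directly where the answer is known. */
static int first_digit(const char *s)
{
    if (s == NULL)
        return -1;
    for (; *s != '\0'; s++)
        if (*s >= '0' && *s <= '9')
            return *s - '0';
    return -1;
}
```

A label of this kind earns its keep only once there is actual cleanup to run on every exit path.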

Commit:: 982ac17f53024b2288b07cd95e5c4c451d04ab19
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Sat Jul 30 16:53:49 2022 UTC

remove weird line endings/continuations

What is this?

Commit:: 20412cdab5e17485a4a49f22a7d4369c6558d46d
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 16:43:43 2022 UTC

use regular resolve (-> p_objdef) to get content stream fragments

If we inspect p_byteostm, we see that it is nothing but a specialized form of p_objdef that replaces the object parser with byteostream which in turn is a specialized form of the stream parser that replaces the switch on /Type (in kstream/kbyteostream) with always using p_bytes, thus returning the stream data (after filters) as raw TT_BYTES.

But kstream also treats an unrecognized or unspecified /Type with p_bytes. So, since a content stream should have none of the types recognized, we can just use p_objdef and thus resolve() here and eliminate that whole branch of copy paste.

The only downside is that we're now allowing any object to appear where a stream should be (from the content parser's point of view). All we are missing though, is a proper token type for stream objects and a simple check in place of that XXX...

Commit:: 70f1f0b8d4bf42dcfa38c9e9b61c8e5d2b4ade0a
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Sat Jul 30 16:52:36 2022 UTC

fix some indentation

Come on.

Commit:: e0350ca3bb91fb9ff9a91e048ab68ce059cd7f66
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 16:19:32 2022 UTC

use -X for text extraction including font diagnostics

This removes the original use of outfn2/stream2 (corresponding to Xfile in main, i.e. the argument to -X) and reuses it for being verbose about fonts. This leaves -x as showing only the text and fixes our tests.

It looks to me like the original outfn2 path is a less refined version of the outfn version, though it could be the other way around. Sumit told me that one of the two could go away at some point. Unfortunately, I cannot find the original message. We can readjust this as needed.

Commit:: 4c8632507fb28092f7540248e372e0d1620fc1c5
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Sat Jul 30 16:49:32 2022 UTC

remove unneeded indirect parser

h_indirect() is for when you need to refer to a parser before its definition.

Commit:: 684a3c707589742d0776267e54d2bde5015cb62c
From:: Sven M. Hallberg <pesco@khjk.org>
Date:: Fri Aug 12 15:42:48 2022 UTC

swap the size/nmemb arguments to fwrite in text_extract

This might be severely pedantic nit-picking, but we're not writing one element of size nchars, we're writing nchars elements of size 1, hmpf. Yes, nmemb is also a size_t. ;)

NB: The main purpose of these arguments is to let fwrite check for overflow when multiplying them. So yes, technically the order almost certainly doesn't matter when one of them is 1. It does affect the return value (which is not checked here) in that it will report how many bytes were written in one case or just 0/1 in the other.
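The difference in return value can be seen in a small sketch. The two wrapper functions are hypothetical helpers for illustration, not code from text_extract; only fwrite itself is the standard C library call.

```c
#include <assert.h>
#include <stdio.h>

/* n elements of size 1: fwrite returns the number of bytes written. */
static size_t write_bytes(FILE *f, const char *buf, size_t n)
{
    return fwrite(buf, 1, n, f);
}

/* One element of size n: fwrite returns 1 if the whole block was
 * written and 0 otherwise. */
static size_t write_block(FILE *f, const char *buf, size_t n)
{
    return fwrite(buf, n, 1, f);
}
```

With a five-byte buffer and a writable stream, write_bytes() reports 5 while write_block() reports 1, although both put the same bytes into the stream.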
