Re: [NTG-pdftex] Re: Optimizing the generated pdf

18 Nov 2005

On Thu, Nov 17, 2005 at 06:46:58PM +0100, Hartmut Henkel wrote:
...
On Thu, 17 Nov 2005, Hans Hagen wrote:
...
Martin Schröder wrote:
...
On 2005-11-17 10:46:51 +0100, Martin Schröder wrote:
...
Btw: Is there a tool that compresses a pdf by replacing identical
objects with references?
pdfTeX could do this by itself: Store the md5 of the shortest n
objects (e.g. n = 1024) smaller than x bytes (e.g. x = 1024, longer
objects will typically be unique) and replace new identical objects
with references to the already existing ones.
i wouldn't want to rely on md5 alone (shit happens). In the end one needs a
literal comparison.
I agree.
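
A minimal sketch of the idea discussed above: key small objects by their MD5 digest, but only treat two objects as identical after a literal byte comparison. The class name `ObjectDeduper` and its method are hypothetical and not part of pdfTeX; the limits n = 1024 and x = 1024 are the example values from the mail. As a simplification it remembers the first n small objects seen rather than the shortest n.

```python
import hashlib

MAX_TRACKED = 1024   # n: how many small objects to remember (example value from the mail)
MAX_SIZE = 1024      # x: only objects smaller than this many bytes are candidates


class ObjectDeduper:
    """Maps an object body to the number of the first identical object seen.

    Hypothetical helper for illustration; not pdfTeX code.
    """

    def __init__(self, max_tracked=MAX_TRACKED, max_size=MAX_SIZE):
        self.max_tracked = max_tracked
        self.max_size = max_size
        self.by_digest = {}   # md5 digest -> list of (object number, body)
        self.tracked = 0

    def add(self, objnum, body):
        """Return the number of an earlier identical object, or objnum itself."""
        if len(body) >= self.max_size:
            return objnum     # long objects are assumed to be unique
        digest = hashlib.md5(body).digest()
        for earlier_num, earlier_body in self.by_digest.get(digest, []):
            if earlier_body == body:   # literal comparison; the digest alone is not trusted
                return earlier_num
        if self.tracked < self.max_tracked:
            self.by_digest.setdefault(digest, []).append((objnum, body))
            self.tracked += 1
        return objnum
```

A caller would write the object only when `add` returns its own number, and otherwise emit a reference to the returned number.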
...
And when the object is gone, it's nasty to seek
around in the PDF file.
The position and length of the objects could be stored in memory.
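
As a rough illustration of keeping positions and lengths in memory: the sketch below records an (offset, length) pair per written object and seeks back only when a literal comparison is actually needed. `WrittenObjectIndex` and its methods are made-up names, and the output is assumed to be a seekable binary stream opened for both reading and writing.

```python
import io


class WrittenObjectIndex:
    """Remembers where each object body was written so it can be re-read.

    Hypothetical helper for illustration; not pdfTeX code.
    """

    def __init__(self, stream):
        self.stream = stream   # seekable binary stream, readable and writable
        self.spans = {}        # object number -> (offset, length)

    def write(self, objnum, body):
        self.stream.seek(0, io.SEEK_END)
        offset = self.stream.tell()
        self.stream.write(body)
        self.spans[objnum] = (offset, len(body))

    def read_back(self, objnum):
        offset, length = self.spans[objnum]
        self.stream.seek(offset)
        return self.stream.read(length)

    def same_as(self, objnum, candidate):
        """Literal comparison of a new body against an already written object."""
        _, length = self.spans[objnum]
        if length != len(candidate):
            return False       # lengths differ, no need to touch the file
        return self.read_back(objnum) == candidate
```

The length check avoids most seeks, so the file is only revisited for candidates that already agree in size (and, in a fuller version, in hash).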
...
...
...
This would e.g. condense all the obj <>
endobj in the pdfTeX manual. :-)
if it's enough to scan the last, say, 100 non-stream objects: this can be
done, at least it would catch similar objects that sit next to each other.
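
The bounded scan could look roughly like this: keep only the most recent non-stream objects in a fixed-size window and compare each new object against them. `RecentObjects` and `find_or_add` are hypothetical names, and the window size of 100 is just the figure mentioned above.

```python
from collections import deque

WINDOW = 100  # how many recent non-stream objects to keep (figure from the mail)


class RecentObjects:
    """Fixed-size window over the most recently written non-stream objects.

    Hypothetical helper for illustration; not pdfTeX code.
    """

    def __init__(self, window=WINDOW):
        # (object number, body) pairs; the oldest entry is dropped automatically
        self.recent = deque(maxlen=window)

    def find_or_add(self, objnum, body, is_stream=False):
        """Return an earlier identical object's number, else remember this one."""
        if is_stream:
            return objnum      # stream objects are skipped entirely
        for earlier_num, earlier_body in reversed(self.recent):
            if earlier_body == body:   # literal comparison, newest entries first
                return earlier_num
        self.recent.append((objnum, body))
        return objnum
```

Memory use stays bounded by the window size, at the cost of missing duplicates that are further apart than the window.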
The matches of "similar" objects can be increased by normalization:
* Removal of unnecessary spaces.
* Ordering of dictionary keys.
* Normalization of strings and names.

Disadvantage: parsing of pdf objects would be necessary.
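
To give an idea of what such a normalization involves, here is a small sketch covering the first two points (spaces and key order) for a deliberately restricted subset of PDF syntax: dictionaries, arrays, names, numbers, keywords and indirect references. Strings, comments, hex strings and streams are rejected, and name escapes are left untouched. All function names are hypothetical.

```python
import re

# Tokens of a small PDF subset. The final group catches any character the
# subset does not support, e.g. "(", ")", "%", or a single "<" or ">".
TOKEN = re.compile(rb"<<|>>|\[|\]|/[^\s<>\[\]()/%]*|[^\s<>\[\]()/%]+|(\S)")


def tokenize(body):
    tokens = []
    for match in TOKEN.finditer(body):
        if match.group(1):
            raise ValueError("unsupported syntax: %r" % match.group(1))
        tokens.append(match.group(0))
    return tokens


def parse(tokens, pos=0):
    """Parse one value starting at tokens[pos]; return (value, next position)."""
    tok = tokens[pos]
    if tok == b"<<":
        pos += 1
        items = {}
        while tokens[pos] != b">>":
            key = tokens[pos]
            value, pos = parse(tokens, pos + 1)
            items[key] = value
        return ("dict", items), pos + 1
    if tok == b"[":
        pos += 1
        elems = []
        while tokens[pos] != b"]":
            value, pos = parse(tokens, pos)
            elems.append(value)
        return ("array", elems), pos + 1
    # An indirect reference is three tokens, e.g. "12 0 R".
    if tokens[pos + 2:pos + 3] == [b"R"] and tok.isdigit() and tokens[pos + 1].isdigit():
        return ("atom", b" ".join(tokens[pos:pos + 3])), pos + 3
    return ("atom", tok), pos + 1


def serialize(value):
    """Write a parsed value back with single spaces and sorted dictionary keys."""
    kind, payload = value
    if kind == "dict":
        parts = [key + b" " + serialize(payload[key]) for key in sorted(payload)]
        return b"<< " + b" ".join(parts) + b" >>" if parts else b"<< >>"
    if kind == "array":
        return b"[" + b" ".join(serialize(v) for v in payload) + b"]"
    return payload


def normalize(body):
    tokens = tokenize(body)
    value, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing tokens after object")
    return serialize(value)
```

With this, `<</Type/Page  /Parent 3 0 R/Rotate 0>>` and `<< /Rotate 0 /Parent 3 0 R /Type /Page >>` normalize to the same bytes and would be caught as duplicates by a plain comparison.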
...
Maybe MD5 would be overkill, just a hash + comparison would be ok.
Yes.

Yours sincerely
  Heiko 
--

Heiko Oberdiek