On Thu, Nov 17, 2005 at 06:46:58PM +0100, Hartmut Henkel wrote:
On Thu, 17 Nov 2005, Hans Hagen wrote:
Martin wrote:
On 2005-11-17 10:46:51 +0100, Martin Schröder wrote:
Btw: Is there a tool that compresses a pdf by replacing identical objects with references?
pdfTeX could do this by itself: Store the md5 of the shortest n objects (e.g. n = 1024) smaller than x bytes (e.g. x = 1024; longer objects will typically be unique) and replace new identical objects with references to the already existing ones.
I wouldn't want to rely on md5 alone (shit happens). In the end one needs a literal comparison.
I agree.
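
A minimal Python sketch of the idea discussed so far, purely as an illustration and not pdfTeX code: small object bodies are indexed by their md5 digest, and a match is accepted only after a literal byte comparison. The class name `ObjectDeduper`, the method `add`, and the representation of an object as a number plus a bytes body are assumptions made for this example; the limit on the number n of stored objects is left out for brevity.

```python
import hashlib

MAX_LEN = 1024  # "x" from the message: longer objects are assumed to be unique


class ObjectDeduper:
    """Hypothetical helper: map small object bodies to the first identical object."""

    def __init__(self, max_len=MAX_LEN):
        self.max_len = max_len
        self.by_digest = {}  # md5 digest -> list of (object number, body)

    def add(self, objnum, body):
        """Return the number of an identical earlier object, or objnum if new."""
        if len(body) >= self.max_len:
            return objnum  # too long: not indexed at all
        digest = hashlib.md5(body).digest()
        candidates = self.by_digest.setdefault(digest, [])
        for earlier_num, earlier_body in candidates:
            if earlier_body == body:  # literal comparison, never the hash alone
                return earlier_num
        candidates.append((objnum, body))
        return objnum
```

A caller would replace every reference to `objnum` by the returned number whenever the two differ.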
And when the object is gone, it's nasty to seek around in the PDF file.
The position and length of the objects could be stored in memory.
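
One way this could look, again only as a hedged Python sketch: the index keeps just a hash, the offset, and the length of each written object, and reads the bytes back from the output for the literal comparison. `OffsetIndex` and `write_or_reuse` are hypothetical names, and the output is assumed to be a seekable binary stream opened for both reading and writing.

```python
import io


class OffsetIndex:
    """Hypothetical index that remembers where each object body was written."""

    def __init__(self, out):
        self.out = out   # seekable binary stream, readable and writable
        self.spans = {}  # hash of body -> list of (object number, offset, length)

    def write_or_reuse(self, objnum, body):
        """Write body unless an identical one exists; return the number to use."""
        key = hash(body)
        for earlier_num, offset, length in self.spans.get(key, []):
            if length == len(body) and self._read(offset, length) == body:
                return earlier_num
        offset = self.out.seek(0, io.SEEK_END)  # append at the end of the output
        self.out.write(body)
        self.spans.setdefault(key, []).append((objnum, offset, len(body)))
        return objnum

    def _read(self, offset, length):
        """Read back an already written span, then return to the end."""
        self.out.seek(offset)
        data = self.out.read(length)
        self.out.seek(0, io.SEEK_END)
        return data
```

The seeking is still there, but it is limited to candidates whose hash and length already match.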
This would e.g. condense all the obj <> endobj in the pdfTeX manual. :-)
If it's enough to scan the last, say, 100 non-stream objects, this can be done; at least it would catch similar objects that lie next to each other.
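
Scanning only the most recent objects could be sketched in Python with a bounded queue. This is an illustration under assumptions, not the pdfTeX implementation: `RecentObjects` and `find_or_remember` are made-up names, and the caller is assumed to pass only non-stream objects.

```python
from collections import deque


class RecentObjects:
    """Hypothetical window over the most recently written non-stream objects."""

    def __init__(self, window=100):
        # A deque with maxlen drops its oldest entry automatically when full.
        self.recent = deque(maxlen=window)

    def find_or_remember(self, objnum, body):
        """Return the number of an identical object in the window, else objnum."""
        for earlier_num, earlier_body in self.recent:
            if earlier_body == body:  # plain literal comparison, no hash needed
                return earlier_num
        self.recent.append((objnum, body))
        return objnum
```

With a window this small a linear scan with literal comparison is cheap, so no hash is required; the price is that duplicates further apart than the window are missed.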
The number of matches between "similar" objects can be increased by normalization:
* Removal of unnecessary spaces.
* Ordering of dictionary keys.
* Normalization of strings and names.
Disadvantage: parsing of PDF objects would be necessary.
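
The normalization steps listed above can be sketched in Python for the simplest case. The sketch assumes the parsing has already happened: a flat dictionary arrives as (key, value) token pairs, with values that are names, numbers, literal strings, or arrays of numbers. The function names are hypothetical, string normalization is left out, and nested dictionaries or strings inside arrays are not handled.

```python
import re


def normalize_name(name):
    """Decode #xx escapes so that /T#79pe and /Type compare equal."""
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), name)


def normalize_value(value):
    """Normalize one value token (assumed to be a simple, unnested token)."""
    value = value.strip()
    if value.startswith("/"):
        return normalize_name(value)
    if value.startswith("("):
        return value  # literal string: inner whitespace is significant
    return " ".join(value.split())  # collapse runs of whitespace


def canonical_dict(entries):
    """Return a comparison key that ignores key order and extra spaces."""
    normalized = [(normalize_name(key.strip()), normalize_value(value))
                  for key, value in entries]
    return tuple(sorted(normalized))
```

Two dictionaries whose `canonical_dict` keys are equal would then be treated as identical; the key is meant for comparison only, not for writing back into the PDF.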
Maybe MD5 would be overkill, just a hash + comparison would be ok.
Yes.
Yours sincerely
Heiko