INTRODUCTION:
PDF (Portable Document Format) is a file format developed by Adobe
Systems in the early 1990s as a way to share documents containing text and
images. The format presents documents independently of the application
software, hardware, and operating system used to create or view them. It is an
open standard for electronic document exchange maintained by the International
Organization for Standardization (ISO). When documents, forms, graphics, or web
pages are converted to PDF, they look the way they would when printed. To read
PDF files you need a PDF viewer, such as the free Adobe Reader software. Once
the Reader is installed, it typically starts automatically whenever you open a
PDF file. These files are especially useful for documents such as magazine
articles, product brochures, or flyers, where you want to preserve the original
graphic appearance online.
Evolution:
PDF version 1.7 includes all the
functionality of the previous versions, 1.0 to 1.6. Adobe removed some features
that do not conform to the ISO 32000-1 specification.
Technology
PDF combines three technologies:
· A subset of the PostScript page
description programming language, used to generate the layout and graphics.
· A font embedding/replacement
system that allows fonts to travel with the documents.
· A structured storage system
that bundles these elements and any associated content into a single file,
with data compression where appropriate.
NOTE: PostScript is a page
description language that runs in an interpreter to generate an image. In
addition to graphics commands, it supports standard programming features such
as if conditions and loops.
File Structure
The file structure of a PDF determines how
objects are stored in the file and how they are accessed and updated. The
structure is independent of the semantics of the objects. A PDF file contains
text with some binary data mixed in. If you open a PDF file in a text
editor, you'll see the raw objects that define the structure and content of the
document. PDF documents contain
eight basic types of objects:
- Boolean values (true or false)
- Numbers
- Strings
- Arrays (ordered objects)
- Names
- Dictionaries (objects indexed by names)
- Streams (for large amounts of data)
- The null object
Objects in a PDF file can be
direct or indirect. Indirect objects are labeled with an object number and a
generation number; for example, in “12 0 obj”, 12 is the object number and 0 is
the generation number. Other objects can then refer to that object with an
indirect reference such as “12 0 R”. Direct objects are written inline,
embedded in the objects that contain them.
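To make the numbering concrete, here is a minimal sketch in Python that scans a small, hand-written PDF fragment for indirect object definitions ("N G obj") and indirect references ("N G R"). The sample bytes and the function names are assumptions made for illustration. A plain regular-expression scan like this would also match text inside streams or strings, so it is not a real PDF parser.

```python
import re

# Hypothetical, hand-written fragment of a PDF body (not a complete, valid file).
SAMPLE = b"""%PDF-1.7
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
"""

# "<object number> <generation number> obj" starts an indirect object.
OBJ_DEF = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# "<object number> <generation number> R" refers to an indirect object.
OBJ_REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def find_definitions(data: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation number) for each 'N G obj' header."""
    return [(int(num), int(gen)) for num, gen in OBJ_DEF.findall(data)]


def find_references(data: bytes) -> list[tuple[int, int]]:
    """Return (object number, generation number) for each 'N G R' reference."""
    return [(int(num), int(gen)) for num, gen in OBJ_REF.findall(data)]


if __name__ == "__main__":
    print(find_definitions(SAMPLE))  # [(1, 0), (2, 0), (3, 0)]
    print(find_references(SAMPLE))   # [(2, 0), (3, 0), (2, 0)]
```

In the sample, the dictionaries inside each object are direct objects, while "2 0 R" and "3 0 R" point to objects defined elsewhere in the file.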
METADATA:
A PDF file can contain two types of metadata.
· Document Information
Dictionary: a set of key/value fields such
as author, title, subject, and creation and modification dates. It is stored in
the optional Info entry of the file trailer. A small set of fields is defined,
and it can be extended with additional text values if required.
· Extensible Metadata
Platform (XMP): XML-based, standards-based extensible metadata, as used in other
file formats. XMP metadata can be attached to individual elements, such as
illustrations, as well as to the whole document.
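As an illustration of the XMP form, the sketch below parses a small XMP packet with Python's standard xml.etree.ElementTree module and reads the Dublin Core title and creator. The packet contents ("Sample Report", "Jane Doe") and the helper name read_xmp are made up for this example. Extracting the packet from an actual PDF stream is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMP packet; the values are invented for illustration.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title>
    <rdf:Alt><rdf:li xml:lang="x-default">Sample Report</rdf:li></rdf:Alt>
   </dc:title>
   <dc:creator>
    <rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq>
   </dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

# Namespace prefixes used in the search paths below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_xmp(packet: str) -> dict:
    """Return the title and list of creators found in an XMP packet."""
    root = ET.fromstring(packet)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "creators": [c.text for c in creators],
    }


if __name__ == "__main__":
    print(read_xmp(XMP_PACKET))
```

Because XMP is ordinary XML, the same approach works for metadata embedded in other file formats that carry XMP packets.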
SECURITY
PDF files can be encrypted for security
and digitally signed for authentication. Adobe has defined standards for
securing PDF files. The standard scheme provides two different passwords. The
User Password encrypts the file and prevents it from being opened. The Owner
Password specifies the operations that should be restricted even when the
document is decrypted, such as printing, copying text and graphics out of the
document, and adding to or modifying the document. A file protected by a User
Password can only be opened without the password by cracking it, and the
difficulty depends on the strength of the password and the encryption method
used. The restrictions set with the Owner Password are not enforced by
encryption; they rely on the client software to respect them, so they are not
fully secure. A number of third-party tools and free online services are
available for removing or cracking PDF passwords.
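The restricted operations are recorded as bit flags, which helps explain why they depend on the viewer's cooperation. The sketch below decodes four of those flags from a permissions integer. It assumes the layout used by the standard security handler in the PDF 1.7 specification, where the value is stored as a signed 32-bit integer (the /P entry of the encryption dictionary) and bits 3 to 6, counting from 1, cover printing, modifying, copying, and annotating. Those details and the function name are not from the text above, so check them against the specification before relying on them.

```python
# Bit positions (1-based), assumed from the PDF 1.7 standard security handler.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
}


def decode_permissions(p: int) -> dict:
    """Map a signed 32-bit permissions value to allowed/denied flags."""
    flags = p & 0xFFFFFFFF  # reinterpret the signed value as unsigned bits
    return {
        name: bool(flags & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    # -44 allows printing and copying, denies modifying and annotating.
    print(decode_permissions(-44))
    # -3904 denies all four operations.
    print(decode_permissions(-3904))
```

A viewer that ignores these flags can still carry out the operations once the file is decrypted, which is the weakness described above.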
USAGE RIGHTS
Usage rights signatures enable
additional interactive features that are not available
by default in a particular PDF viewer application.
The signature is used to validate that
the permissions have been granted by an authentic granting authority. It allows
the user to:
- Save the PDF document along with the modified form data.
- Import data files in FDF, XFDF, and text formats.
- Export data files in FDF and XFDF formats.
- Submit form data.
- Instantiate new pages from named page templates.
- Apply digital signatures.
- Create, modify, delete, copy, import, and export annotations.
TECHNICAL ISSUES:
Some technical issues related
to PDF files are discussed below:
- Scanned Documents: PDF files created by scanning hard-copy documents that contain primarily text do not have the same structure as a PDF file generated directly from the same document. A scanned PDF internally contains a picture of each page, with no information about the text. A good-quality scan can make the document look like the native PDF file, but a poor-quality scan produces a poor result.
- Accessibility: PDF files can be created specifically to be accessible to people with disabilities. As of 2014, PDF files can include XML tags, text equivalents, captions, audio descriptions, etc. The file can be magnified for readers with visual impairments.
- Viruses and Exploits: PDF file attachments can carry viruses; the first was discovered in 2001. The virus, named the Outlook.pdf worm, used Microsoft Outlook to send itself as an attachment to a PDF file. One way of avoiding PDF file exploits is to have a local or web service convert files to another format before viewing.
- Usage Restrictions and Monitoring: PDF files can be encrypted so that a password is needed to view or edit the content. The PDF Reference defines both 40-bit and 128-bit encryption. Adobe provides a method to set security policies on specific documents.
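To give a sense of the gap between 40-bit and 128-bit encryption, the short calculation below compares the number of possible keys. It only counts keys for a brute-force search and ignores weaknesses in the cipher or in the password itself. The function names are illustrative.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys for a key of the given length in bits."""
    return 2 ** bits


def average_guesses(bits: int) -> int:
    """Expected number of guesses for a brute-force search (half the keyspace)."""
    return keyspace(bits) // 2


if __name__ == "__main__":
    print(keyspace(40))                    # 1099511627776
    print(keyspace(128))                   # 340282366920938463463374607431768211456
    print(keyspace(128) // keyspace(40))   # 2**88 times as many keys
```

About a trillion 40-bit keys is within reach of ordinary hardware, while the 128-bit keyspace is larger by a factor of 2 to the power 88.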
REDACTION OF PDF FILES
In its original sense, redaction is a form of editing in which
multiple source texts are combined (redacted) and altered to make a single
document. In the context of PDF files, redacting means hiding sensitive
information while keeping your document's formatting. It can, and should, be
used to cover information such as Social
Security numbers, competitive information, and even images.
PDF is preferred over most other
file formats for documenting and communicating because it offers the following
benefits:
- Reliability
- Open standard
- Trustworthiness
- Multiplatform support
- File integrity
- Easy accessibility