By: Jose Pimentel – Amyuni Technologies
Cloud services such as Amazon S3, Google Cloud Storage, Google Docs, Dropbox, Box, Microsoft Azure and others are excellent repositories for document storage. Each of these services allows users to off-load document storage to the cloud. Although any type of document can be stored in the cloud, it is often desirable to store all documents in the standard PDF format, with all the benefits that come with using PDF. Converting documents to PDF and then uploading them to the cloud is typically a two-step process that requires manual operations, such as printing the document to a virtual printer and then logging in to a website to upload it.
In this paper we will describe a solution for easily converting documents into PDF for storage in the Cloud in a single automated process.
Requirements (Server Side)
A Cloud service that exposes an API to allow developers to automate the tasks of storing and retrieving documents.
Taking a closer look at cloud services, we found a number of issues when trying to decide on a service to use.
– There are really no “free” evaluation services. Google Cloud Storage, Amazon S3 and Microsoft Azure services all require credit card information for evaluation.
– Google Docs does offer a free evaluation, but it is limited to one month.
– Although both Dropbox and Box offer free developer evaluation accounts, both require users to first log in to their website before any file transfer can occur. This interrupts the automated flow of the application.
In the end, we focused on two of the available cloud storage services: Google Docs, which is part of Google Apps for Business, and the Microsoft Azure Blob Storage Service.
Requirements (Client Side)
For the client side of the cloud storage printing solution, we had the following requirements:
– We needed a tool to convert all user documents into PDF before sending them to the cloud
– We needed direct transfer to the cloud, with no intermediate storage on the client PC, for security and efficiency
– We needed a solution that is easy to implement and use
– We needed a high-performance engine that would not hinder the user's workflow
The Amyuni PDF Converter met all our requirements. PDF Converter is a virtual printer driver that converts the output sent to it by a printing application into PDF format. The printer driver, which is certified by Microsoft for all 32-bit and 64-bit editions of Windows, gives the developer complete control over the printing process. The Amyuni PDF Converter product, which is needed to run the sample application described in this paper, can be downloaded from http://www.amyuni.com/en/enduser/pdfconverterend.
Implementation
The Amyuni PDF Converter printer driver API enables developers to intercept the datastream coming from the printer driver and handle it in their own custom application. The developer can then store the datastream locally, on a network drive or even in the cloud.
The datastream is intercepted by configuring the PDF Converter to call a custom DLL during the print job. This custom DLL is responsible for the work of uploading to the cloud.
For both Google and Azure cloud services, the Amyuni PDF Converter is installed and configured in the same manner. The PDF Converter is first installed on a PC or server; the developer then uses our API to configure the printer so that its output is directed to the custom DLL. The code snippet below illustrates this process.
//Declare the PDF Converter interface object
CDIntfEx.CDIntfExClass PDF = new CDIntfEx.CDIntfExClass();
//Initialize the printer
//This is the printer name that was used when the printer was installed
PDF.DriverInit("Amyuni PDF Cloud Converter");
//This function needs to be called to enable the printer
//strLicenseTo and strActivationCode hold the license name and
//activation code supplied with your Amyuni license
PDF.EnablePrinter(strLicenseTo, strActivationCode);
//Configure the PDF Converter to send its data to a page processor DLL
PDF.FileNameOptionsEx = 0x2000000;
//This is the name of the DLL that will process the datastream
PDF.SetPrinterParamStr("PageProcessor", "PageProc.dll");
//Apply the changes
PDF.SetDefaultConfig();
The PageProc DLL communicates with the acListener service and sends it the data stream that it receives from the printer driver. Although the DLL can be programmed to send the output directly to the cloud, we chose to use the intermediary acListener service because it gave us the following advantages:
– Control is quickly returned to the user before the data is fully uploaded to the server
– The listener service can authenticate the user only once whereas the DLL would need to authenticate the user each time it is loaded
– A preview of the PDF document can easily be implemented in the listener, before the document is uploaded, by using the Amyuni PDF Creator.Net viewer. PDF Creator.Net is available for download from the following URL:
http://www.amyuni.com/en/developer/pdfcreator
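The paper does not show how acListener is built. As a minimal sketch of the first two advantages listed above, assuming a simple in-process queue, the listener can be modeled as a producer/consumer: each print job is enqueued and the caller returns immediately, while a single background worker, authenticated once at start-up, performs the uploads. PrintedDocument and UploadQueue are hypothetical names, not part of the Amyuni API, and the inter-process transport between PageProc.dll and the service is not modeled.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical: one finished print job handed over by the page processor DLL.
public sealed class PrintedDocument
{
    public string Name { get; }
    public byte[] Data { get; }

    public PrintedDocument(string name, byte[] data)
    {
        Name = name;
        Data = data;
    }
}

// Hypothetical listener core: print jobs are queued and uploaded on a
// background worker, so the printing application is released immediately.
public sealed class UploadQueue : IDisposable
{
    private readonly BlockingCollection<PrintedDocument> _pending =
        new BlockingCollection<PrintedDocument>();
    private readonly Action<string, Stream> _upload;
    private readonly Task _worker;

    // authenticate: runs once, when the listener starts (for example a login call).
    // upload: sends one document to the cloud (for example Google Docs or Azure).
    public UploadQueue(Action authenticate, Action<string, Stream> upload)
    {
        authenticate();
        _upload = upload;
        _worker = Task.Run(() => ProcessQueue());
    }

    // Called once per print job; returns as soon as the job is queued.
    public void Enqueue(PrintedDocument document) => _pending.Add(document);

    private void ProcessQueue()
    {
        // Blocks until a job is available; the loop ends after CompleteAdding().
        foreach (PrintedDocument document in _pending.GetConsumingEnumerable())
        {
            using (var stream = new MemoryStream(document.Data))
            {
                try
                {
                    _upload(document.Name, stream);
                }
                catch (Exception ex)
                {
                    // A failed upload is reported without stopping the worker.
                    Console.Error.WriteLine($"Upload of {document.Name} failed: {ex.Message}");
                }
            }
        }
    }

    // Stops accepting jobs and waits for the queued uploads to finish.
    public void Dispose()
    {
        _pending.CompleteAdding();
        _worker.Wait();
        _pending.Dispose();
    }
}
```

In this sketch, the Google Docs or Azure upload logic would play the role of the upload delegate, and the credentials would be set once in the authenticate delegate rather than on every print job.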
Uploading to Google Docs
Google Docs provides the following advantages:
– Free 30-day business account access
– Extremely large user base
– Google Docs gives users a visual representation of their documents
– Online documents with real-time collaboration: print from your PC and make the document accessible to other users or from other PCs
To use Google Docs through the Google Documents List API, you will need to sign up for a Google Apps for Business account. Evaluating this service does not require a credit card, but it does require you to add an HTML tag to the index page of your website so that Google can verify your domain.
What distinguishes Google Docs from other services is that it offers the user a web interface to view their documents and collaborate with other users.
All of the uploading to Google Docs happens in the Upload() method. The code snippet below illustrates how the method obtains the data generated by the PDF Converter printer:
//Create a memory stream of the data generated by
//the PDF Converter printer.
MemoryStream data = _document.GetData();
The code snippet below, which is in the acListener service, handles the uploading of the datastream to the cloud.
//Namespaces used by this method (Google Data API .NET client library):
//using System; using System.IO;
//using Google.GData.Client; using Google.GData.Documents;
public void Upload()
{
//////////////////////////////////////////////////////////////////
/*The DocumentsService class represents a client connection
* (with authentication) to the Google Docs web service.
* The constructor takes your application's name
* (in the form companyName-applicationName-versionID)*/
//////////////////////////////////////////////////////////////////
DocumentsService service = new
DocumentsService("AmyuniTech-AmyuniCloudPrinterApp-v1.0");
//The request factory creates the HTTP requests sent to Google and
//adds an authorization header suitable for ClientLogin
//(username/password) authentication through Google's servers.
GDataGAuthRequestFactory reqFactory =
(GDataGAuthRequestFactory)service.RequestFactory;
//Do not keep the HTTP connection alive between requests
reqFactory.KeepAlive = false;
//Use version 3 of the API
reqFactory.ProtocolMajor = 3;
service.setUserCredentials(_googleUsername, _googleUserPassword);
DocumentEntry entry = null;
//Create a memory stream of the data generated by
//the PDF Converter printer.
MemoryStream data = _document.GetData();
//Rewind the stream so that the upload starts at the first byte
data.Seek(0, SeekOrigin.Begin);
try
{
//Tell Google that you are going to be sending a PDF document.
//_googleDocumentName is the name of the PDF file that will appear in Google Docs.
String contentType = (String)DocumentsService.DocumentTypes["PDF"];
entry = service.Insert(new Uri(DocumentsListQuery.documentsBaseUri),
data,
contentType,
_googleDocumentName) as DocumentEntry;
}
catch (Exception ex)
{
System.Windows.Forms.MessageBox.Show(ex.Message);
}
finally
{
//Release the printer data whether or not the upload succeeded
_document.ReleaseStream();
data.Close();
}
}
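Both Upload() methods call data.Seek(0, SeekOrigin.Begin) before handing the stream to the cloud API. The short, self-contained sketch below shows why: after a MemoryStream has been written to, its Position is at the end, so a consumer that reads from the current position sees no data. RewindDemo, FillStream and CountReadableBytes are hypothetical stand-ins for GetData() and the upload call; the sketch assumes the stream returned by GetData() may be left positioned at its end.

```csharp
using System.IO;

public static class RewindDemo
{
    // Hypothetical stand-in for _document.GetData(): writes the bytes and
    // leaves the stream positioned at its end, as any writer does.
    public static MemoryStream FillStream(byte[] pdfBytes)
    {
        var stream = new MemoryStream();
        stream.Write(pdfBytes, 0, pdfBytes.Length);
        return stream;
    }

    // Stand-in for an upload call: reads from the current position to the end
    // and returns how many bytes would have been sent.
    public static int CountReadableBytes(Stream stream)
    {
        var buffer = new byte[4096];
        int total = 0;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }
}
```

For example, calling CountReadableBytes directly after FillStream(bytes) returns 0, while calling Seek(0, SeekOrigin.Begin) first returns bytes.Length.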
Uploading to Microsoft Azure Blob Storage
The process of uploading to Microsoft Azure Blob Storage Service is similar.
The main advantage of Microsoft Blob Storage Service for developers is its extensive .NET documentation.
Microsoft Azure Blob Storage Service (MABSS) uses a container and blob concept to mimic the file system.
Microsoft defines a container as follows: “A container provides a grouping of a set of blobs. All blobs must be in a container. An account can contain an unlimited number of containers. A container can store an unlimited number of blobs.”
Microsoft defines a blob as a “file of any type and size”.
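The container and blob concept maps onto a file system only by convention: blobs are stored flat inside a container, and a "/" in a blob name lets storage tools present it as a folder. The sketch below, which uses no Azure library, encodes the documented container naming rules (3 to 63 characters; lowercase letters, digits and single dashes; starting and ending with a letter or digit) and builds a folder-like blob name. BlobNaming, IsValidContainerName and BuildBlobName are hypothetical helpers, and the special $root container is not handled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlobNaming
{
    // Container names: 3 to 63 characters, lowercase letters, digits and dashes.
    // The first character must be a letter or digit, and every dash must be
    // followed by a letter or digit (no "--", no trailing dash).
    private static readonly Regex ContainerPattern =
        new Regex(@"^[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}\z");

    public static bool IsValidContainerName(string name) =>
        name != null && ContainerPattern.IsMatch(name);

    // Hypothetical helper: "jose" + "invoice" gives "jose/invoice.pdf",
    // which tools list as the file invoice.pdf inside a virtual folder "jose".
    public static string BuildBlobName(string userFolder, string documentName)
    {
        if (string.IsNullOrWhiteSpace(userFolder))
            throw new ArgumentException("A folder name is required.", nameof(userFolder));
        if (string.IsNullOrWhiteSpace(documentName))
            throw new ArgumentException("A document name is required.", nameof(documentName));
        return userFolder.Trim('/') + "/" + documentName + ".pdf";
    }
}
```

The container name "pdfdocuments" used in the Upload() method satisfies these rules, whereas a name such as "PDFDocuments" would be rejected because of its uppercase letters.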
In the code snippet below, the datastream generated by the Amyuni PDF Converter is passed to the blob object's UploadFromStream() method, which uploads the file to the cloud.
//Upload the datastream to the blob in the cloud
blob.UploadFromStream(data);
//Namespaces used by this method (Windows Azure Storage Client library 1.x):
//using System; using System.IO; using System.Configuration;
//using Microsoft.WindowsAzure; using Microsoft.WindowsAzure.StorageClient;
/// <summary>
/// This method takes the data held by an acPbpDocument object
/// and uploads it to the cloud.
/// </summary>
public void Upload()
{
//Create a memory stream of the data generated by
//the PDF Converter printer.
MemoryStream data = _document.GetData();
//Rewind the stream so that the upload starts at the first byte
data.Seek(0, SeekOrigin.Begin);
try
{
//Read the storage account connection string from the application's configuration file
CloudStorageAccount storageAccount =
CloudStorageAccount.Parse(ConfigurationManager.AppSettings["StorageAccountConnectionString"]);
// Create the blob client
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve a reference to the container
// (a container is similar to a directory)
CloudBlobContainer container = blobClient.GetContainerReference("pdfdocuments");
// Create the container if it doesn't already exist
container.CreateIfNotExist();
// Retrieve a reference to the blob where the PDF document will be uploaded
// (a blob is similar to a file name)
CloudBlob blob = container.GetBlobReference(_pdfDocumentName + ".pdf");
// Upload the data to the blob in the cloud
blob.UploadFromStream(data);
}
catch (Exception ex)
{
System.Windows.Forms.MessageBox.Show(ex.Message);
}
finally
{
//Release the printer data whether or not the upload succeeded
_document.ReleaseStream();
data.Close();
}
}
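The Upload() method builds the blob name from _pdfDocumentName alone, so printing two documents with the same name makes the second upload replace the first, because uploading to an existing block blob overwrites it. If that is not wanted, one option, sketched here as an assumption rather than as part of the original sample, is to append a UTC time stamp to the name. UniqueBlobName and WithTimestamp are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class UniqueBlobName
{
    // Appends a UTC time stamp so that reprinting a document with the same
    // name creates a new blob instead of replacing the existing one.
    // The invariant culture keeps the digits and calendar independent of
    // the machine's regional settings.
    public static string WithTimestamp(string documentName, DateTime utcNow)
    {
        if (string.IsNullOrWhiteSpace(documentName))
            throw new ArgumentException("A document name is required.", nameof(documentName));
        string stamp = utcNow.ToString("yyyyMMdd-HHmmss", CultureInfo.InvariantCulture);
        return documentName + "-" + stamp + ".pdf";
    }
}
```

In the Upload() method, this would replace the argument of GetBlobReference, for example container.GetBlobReference(UniqueBlobName.WithTimestamp(_pdfDocumentName, DateTime.UtcNow)). Two jobs with the same name printed within the same second would still collide, so a Guid.NewGuid() suffix may be preferable where that matters.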
The acListener services for both Google Docs and Azure can be requested by emailing Jose at the email address provided below.
About the author:
Jose Pimentel is the Customer Service Manager at Amyuni Technologies. For more than 10 years, Jose has provided PDF developers with solutions tailored to their requirements and guided them in selecting and implementing the right PDF components. Jose is always on the lookout for new technologies, and for ways to adapt Amyuni's extensive PDF libraries to them, such as the burgeoning cloud architectures. He can be reached by email at jose.pimentel@amyuni.com or by phone at 514-868-9226.