SocketTools Compression and Encoding

In addition to components and libraries to access Internet services, SocketTools also includes a general-purpose library your applications can use to easily compress, encode and encrypt data. We have already covered support for data encryption, and this article will discuss the encoding and compression components.

Data Compression

SocketTools includes several high-level APIs and components which can be used to compress and encode data. In the Library Edition, these are part of the Encoding and Compression API. With the .NET Edition and ActiveX Edition, they are part of the SocketTools.FileEncoder component.

You can compress either a block of memory or the entire contents of a file. Using the Library Edition, the functions you would use are CompressBuffer to compress a block of memory, and CompressFile to compress the contents of a file. There are corresponding ExpandBuffer and ExpandFile functions that enable you to reverse the process and restore the contents of a compressed buffer or file.

If you are using the .NET class or ActiveX control, the methods you would use are CompressData and CompressFile, and similarly, ExpandData and ExpandFile. It’s important to note you should always handle compressed data as a stream of bytes. If you use String variables, you can get unexpected results because the resulting output will contain binary data, including embedded null characters. It’s always preferable to use Byte arrays when compressing data in memory.

Internally, SocketTools supports several different compression types, but it defaults to the “Deflate” algorithm, a de facto standard used by many applications to compress data. It achieves a good compression ratio while not using an excessive amount of memory or CPU cycles.
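To illustrate what a Deflate round trip looks like, here is a minimal sketch using Python's standard zlib module rather than the SocketTools API. The helper name deflate_round_trip and the sample text are assumptions made for this example only. Note that the compressed result is handled as bytes throughout, never as a string.

```python
import zlib

def deflate_round_trip(data):
    """Compress a block of bytes with Deflate (zlib format), then expand it again."""
    compressed = zlib.compress(data, level=6)   # 6 is a common speed/size trade-off
    restored = zlib.decompress(compressed)
    return compressed, restored

# Sample input: repetitive text compresses well.
original = b"The quick brown fox jumps over the lazy dog. " * 50
compressed, restored = deflate_round_trip(original)

print("original size:  ", len(original))
print("compressed size:", len(compressed))
print("round trip ok:  ", restored == original)
```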

If you are also using the AES encryption functions in SocketTools, you should always compress the original data first, and then encrypt it. Encrypted output has very little of the redundancy that a compression algorithm relies on, so the compression ratio for encrypted data is typically much worse, particularly if the original data primarily consists of text.
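The effect of ordering can be sketched without any encryption library. The example below assumes that well-encrypted output is statistically similar to random bytes, so it uses os.urandom as a stand-in for ciphertext. It does not call AES or SocketTools, and the function name compression_ratio is hypothetical.

```python
import os
import zlib

def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

# Repetitive plain text, as it would look before encryption.
plain_text = b"Compress first, then encrypt. " * 200

# Assumption: random bytes stand in for the output of a strong cipher.
ciphertext_stand_in = os.urandom(len(plain_text))

text_ratio = compression_ratio(plain_text)
random_ratio = compression_ratio(ciphertext_stand_in)

print("ratio for plain text:          %.3f" % text_ratio)
print("ratio for ciphertext stand-in: %.3f" % random_ratio)
```

Compressing the plain text first and encrypting the smaller result keeps the benefit of both steps; reversing the order leaves the compressor with almost nothing to work with.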

It’s important to keep in mind that it’s generally not beneficial to use these functions with data that has already been compressed. For example, the DOCX format used by word processing applications is already in a compressed form. Attempting to compress it again using CompressFile will have little or no benefit and may increase the size of the file under some circumstances.
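The same point can be checked by compressing data twice. This sketch uses Python's zlib module and a seeded stream of words as a hypothetical stand-in for a document; the word list and sizes are assumptions chosen only for illustration.

```python
import random
import zlib

# Build a reproducible block of text from a small vocabulary.
rng = random.Random(42)
words = ["invoice", "server", "socket", "message", "buffer",
         "stream", "packet", "header", "client", "payload"]
document = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

once = zlib.compress(document)   # first pass removes most of the redundancy
twice = zlib.compress(once)      # second pass has little left to remove

print("original:        ", len(document))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```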

Data Encoding

In addition to the data compression functions in SocketTools, there are also functions which can be used to encode and decode data and text. There are two general encoding types supported by SocketTools: base64 encoding and quoted-printable encoding.

The standard for encoding today is base64, which is an algorithm that converts any type of data (text or binary) to a sequence of printable ASCII characters. The algorithm transforms data using radix-64, where each base64 digit represents six bits of data. This means every three bytes (24 bits) of text or data can be represented as four base64 digits, so the encoded output is roughly one third larger than the input.
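The three-bytes-to-four-digits relationship can be seen with Python's standard base64 module, used here in place of the SocketTools functions. The helper base64_length is a hypothetical name for this example; it computes the padded output length for an input of n bytes.

```python
import base64

def base64_length(n):
    """Length of padded base64 output for n input bytes: 4 digits per 3-byte group."""
    return 4 * ((n + 2) // 3)

encoded = base64.b64encode(b"Man")    # 3 bytes in, 4 base64 digits out
decoded = base64.b64decode(encoded)

print(encoded)                         # b'TWFu'
print(decoded)                         # b'Man'
print(base64_length(3), base64_length(4), base64_length(300))
```

Inputs that are not a multiple of three bytes are padded with "=" characters, which is why base64_length rounds up to the next group of four.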

Quoted-printable encoding is used with text and can be found in e-mail messages which primarily consist of ASCII text, but may contain some non-ASCII characters, such as letters with diacritical marks. To ensure the text can be sent safely through a variety of mail servers, quoted-printable encoding is used to represent those non-ASCII characters as a sequence of ASCII characters.

Today, quoted-printable encoding is less common than it used to be. However, legacy applications may still generate text which uses quoted-printable encoding. While base64 encoded data is not human-readable, most quoted-printable encoded text is readable even without being decoded, since printable ASCII characters are left largely unchanged and only non-ASCII characters and a few special characters, such as the equal sign, are encoded.
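The difference in readability is easy to see side by side. This sketch uses Python's standard quopri and base64 modules rather than SocketTools, and it assumes the sample text is stored as UTF-8; the variable names are for illustration only.

```python
import base64
import quopri

# Mostly ASCII text with two accented letters, encoded as UTF-8 bytes.
raw = "Café au lait, s'il vous plaît".encode("utf-8")

qp = quopri.encodestring(raw)    # only the non-ASCII bytes become =XX sequences
b64 = base64.b64encode(raw)      # every byte is transformed

print(qp.decode("ascii"))        # still recognizable as the original sentence
print(b64.decode("ascii"))       # not readable without decoding

# Both encodings are fully reversible.
qp_restored = quopri.decodestring(qp)
b64_restored = base64.b64decode(b64)
```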

Using the Library Edition, the functions EncodeBuffer and DecodeBuffer can be used to encode and decode data in memory, and the EncodeFile and DecodeFile functions can be used to encode and decode complete files.

Using the .NET class or ActiveX control, the EncodeData and DecodeData methods can be used with data in memory, and the EncodeFile and DecodeFile methods can be used with files.

Although encoding a string in memory or encoding the contents of a text file will make it unreadable, it is not a form of encryption. As the saying goes, security through obscurity is no security at all. Decoding previously encoded text is trivial and offers no real protection.

If you want to protect the contents of a string, use the AesEncryptString function in the Library API or the EncryptedText properties in the .NET class or ActiveX control. They will encrypt the data using AES-256 and automatically encode it using base64 so that it can be safely stored as an ASCII string.

One final consideration. If you have encoded text which was generated by an e-mail application, it’s possible the text was encoded using a different character set than the one you use in your locale. The decoding functions in the Encoding and Compression API will only decode text and data as-is, and they are not locale-specific. This means if you have some encoded text which was written in a different locale, the decoded text may appear corrupted.

To address this issue, our Mail Message (MIME) components have special functions which enable you to easily decode and convert the encoded text into its proper Unicode representation.

In the Library Edition, the MimeDecodeTextEx function allows you to specify the original character set that was used when the text was encoded. The SocketTools.MailMessage class and ActiveX control have DecodeText methods which perform the same function.

The SocketTools Encoding and Compression components can be combined with any of the other components in SocketTools, making it easy to add this functionality to your software with minimal coding and a simple interface.