EBCDIC Encoding with .NET
After reading a post on the C# newsgroup asking for an EBCDIC to ASCII converter, and seeing one solution, I decided to write my own implementation. This page describes the implementation and its limitations, and a bit about EBCDIC itself.
EBCDIC
Unfortunately it appears to be fairly tricky to get hold of many concrete specifications of EBCDIC. This is what I've managed to glean from various websites:
If you have any more information, particularly about the DBCS aspect, please mail me at [email protected].
My EBCDIC Encoding implementation
I managed to get hold of details of 47 EBCDIC encodings from http://std.dkuug.dk/i18n/charmaps/. To be honest, I don't really know what DKUUG is, so I'm really just hoping that the maps are accurate - they seem to be quite reasonable though. Each encoding has a name and several have aliases, although I currently ignore this aliasing.
My implementation consists of three projects, described below, of which only the middle one is of any interest to most people.
A character map reader
This simply finds all of the files whose names begin with "EBCDIC-" in the current directory, reads them all in (warning of any oddities in the encoding, such as any non-zero byte having two distinct meanings) and writes out a resource file, ebcdic.dat. This is a console application built from a single C# source file.
An encoding library
This is a library built from two C# source files and the ebcdic.dat file generated by the reader. This library is all most users will need. More details are provided below.
A test program
This is a console application built from a single C# source file and requiring the library described above. Currently it just displays the encoded version of "hello" and then decodes it.
Using The Encoding Library
The encoding library is very simple to use, as the encoding class (JonSkeet.Ebcdic.EbcdicEncoding) is a subclass of the standard .NET System.Text.Encoding class. To obtain an instance of the appropriate encoding, use EbcdicEncoding.GetEncoding (String) passing it the name of the encoding you wish to use (eg EBCDIC-US). You can find out the list of names of available encodings using the EbcdicEncoding.AllNames property, which returns the names as an array of strings. Once you have obtained an EbcdicEncoding instance, use it like any other Encoding: call GetString, GetBytes etc.
The encoding does not save any state between requests, and can safely be used by many threads simultaneously. There is no need (or indeed facility) to release encoding resources when it is no longer needed. All encodings are created on the first use of the EbcdicEncoding class, and maintained until the application domain is unloaded.
Sample Code
The following is a sample program to convert a file from EBCDIC-US to ASCII. It should be easy to see how to modify it to convert the other way, or to use a different encoding (eg from EBCDIC-UK, or to UTF-8).
    using System;
    using System.IO;
    using System.Text;
    using JonSkeet.Ebcdic;

    public class ConvertFile
    {
        public static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.WriteLine
                    ("Usage: ConvertFile <ebcdic file (input)> <ascii file (output)>");
                return;
            }

            string inputFile = args[0];
            string outputFile = args[1];

            Encoding inputEncoding = EbcdicEncoding.GetEncoding ("EBCDIC-US");
            Encoding outputEncoding = Encoding.ASCII;

            try
            {
                // Create the reader and writer with appropriate encodings.
                using (StreamReader inputReader =
                           new StreamReader (inputFile, inputEncoding))
                {
                    using (StreamWriter outputWriter =
                               new StreamWriter (outputFile, false, outputEncoding))
                    {
                        // Create an 8K-char buffer
                        char[] buffer = new char[8192];
                        int len=0;

                        // Repeatedly read into the buffer and then write it out
                        // until the reader has been exhausted.
                        while ( (len=inputReader.Read (buffer, 0, buffer.Length)) > 0)
                        {
                            outputWriter.Write (buffer, 0, len);
                        }
                    }
                }
            }
            // Not much in the way of error handling here - you may well want
            // to do better handling yourself!
            catch (IOException e)
            {
                Console.WriteLine ("Exception during processing: {0}", e.Message);
            }
        }
    }
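For a quick check that doesn't involve any files, the same API can be exercised entirely in memory. The following is a minimal sketch, assuming the library assembly is referenced by the project. It relies only on the members described earlier (EbcdicEncoding.GetEncoding and EbcdicEncoding.AllNames) plus the standard Encoding methods; the class name RoundTrip is just illustrative, and it assumes the letters in "hello" are all present in EBCDIC-US.

```csharp
using System;
using System.Text;
using JonSkeet.Ebcdic;

public class RoundTrip
{
    public static void Main()
    {
        // List every encoding name the library knows about.
        foreach (string name in EbcdicEncoding.AllNames)
        {
            Console.WriteLine(name);
        }

        Encoding ebcdic = EbcdicEncoding.GetEncoding("EBCDIC-US");

        // Encode a string to EBCDIC bytes and show them in hex.
        byte[] encoded = ebcdic.GetBytes("hello");
        Console.WriteLine(BitConverter.ToString(encoded));

        // Decode the bytes again; this should give back the original text.
        string decoded = ebcdic.GetString(encoded);
        Console.WriteLine(decoded);
    }
}
```

Because the encoding holds no state between calls, the same instance could be shared across threads rather than fetched each time.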
Limitations
Due to the lack of available information about the DBCS aspect of EBCDIC, this encoding class makes no effort whatsoever to simulate proper shifting. Shift out and shift in are merely encoded/decoded to/from their equivalent Unicode characters, and bytes between them are treated as if the shift had not taken place. (This means that a decoded byte array is always a string of the same length as the byte array, and vice versa).
Any byte not recognised to be from the specific encoding being used is decoded to the question mark character, '?'. Any character not recognised to be in the set of characters encoded by the specific encoding being used is encoded to the byte representing the question mark character, or to byte zero if the question mark character is not in the character set either.
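The fallback rules above can be sketched as a pair of table lookups. This is only an illustration under stated assumptions, not the library's actual code: the class TinyTable and its three-entry map are hypothetical, chosen so that the behaviour is easy to see (0xC1, 0xC2 and 0x6F are the usual EBCDIC byte values for 'A', 'B' and '?').

```csharp
using System;
using System.Collections.Generic;

public static class TinyTable
{
    // Hypothetical three-entry map; a real encoding has up to 256 entries.
    static readonly Dictionary<byte, char> ByteToChar = new Dictionary<byte, char>
    {
        { 0xC1, 'A' }, { 0xC2, 'B' }, { 0x6F, '?' }
    };

    // Reverse map, built once from the forward map.
    static readonly Dictionary<char, byte> CharToByte = Invert(ByteToChar);

    static Dictionary<char, byte> Invert(Dictionary<byte, char> map)
    {
        var result = new Dictionary<char, byte>();
        foreach (KeyValuePair<byte, char> pair in map)
        {
            result[pair.Value] = pair.Key;
        }
        return result;
    }

    // Unknown bytes decode to the question mark character.
    public static char Decode(byte b)
    {
        char c;
        return ByteToChar.TryGetValue(b, out c) ? c : '?';
    }

    // Unknown characters encode to the byte for '?', or to byte zero
    // if the character set has no '?' either.
    public static byte Encode(char c)
    {
        byte b;
        if (CharToByte.TryGetValue(c, out b)) return b;
        return CharToByte.TryGetValue('?', out b) ? b : (byte)0;
    }

    public static void Main()
    {
        Console.WriteLine(Decode(0xC1));               // a byte in the map
        Console.WriteLine(Decode(0x01));               // a byte not in the map
        Console.WriteLine(Encode('Z').ToString("X2")); // a character not in the map
    }
}
```

Note that this substitution is lossy in both directions: once a byte or character has been replaced, the original value cannot be recovered from the output.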
The library doesn't currently have a strong-name, so can't be placed in the GAC. You may, however, download the source and modify
Licence
This was just an interesting half-day project. I have no desire to make any money out of this code whatsoever, but I hope it's interesting and useful to others. So, feel free to use it. If you have any questions about it, or just find it useful and wish to let me know, please mail me at [email protected]. You may use this code in commercial projects, either in binary or source form. You may change the namespace and the class names to suit your company, and modify the code if you wish. I'd rather you didn't try to pass it off as your own work, and specifically you may not sell just this code - at least not without asking me first. I make no claims whatsoever about this code - it comes with no warranty, not even the implied warranty of fitness for purpose, so don't sue me if it breaks something. (Mail me instead, so we can try to stop it from happening again.)