EBCDIC to ASCII Converter - CodeProject

Introduction

The purpose of the Ebcdic2Ascii converter is to provide a simple API for converting mainframe dump files encoded in EBCDIC into ASCII.

Background

Our daily operation depends heavily on multiple feed files received from a mainframe. Most of these files are EBCDIC encoded and contain packed numbers (COMP-3). The "old team" used Easytrieve software to write COBOL-like scripts that parse the source files and convert them into something readable, but these scripts are extremely difficult to maintain and can only run on 32-bit machines. I've combined multiple sources from the net to create a single API for converting EBCDIC encoded files into ASCII encoded CSV files.

One of the common problems with converting EBCDIC encoded files is packed numbers (also called "Computational-3", "Packed Decimal", or "Packed"). They are not characters, so there is no byte-to-byte conversion for these fields. Technically, packed numbers are binary fields that store two decimal digits in each byte (the last half-byte holds the sign), which roughly halves the storage requirement compared to character digits. If your file does not contain packed numbers, you are lucky and you won't need this tutorial. Otherwise, you need to know the template of the file format beforehand: the length of each "line", and the position, length and format of each "field" within that line.
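To make the packed layout concrete, here is a minimal sketch of decoding a small packed field. It is independent of the library code shown later; the names PackedDecimalSketch, Decode and AppendDigit are made up for illustration, and it assumes the field is short enough to fit into a decimal.

```csharp
using System;

public static class PackedDecimalSketch
{
    // Decodes a COMP-3 field: every half-byte (nibble) is one decimal digit,
    // except the low nibble of the last byte, which holds the sign.
    public static decimal Decode(byte[] packed, int decimalPlaces)
    {
        if (packed == null || packed.Length == 0)
        {
            throw new ArgumentException("Packed field is empty", "packed");
        }

        decimal value = 0;
        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;
            int low = packed[i] & 0x0F;
            bool isLastByte = (i == packed.Length - 1);

            value = AppendDigit(value, high);
            if (!isLastByte)
            {
                value = AppendDigit(value, low); // in the last byte this nibble is the sign
            }
        }

        // 0xC = positive, 0xD = negative, 0xF = unsigned
        int sign = packed[packed.Length - 1] & 0x0F;
        if (sign == 0x0D)
        {
            value = -value;
        }
        else if (sign != 0x0C && sign != 0x0F)
        {
            throw new FormatException("Bad sign nibble");
        }

        // Apply the implied decimal point
        for (int i = 0; i < decimalPlaces; i++)
        {
            value /= 10m;
        }
        return value;
    }

    private static decimal AppendDigit(decimal value, int digit)
    {
        if (digit > 9)
        {
            throw new FormatException("Bad digit nibble");
        }
        return value * 10 + digit;
    }
}
```

For example, the three bytes 12 34 5C with two decimal places decode to 123.45, and the two bytes 12 3D with no decimal places decode to -123.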

Using the Code

The concept of a line template, containing one or more field templates, helps a programmer visualize the structure of the file and makes it easier to extend the functionality for more complex tasks.

The parser takes an instance of a LineTemplate and can process either the whole file at once or line by line, which allows you to swap templates for files with a complex data structure. If you parse the whole file, you can call the CreateCsvFile() method to dump the parsed file in CSV format. Alternatively, you can iterate through the array of parsed lines. Each ParsedLine object contains a dictionary of ParsedField objects. All of the parsing logic is concentrated in the ParsedField class.

The library contains a handful of classes listed below. First of all, you will need to create a LineTemplate for the file you intend to parse. When you make an instance of a LineTemplate class, you'll have to provide the line length (how many bytes per line) and a name (anything that makes sense to you).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
 
namespace Ebcdic2Ascii
{
    public class LineTemplate
    {
        public Dictionary<string, FieldTemplate> FieldTemplateDictionary { get; private set; }  
        public string LineTemplateName { get; private set; } //Just a name to distinguish your templates
        public int LineSize { get; private set; } //bytes per "line"
        public int FieldsCount
        {
            get
            {
                return this.FieldTemplateDictionary.Count;
            }
        }

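        //Constructor 
        public LineTemplate(int lineSize, string templateName)
        {
            if (lineSize <= 0)
            {
                throw new ArgumentOutOfRangeException("lineSize", "line length must be greater than zero");
            }
            //The dictionary must exist before AddFieldTemplate() is called
            this.FieldTemplateDictionary = new Dictionary<string, FieldTemplate>();
            this.LineTemplateName = templateName;
            this.LineSize = lineSize;
        }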

        public void AddFieldTemplate(FieldTemplate fieldTemplate)
        {
            if ((fieldTemplate.StartPosition + fieldTemplate.FieldSize) > this.LineSize)
            {
                throw new Exception(String.Format("Field \"{0}\" exceeds line boundary", fieldTemplate.FieldName));
            }

            this.FieldTemplateDictionary.Add(fieldTemplate.FieldName, fieldTemplate);
        }
       
        public string GetFieldNamesCSV(bool addQuotes)
        {
            StringBuilder sb = new StringBuilder();
            int count = 0;

            foreach (FieldTemplate fieldTemplate in this.FieldTemplateDictionary.Values)
            {
                sb.Append(addQuotes ? "\"" : "");
                sb.Append(fieldTemplate.FieldName);
                sb.Append(addQuotes ? "\"" : "");
                sb.Append(count < this.FieldTemplateDictionary.Count - 1 ? "," : ""); //Comma after every name except the last
                count++;
            }
            return sb.ToString();
        }

        public XElement GetLineTemplateXml(string templateName)
        {
            XElement lineXml = new XElement("line",
                new XAttribute("templateName", this.LineTemplateName),
                new XAttribute("lineSize", this.LineSize),
                new XAttribute("fieldCount", this.FieldsCount)
                );

            foreach (FieldTemplate field in FieldTemplateDictionary.Values)
            {
                XElement fieldXml = field.GetFieldTemplateXml();
                lineXml.Add(fieldXml);
            }

            return lineXml;
        }

        public string GetLineTemplateXmlString()
        {
            XElement xml = this.GetLineTemplateXml(this.LineTemplateName);
            return xml.ToString();
        }
    }
} 

In addition, the line template contains a dictionary of FieldTemplate objects. A line template must contain at least one field template for the parser to work. Each field must have a unique name, but field positions may overlap if required.
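To make the "line" and "field" terminology concrete, here is a minimal sketch, independent of the library, of cutting a byte buffer into fixed-length lines and copying a field out of one of them. FixedWidthSketch, SplitRecords and Slice are hypothetical names used only for this illustration, and the sketch assumes the buffer holds a whole number of lines.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidthSketch
{
    // Splits a buffer into records of recordSize bytes each.
    // A trailing partial record is rejected rather than silently padded.
    public static List<byte[]> SplitRecords(byte[] allBytes, int recordSize)
    {
        if (allBytes == null)
        {
            throw new ArgumentNullException("allBytes");
        }
        if (recordSize <= 0)
        {
            throw new ArgumentOutOfRangeException("recordSize");
        }
        if (allBytes.Length % recordSize != 0)
        {
            throw new ArgumentException("Buffer length is not a multiple of the record size", "allBytes");
        }

        List<byte[]> records = new List<byte[]>();
        for (int offset = 0; offset < allBytes.Length; offset += recordSize)
        {
            byte[] record = new byte[recordSize];
            Array.Copy(allBytes, offset, record, 0, recordSize);
            records.Add(record);
        }
        return records;
    }

    // Copies one field out of a record; the start position is zero-based
    public static byte[] Slice(byte[] record, int startPosition, int fieldSize)
    {
        byte[] field = new byte[fieldSize];
        Array.Copy(record, startPosition, field, 0, fieldSize);
        return field;
    }
}
```

Because each field is simply a (position, length) window over the same line, two fields may cover the same bytes. That is what makes overlapping field templates possible, for instance one field that reads a range as text and another that dumps the same range in hex.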

The FieldType enumeration provides the most common data types for conversion.

  • AlphaNum - Regular alpha-numerical text encoded in EBCDIC
  • Packed - Packed number (COMP-3 / packed decimal)
  • Binary - Binary Int16 or Int32
  • Numeric - EBCDIC encoded signed or unsigned number
  • Date - EBCDIC encoded number representing a date in yyMMdd format
  • PackedDate - Packed number representing a date in yyMMdd format
  • SourceBytesInHex - Source bytes in hex (raw data)
  • SourceBytesInDec - Source bytes in decimal (raw data)
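The signed Numeric type deserves a short illustration. In an EBCDIC encoded number, each byte carries a digit in its low half-byte, and the high half-byte of the last byte carries the sign, which is why a signed value appears to end in a letter or a brace once converted to text. The sketch below decodes such a field directly from the bytes. It is not the library's method (the library converts to text first and then maps the last character), and the names ZonedDecimalSketch and Decode are hypothetical.

```csharp
using System;

public static class ZonedDecimalSketch
{
    // Decodes an EBCDIC zoned number. Only the zone of the last byte is
    // inspected; the zones of the other bytes are not validated here.
    public static long Decode(byte[] zoned)
    {
        if (zoned == null || zoned.Length == 0)
        {
            throw new ArgumentException("Zoned field is empty", "zoned");
        }

        long value = 0;
        for (int i = 0; i < zoned.Length; i++)
        {
            int digit = zoned[i] & 0x0F; // low nibble is the digit
            if (digit > 9)
            {
                throw new FormatException("Bad digit");
            }
            value = value * 10 + digit;
        }

        // High nibble of the last byte: 0xC = positive, 0xD = negative, 0xF = unsigned
        int zone = zoned[zoned.Length - 1] >> 4;
        if (zone == 0x0D)
        {
            return -value;
        }
        if (zone == 0x0C || zone == 0x0F)
        {
            return value;
        }
        throw new FormatException("Bad sign zone");
    }
}
```

For example, the bytes F1 F2 D3 read as the text "12L" in EBCDIC and decode to -123, while F1 F2 C3 ("12C") decodes to 123.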

The FieldTemplate class has two constructors. The longer overload takes an extra parameter, int decimalPlaces, which is used by the packed and numerical field types. By default, decimalPlaces equals zero.

using System;
using System.Xml.Linq;

namespace Ebcdic2Ascii
{
    public enum FieldType { AlphaNum, Packed, Binary, Numeric, Date, 
				PackedDate, SourceBytesInHex, SourceBytesInDec }

    public class FieldTemplate
    {
        public string FieldName { get; private set; }
        public FieldType Type { get; private set; }
        public int StartPosition { get; private set; }
        public int FieldSize { get; private set; }
        public int DecimalPlaces { get; private set; }   //Scale -- for numerical data only

        //Constructor 1 
        public FieldTemplate(string fieldName, FieldType fieldType, 
			int startPosition, int fieldSize, int decimalPlaces)
        {
            this.ValidateInputParameters(fieldName, fieldType, startPosition, fieldSize, decimalPlaces);
            this.FieldName = fieldName.Trim();
            this.Type = fieldType;
            this.StartPosition = startPosition;
            this.FieldSize = fieldSize;
            this.DecimalPlaces = decimalPlaces;
        }

        //Constructor 2: Default decimalPlaces = 0
        public FieldTemplate(string fieldName, FieldType fieldType, int startPosition, int fieldSize)
            : this(fieldName, fieldType, startPosition, fieldSize, 0)
        {
            //decimalPlaces = 0
        }

        private void ValidateInputParameters(string fieldName, 
          FieldType fieldType, int startPosition, int fieldSize, int decimalPlaces)
        {
            if (fieldName == null || fieldName.Trim().Length == 0)
            {
                throw new ArgumentNullException("fieldName", "Field name is required for a template");
            }
            if (startPosition < 0)
            {
                throw new ArgumentOutOfRangeException("startPosition", String.Format(
                  "Start position cannot be negative for a field template \"{0}\"", fieldName));
            }
            if (fieldSize <= 0)
            {
                throw new ArgumentOutOfRangeException("fieldSize", String.Format(
                  "Field size must be greater than zero for a field template \"{0}\"", fieldName));
            }
            if (fieldType == FieldType.Binary)
            {
                if (fieldSize != 2 && fieldSize != 4)
                {
                    throw new Exception(String.Format(
                      "Incorrect number of bytes provided for a binary field template \"{0}\": {1}", 
                      fieldName, fieldSize));
                }
            }
            if (decimalPlaces < 0)
            {
                throw new ArgumentOutOfRangeException("decimalPlaces", String.Format(
                  "Number of decimal places cannot be negative for a field template \"{0}\"", fieldName));
            }
            if (decimalPlaces > 6)
            {
                throw new ArgumentOutOfRangeException("decimalPlaces", String.Format(
                  "Number of decimal places exceeds limit for a field template \"{0}\"", fieldName));
            }
        }

        public XElement GetFieldTemplateXml()
        {
            XElement fieldXml = new XElement("field", new XAttribute("name", this.FieldName));
            fieldXml.Add(new XElement("type", this.Type, new XAttribute("code", (int)Type)));
            fieldXml.Add(new XElement("position", this.StartPosition));
            fieldXml.Add(new XElement("length", this.FieldSize));
            fieldXml.Add(new XElement("decimalPlaces", this.DecimalPlaces));

            return fieldXml;
        }
    }
} 

The ParserUtilities static class provides reusable functionality.

using System;
using System.IO;

namespace Ebcdic2Ascii
{
    public static class ParserUtilities
    {
        public static void PrintError(string errMsg)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine(errMsg);
            Console.ForegroundColor = ConsoleColor.Gray;
        }

        public static void ConvertLineArrayToCsv(ParsedLine[] lines, 
          string outputFilePath, bool includeColumnNames, bool addQuotes)
        {
            Console.WriteLine("{0}: Writing output file...", DateTime.Now);

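            //Reject empty paths and paths containing characters that are not allowed in a path
            if (String.IsNullOrEmpty(outputFilePath) || 
                outputFilePath.IndexOfAny(Path.GetInvalidPathChars()) >= 0)
            {
                throw new Exception("Output file path is not valid");
            }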
            if (lines == null || lines.Length == 0)
            {
                PrintError("Nothing to write");
                return;
            }

            try
            {
                if (File.Exists(outputFilePath))
                {
                    File.Delete(outputFilePath);
                }

                using (TextWriter tw = new StreamWriter(outputFilePath, true))
                {
                    if (includeColumnNames && lines.Length > 0)
                    {
                        tw.WriteLine(lines[0].Line_Template.GetFieldNamesCSV(addQuotes));
                    }
                    foreach (ParsedLine line in lines)
                    {
                        tw.WriteLine(line.GetParsedFieldValuesCSV(addQuotes));
                    }
                }
                Console.WriteLine("{1}: Output file created {0}", 
                  Path.GetFileName(outputFilePath), DateTime.Now);
            }
            catch (Exception ex)
            {
                PrintError(ex.Message);
            }
        }

        public static string ConvertBytesToDec(byte[] bytes)
        {
            string result = "";
            foreach (byte b in bytes)
            {
                result += (int)b + " ";
            }
            return result.Trim();
        }

        public static byte[] ConvertHexStringToBytes(string hexStr)
        {
            if ((hexStr.Length + 1) % 3 != 0)
            {
                throw new Exception("Invalid hex string");
            }

            String[] strArray = hexStr.Split('-');
            byte[] byteArray = new byte[strArray.Length];

            for (int i = 0; i < strArray.Length; i++)
            {
                byteArray[i] = Convert.ToByte(strArray[i], 16);
            }

            return byteArray;
        }

        public static string RemoveNonAsciiChars(string text)
        {
            char[] chars = text.ToCharArray();

            for (int i = 0; i < chars.Length; i++)
            {
                if ((int)chars[i] < 32 || (int)chars[i] > 126)
                {
                    chars[i] = ' '; //Convert non-printable char to "space"
                }
                else if (chars[i] == '"' || chars[i] == '^') //Remove quote char
                {
                    chars[i] = ' ';
                }
            }

            return new String(chars).Trim();
        }

        public static byte[] ReadBytesRange(byte[] sourceBytes, 
          int startPosition, int length, bool throwExceptionIfSourceArrayIsTooShort)
        {
            byte[] resultBytes;

            if (length <= 0)
            {
                throw new Exception("Invalid array length: " + length);
            }
            if (startPosition < 0)
            {
                throw new Exception("Invalid start position: " + startPosition);
            }
            if (sourceBytes.Length < startPosition)
            {
                throw new Exception("Start position is outside of array bounds");
            }
            if (sourceBytes.Length - startPosition - length < 0)
            {
                if (throwExceptionIfSourceArrayIsTooShort)
                {
                    throw new Exception("End position is outside of array bounds");
                }
                else
                {
                    //Shorten the length of output array for remaining bytes
                    length = sourceBytes.Length - startPosition;
                }
            }

            resultBytes = new byte[length];
            Array.Copy(sourceBytes, startPosition, resultBytes, 0, length);
            return resultBytes;
        }

        public static byte[] ReadBytesRange(byte[] sourceBytes, int startPosition, int length)
        {
            bool throwExceptionIfSourceArrayIsTooShort = true;
            byte[] resultBytes = ReadBytesRange(sourceBytes, 
              startPosition, length, throwExceptionIfSourceArrayIsTooShort);
            return resultBytes;
        }
    }
} 
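As a side note on writing CSV rows, the following sketch shows one way to join values with commas while doubling any embedded quote characters. This is a swapped-in technique, not what the library does: RemoveNonAsciiChars above replaces quote characters with spaces instead. The names CsvRowSketch and Join are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CsvRowSketch
{
    // Builds one CSV row. When addQuotes is true, every value is wrapped in
    // double quotes and embedded double quotes are doubled.
    public static string Join(IEnumerable<string> values, bool addQuotes)
    {
        List<string> cells = new List<string>();
        foreach (string value in values)
        {
            string cell = value ?? "";
            if (addQuotes)
            {
                cell = "\"" + cell.Replace("\"", "\"\"") + "\"";
            }
            cells.Add(cell);
        }
        // String.Join places the separator only between items,
        // so no separate "is this the last field" counter is needed
        return String.Join(",", cells.ToArray());
    }
}
```

Without quotes, values that themselves contain commas would still break the row, so the unquoted mode is only safe for data known to be comma-free.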

The ParsedField class encapsulates the main functionality for converting EBCDIC bytes to ASCII. I would like to credit the original author of the Unpack() method, but I could not find the original source. Please let me know if you find out where this code came from.

using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;
 
namespace Ebcdic2Ascii
{
    public class ParsedField
    {
        public FieldTemplate Field_Template { get; private set; }
        public string Value { get; private set; }
        public byte[] OriginalBytes { get; private set; }
        public string OriginalBytesInHex
        {
            get
            {
                return BitConverter.ToString(this.OriginalBytes);  
            }
        }
        public string OriginalBytesInDec
        {
            get
            {
                return ParserUtilities.ConvertBytesToDec(this.OriginalBytes);
            }
        }
        public bool ParsedSuccessfully { get; private set; }

        //Constructor
        public ParsedField(byte[] lineBytes, FieldTemplate fieldTemplate)
        {
            this.ParsedSuccessfully = true;
            this.Field_Template = fieldTemplate;
            this.Value = ParseField(lineBytes, fieldTemplate);
        }

        private string ParseField(byte[] lineBytes, FieldTemplate template)
        {
            if (lineBytes == null || lineBytes.Length == 0)
            {
                ParserUtilities.PrintError("Line bytes is null or empty");
                this.ParsedSuccessfully = false;
                return null;
            }
            if (lineBytes.Length < (template.StartPosition + template.FieldSize))
            {
                this.ParsedSuccessfully = false;
                throw new Exception(String.Format(
                  "Field \"{0}\" length falls outside the line length", template.FieldName));
            }

            byte[] fieldBytes = new byte[template.FieldSize];
            Array.Copy(lineBytes, template.StartPosition, fieldBytes, 0, template.FieldSize);
            this.OriginalBytes = fieldBytes;

            if (this.Field_Template.Type == FieldType.AlphaNum)
            {
                return this.ConvertAlphaNumEbcdic(fieldBytes);
            }
            else if (this.Field_Template.Type == FieldType.Numeric)
            {
                return this.ConvertNumericEbcdic(fieldBytes, template.DecimalPlaces);
            }
            else if (this.Field_Template.Type == FieldType.Packed)
            {
                return this.Unpack(fieldBytes, template.DecimalPlaces);
            }
            else if (this.Field_Template.Type == FieldType.Binary)
            {
                return ConvertBinaryEbcdic(fieldBytes, template.DecimalPlaces);
            }
            else if (this.Field_Template.Type == FieldType.Date)
            {
                return ConvertDateStrEbcdic(fieldBytes);
            }
            else if (this.Field_Template.Type == FieldType.PackedDate)
            {
                return ConvertPackedDateStrEbcdic(fieldBytes);
            }
            else if (this.Field_Template.Type == FieldType.SourceBytesInHex)
            {
                return this.OriginalBytesInHex;
            }
            else if (this.Field_Template.Type == FieldType.SourceBytesInDec)
            {
                return this.OriginalBytesInDec;
            }
            else
            {
                this.ParsedSuccessfully = false;
                throw new Exception(String.Format(
                  "Unable to parse field \"{0}\". Unknown field type: {1}", 
                  template.FieldName, template.Type.ToString()));
            }
        }

        private string ConvertAlphaNumEbcdic(byte[] ebcdicBytes)
        {
            if (this.ByteArrayIsFullOf_0xFF(ebcdicBytes))
            {
                return "";
            }

            //Encoding asciiEnc = Encoding.ASCII;
            //Encoding ebcdicEnc = Encoding.GetEncoding("IBM037");
            //string result = Encoding.ASCII.GetString(Encoding.Convert(ebcdicEnc, asciiEnc, ebcdicBytes));

            //Thank you sx2008
            Encoding ebcdicEnc = Encoding.GetEncoding("IBM037");
            string result = ebcdicEnc.GetString(ebcdicBytes); // convert EBCDIC Bytes -> Unicode string
            return result;
        }
 
        private string ConvertNumericEbcdic(byte[] ebcdicBytes, int decimalPlaces)
        {
            string tempNumStr = this.ConvertAlphaNumEbcdic(ebcdicBytes).Trim();
 
            if (tempNumStr == null || tempNumStr.Length == 0)
            {
                return "";
            }
 
            if (Regex.IsMatch(tempNumStr, @"^\d+$")) //Unsigned integer
            {
                string result = this.AdjustDecimalValues(Int64.Parse(tempNumStr), decimalPlaces);
                return result;
            }
            else if (Regex.IsMatch(tempNumStr, @"^\d+[A-R{}]$")) //Signed integer
            {
                string lastChar = tempNumStr.Substring(tempNumStr.Length - 1);
 
                switch (lastChar)
                {
                    case "{":
                        tempNumStr = tempNumStr.Replace("{", "0");
                        break;
                    case "A":
                        tempNumStr = tempNumStr.Replace("A", "1");
                        break;
                    case "B":
                        tempNumStr = tempNumStr.Replace("B", "2");
                        break;
                    case "C":
                        tempNumStr = tempNumStr.Replace("C", "3");
                        break;
                    case "D":
                        tempNumStr = tempNumStr.Replace("D", "4");
                        break;
                    case "E":
                        tempNumStr = tempNumStr.Replace("E", "5");
                        break;
                    case "F":
                        tempNumStr = tempNumStr.Replace("F", "6");
                        break;
                    case "G":
                        tempNumStr = tempNumStr.Replace("G", "7");
                        break;
                    case "H":
                        tempNumStr = tempNumStr.Replace("H", "8");
                        break;
                    case "I":
                        tempNumStr = tempNumStr.Replace("I", "9");
                        break;
                    case "}":
                        tempNumStr = "-" + tempNumStr.Replace("}", "0");//Fixed
                        break;
                    case "J":
                        tempNumStr = "-" + tempNumStr.Replace("J", "1");
                        break;
                    case "K":
                        tempNumStr = "-" + tempNumStr.Replace("K", "2");
                        break;
                    case "L":
                        tempNumStr = "-" + tempNumStr.Replace("L", "3");
                        break;
                    case "M":
                        tempNumStr = "-" + tempNumStr.Replace("M", "4");
                        break;
                    case "N":
                        tempNumStr = "-" + tempNumStr.Replace("N", "5");
                        break;
                    case "O":
                        tempNumStr = "-" + tempNumStr.Replace("O", "6");
                        break;
                    case "P":
                        tempNumStr = "-" + tempNumStr.Replace("P", "7");
                        break;
                    case "Q":
                        tempNumStr = "-" + tempNumStr.Replace("Q", "8");
                        break;
                    case "R":
                        tempNumStr = "-" + tempNumStr.Replace("R", "9");
                        break;
                }
 
                string result = this.AdjustDecimalValues(Int64.Parse(tempNumStr), decimalPlaces);
                return result;
            }
            else
            {
                this.ParsedSuccessfully = false;
                return tempNumStr;
            }
        }
 
        private string ConvertBinaryEbcdic(byte[] ebcdicBytes, int decimalPlaces)
        {
            if (this.ByteArrayIsFullOf_0xFF(ebcdicBytes))
            {
                return "";
            }
 
            //On little-endian hosts, BitConverter expects the low order byte first. 
            //The file stores the high order byte first, so the bytes need to be reversed.
            //Reverse a copy: ebcdicBytes is the same array as OriginalBytes
            byte[] bytes = (byte[])ebcdicBytes.Clone();
            if (BitConverter.IsLittleEndian)
            {
                Array.Reverse(bytes);
            }
            long tempNum;
 
            if (bytes.Length == 2)
            {
                //If 2 bytes are provided -- assume it's an unsigned short
                tempNum = BitConverter.ToUInt16(bytes, 0);
            }
            else if (bytes.Length == 4)
            {
                //If 4 bytes are provided -- assume it's an int
                tempNum = BitConverter.ToInt32(bytes, 0);
            }
            else
            {
                //Just in case
                throw new Exception(String.Format(
                  "Incorrect number of bytes provided for a binary field: {0}", bytes.Length));
            }
 
            string result = this.AdjustDecimalValues(tempNum, decimalPlaces);
            return result;
        }
 
        private string AdjustDecimalValues(long numericValue, int decimalPlaces)
        {
            if (decimalPlaces == 0)
            {
                return numericValue.ToString();
            }
            double result = numericValue / Math.Pow(10, decimalPlaces);
            return result.ToString();
        } 

        private string ConvertDateStrEbcdic(byte[] ebcdicBytes)
        {
            string dateStr = this.ConvertAlphaNumEbcdic(ebcdicBytes).Trim();
            string result = this.ConvertDateStr(dateStr);
            return result;
        }
 
        private string ConvertPackedDateStrEbcdic(byte[] ebcdicBytes)
        {
            string dateStr = this.Unpack(ebcdicBytes, 0);
            string result = this.ConvertDateStr(dateStr);
            return result;
        }
 
        private string ConvertDateStr(string dateStr)
        {
            dateStr = dateStr.Trim();
 
            if (dateStr.Trim() == "" || dateStr == "0" || 
                  dateStr == "0000000" || dateStr == "9999999")
            {
                return "";
            }
            if (Regex.IsMatch(dateStr, @"^\d{3,5}$"))
            {
                dateStr = dateStr.PadLeft(6, '0');
            }
 
            //E.g.: 0801231 = 1980-12-31; 1801231 = 2080-12-31
            Match match = Regex.Match(dateStr, @"^(?<Year>\d{3})(?<Month>\d{2})(?<Day>\d{2})$");

            if (match.Success)
            {
                int year = Int32.Parse(match.Groups["Year"].Value) + 1900; //013 => 1913, 113 => 2013...
                int month = Int32.Parse(match.Groups["Month"].Value);
                int day = Int32.Parse(match.Groups["Day"].Value);
 
                try
                {
                    DateTime tempDate = new DateTime(year, month, day);
                    return tempDate.ToString("yyyy-MM-dd");
                }
                catch { }
            }
 
            if (Regex.IsMatch(dateStr, @"^\d{6}$"))
            {
                DateTime tempDate;
                if (DateTime.TryParseExact(dateStr, "yyMMdd", 
                     CultureInfo.InvariantCulture, DateTimeStyles.None, out tempDate))
                {
                    return tempDate.ToString("yyyy-MM-dd");
                }
            }
 
            this.ParsedSuccessfully = false;
            return dateStr;
        } 

        private string Unpack(byte[] ebcdicBytes, int decimalPlaces)
        {
            if (ByteArrayIsFullOf_0xFF(ebcdicBytes))
            {
                return "";
            }
 
            long lo = 0;
            long mid = 0;
            long hi = 0;
            bool isNegative;
 
            // this nybble stores only the sign, not a digit.  
            // "C" hex is positive, "D" hex is negative, and "F" hex is unsigned. 
            switch (Nibble(ebcdicBytes, 0))
            {
                case 0x0D:
                    isNegative = true;
                    break;
                case 0x0F:
                case 0x0C:
                    isNegative = false;
                    break;
                default:
                    //throw new Exception("Bad sign nibble");
                    this.ParsedSuccessfully = false;
                    return this.ConvertAlphaNumEbcdic(ebcdicBytes);
            }
            long intermediate;
            long carry;
            long digit;
            for (int j = ebcdicBytes.Length * 2 - 1; j > 0; j--)
            {
                // multiply by 10
                intermediate = lo * 10;
                lo = intermediate & 0xffffffff;
                carry = intermediate >> 32;
                intermediate = mid * 10 + carry;
                mid = intermediate & 0xffffffff;
                carry = intermediate >> 32;
                intermediate = hi * 10 + carry;
                hi = intermediate & 0xffffffff;
                carry = intermediate >> 32;
                // By limiting input length to 14, we ensure overflow will never occur

                digit = Nibble(ebcdicBytes, j);
                if (digit > 9)
                {
                    //throw new Exception("Bad digit");
                    this.ParsedSuccessfully = false;
                    return this.ConvertAlphaNumEbcdic(ebcdicBytes);
                }
                intermediate = lo + digit;
                lo = intermediate & 0xffffffff;
                carry = intermediate >> 32;
                if (carry > 0)
                {
                    intermediate = mid + carry;
                    mid = intermediate & 0xffffffff;
                    carry = intermediate >> 32;
                    if (carry > 0)
                    {
                        intermediate = hi + carry;
                        hi = intermediate & 0xffffffff;
                        carry = intermediate >> 32;
                        // carry should never be non-zero. Back up with validation
                    }
                }
            }
 
            decimal result = new Decimal((int)lo, (int)mid, (int)hi, isNegative, (byte)decimalPlaces);
            return result.ToString();
        }
 
        private int Nibble(byte[] ebcdicBytes, int nibbleNo)
        {
            int b = ebcdicBytes[ebcdicBytes.Length - 1 - nibbleNo / 2];
            return (nibbleNo % 2 == 0) ? (b & 0x0000000F) : (b >> 4);
        }
 
        private bool ByteArrayIsFullOf_0xFF(byte[] ebcdicBytes)
        {
            if (ebcdicBytes == null || ebcdicBytes.Length == 0)
            {
                return false;
            }
            foreach (byte b in ebcdicBytes)
            {
                if (b != 0xFF)
                {
                    return false;
                }
            }
            return true;
        }
    }
} 
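The Binary conversion above relies on BitConverter and byte reversal. As an alternative, a big-endian value can be assembled with shifts, which does not depend on the byte order of the host and does not modify the input array. The sketch below treats both sizes as signed, which is an assumption; whether a given field is signed depends on how it was declared on the mainframe. BigEndianSketch, ReadInt16 and ReadInt32 are hypothetical names.

```csharp
using System;

public static class BigEndianSketch
{
    // Reads a signed 16-bit value stored high order byte first
    public static short ReadInt16(byte[] bytes, int offset)
    {
        return unchecked((short)((bytes[offset] << 8) | bytes[offset + 1]));
    }

    // Reads a signed 32-bit value stored high order byte first
    public static int ReadInt32(byte[] bytes, int offset)
    {
        return (bytes[offset] << 24)
             | (bytes[offset + 1] << 16)
             | (bytes[offset + 2] << 8)
             | bytes[offset + 3];
    }
}
```

For example, the bytes 01 02 read as 258, and FF FE reads as -2 when interpreted as a signed 16-bit value.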

ParsedLine contains a dictionary of parsed fields as well as a means of accessing them.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
 
namespace Ebcdic2Ascii
{
    public class ParsedLine
    {
        public LineTemplate Line_Template { get; private set; }
        public Dictionary<string, ParsedField> FieldDictionary 
          { get; private set; } //= new Dictionary<string, ParsedField>();
        public string this[string fieldName]
        {
            get
            {
                return this.FieldDictionary[fieldName].Value.Trim();
            }
        } 

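        //Constructor
        public ParsedLine(LineTemplate lineTemplate, byte[] lineBytes)
        {
            this.Line_Template = lineTemplate;
            //The dictionary must exist before ParseLine() adds fields to it
            this.FieldDictionary = new Dictionary<string, ParsedField>();
            this.ParseLine(lineBytes, lineTemplate);
        }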
 
        private void ParseLine(byte[] lineBytes, LineTemplate lineTemplate)
        {
            this.ValidateInputParameters(lineBytes, lineTemplate);
 
            foreach (var fieldTemplate in lineTemplate.FieldTemplateDictionary)
            {
                FieldDictionary.Add(fieldTemplate.Key, 
                  new ParsedField(lineBytes, lineTemplate.FieldTemplateDictionary[fieldTemplate.Key]));
            }
        }
 
        private void ValidateInputParameters(byte[] lineBytes, LineTemplate template)
        {
            if (lineBytes == null) 
            {
                throw new ArgumentNullException("lineBytes", "Line bytes required");
            }
            //Check the template for null before reading its properties
            if (template == null)
            {
                throw new ArgumentNullException("template", "line template is required");
            }
            if (lineBytes.Length < template.LineSize)
            {
                throw new Exception(String.Format(
                  "Bytes provided: {0}, line size: {1}", lineBytes.Length, template.LineSize));
            }
            if (template.FieldsCount == 0)
            {
                throw new Exception("Field templates have not been defined in the line template");
            }
        }
 
        public string GetParsedFieldValuesCSV(bool addQuotes)
        {
            StringBuilder sb = new StringBuilder();
            int count = 0;
 
            foreach (ParsedField parsedField in this.FieldDictionary.Values)
            {
                sb.Append(addQuotes ? "\"" : "");
                sb.Append(parsedField.Value);
                sb.Append(addQuotes ? "\"" : "");
                sb.Append(count < this.FieldDictionary.Count - 1 ? "," : ""); //Comma after every value except the last
                count++;
            }
            return sb.ToString();
        }
    }
}

EbcdicParser is the manager class that takes care of applying a template to a given EBCDIC file. It has only a few public methods and a single public property called Lines, which gives you an array of ParsedLine objects.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
 
namespace Ebcdic2Ascii
{
    public class EbcdicParser
    {
        public ParsedLine[] Lines { get; private set; }
 

        #region Constructors
 
        public EbcdicParser()
        {
            //Empty constructor
        }
 
        public EbcdicParser(byte[] allBytes, LineTemplate lineTemplate)
        {
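            //Validate first so that null arguments fail with a clear exception
            this.ValidateInputParameters(lineTemplate, allBytes, false);

            double expectedRows = (double)allBytes.Length / lineTemplate.LineSize;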
            Console.WriteLine("{0}: Parsing started", DateTime.Now);
            Console.WriteLine("{1}: Line count est {0:#,###.00}", expectedRows, DateTime.Now);
 
            this.Lines = this.ParseAllLines(lineTemplate, allBytes);
 
            //Collect garbage
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Console.WriteLine("{1}: {0} line(s) have been parsed", this.Lines.Length, DateTime.Now);
        }
 
        public EbcdicParser(string sourceFilePath, LineTemplate lineTemplate)
            : this(File.ReadAllBytes(sourceFilePath), lineTemplate)
        {
            //Constructor with the file path
        } 
        #endregion
 
        public ParsedLine[] ParseAllLines(LineTemplate lineTemplate, byte[] allBytes)
        {
            bool isSingleLine = false;
            this.ValidateInputParameters(lineTemplate, allBytes, isSingleLine);
            
            List<ParsedLine> parsedLines = new List<ParsedLine>();
            byte[] lineBytes = new byte[lineTemplate.LineSize];
            ParsedLine parsedLine;
 
            for (int i = 0; i < allBytes.Length; i += lineTemplate.LineSize)
            {
                if (i % 1000 == 0)
                {
                    //Print progress
                    Console.Write(i + "\r");
                }
                Array.Copy(allBytes, i, lineBytes, 0, lineTemplate.LineSize);
                parsedLine = this.ParseSingleLine(lineTemplate, lineBytes);
                parsedLines.Add(parsedLine);
            }
            return parsedLines.ToArray();
        }
        public ParsedLine[] ParseAllLines(LineTemplate lineTemplate, string sourceFilePath)
        {
            return this.ParseAllLines(lineTemplate, File.ReadAllBytes(sourceFilePath));
        }
        public ParsedLine ParseSingleLine(LineTemplate lineTemplate, byte[] lineBytes)
        {
            bool isSingleLine = true;
            this.ValidateInputParameters(lineTemplate, lineBytes, isSingleLine);
            ParsedLine parsedLine = new ParsedLine(lineTemplate, lineBytes);
            return parsedLine;
        }
        private bool ValidateInputParameters(LineTemplate lineTemplate, byte[] allBytes, bool isSingleLine)
        {
            if (allBytes == null)
            {
                throw new ArgumentNullException("allBytes", "Ebcdic data is not provided");
            }
            if (lineTemplate == null)
            {
                throw new ArgumentNullException("lineTemplate", "Line template is not provided");
            }
            if (lineTemplate.FieldsCount == 0)
            {
                throw new Exception("Line template must contain at least one field");
            }
            if (allBytes.Length < lineTemplate.LineSize)
            {
                throw new Exception("Data length is shorter than the line size");
            }
            if (isSingleLine && allBytes.Length != lineTemplate.LineSize)
            {
                throw new Exception("Byte count does not equal the line size");
            }
            double expectedRows = (double)allBytes.Length / lineTemplate.LineSize;
            if (expectedRows % 1 != 0) //Expected number of rows is not a whole number
            {
                throw new Exception("Expected number of rows is not a whole number. Check line template.");
            }
            return true;
        }

        public void CreateCsvFile(string outputFilePath, bool includeColumnNames, bool addQuotes)
        {
            if (this.Lines == null || this.Lines.Length == 0)
            {
                throw new Exception("No lines have been parsed"); 
            }
            ParserUtilities.ConvertLineArrayToCsv(this.Lines, outputFilePath, includeColumnNames, addQuotes);
        }
    }
}
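
ParseAllLines treats the file as a sequence of fixed-length records: it checks that the data length is a whole multiple of the line size and then copies one line-sized slice at a time. The following is a minimal, stand-alone sketch of that slicing step only, assuming nothing about the rest of the library (FixedRecordSplitter and Split are hypothetical names). Unlike the listing above, it allocates a new array per record, which is the safer choice if the caller keeps the slices:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Ebcdic2Ascii library.
public static class FixedRecordSplitter
{
    // Splits a buffer into consecutive records of recordSize bytes each.
    public static List<byte[]> Split(byte[] allBytes, int recordSize)
    {
        if (allBytes == null)
        {
            throw new ArgumentNullException("allBytes");
        }
        if (recordSize <= 0)
        {
            throw new ArgumentOutOfRangeException("recordSize", "Record size must be positive");
        }
        if (allBytes.Length % recordSize != 0)
        {
            throw new ArgumentException("Data length is not a multiple of the record size", "allBytes");
        }

        List<byte[]> records = new List<byte[]>(allBytes.Length / recordSize);
        for (int offset = 0; offset < allBytes.Length; offset += recordSize)
        {
            // A fresh array per record, so earlier records are never overwritten
            byte[] record = new byte[recordSize];
            Array.Copy(allBytes, offset, record, 0, recordSize);
            records.Add(record);
        }
        return records;
    }
}
```

The integer remainder check plays the same role as the "whole number of rows" test in ValidateInputParameters.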

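Finally, to run the program, we create an instance of the LineTemplate class, add several FieldTemplate objects to it, and pass it to EbcdicParser. We can then access the array of ParsedLine objects through the Lines property and read every field within each line. In the example below, each FieldTemplate is given a field name, a field type, a zero-based starting offset within the line, and a length in bytes.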

class Program
{
    static void Main(string[] args)
    {
        LineTemplate lineTemplate = new LineTemplate(200, "Accounts_SourceFileTemplate");
        lineTemplate.AddFieldTemplate(new FieldTemplate("RecordType", FieldType.AlphaNum, 0, 2));
        lineTemplate.AddFieldTemplate(new FieldTemplate("CustomerNumber", FieldType.Numeric, 2, 4));
        lineTemplate.AddFieldTemplate(new FieldTemplate("FirstName", FieldType.AlphaNum, 6, 30));
        lineTemplate.AddFieldTemplate(new FieldTemplate("LastName", FieldType.AlphaNum, 36, 30));
        lineTemplate.AddFieldTemplate(new FieldTemplate("DateOfBirth", FieldType.PackedDate, 66, 2));
        lineTemplate.AddFieldTemplate(new FieldTemplate("BalanceOutstanding", FieldType.Packed, 68, 2));
        lineTemplate.AddFieldTemplate(new FieldTemplate("SomeStrangeData", FieldType.AlphaNum, 70, 35));
        lineTemplate.AddFieldTemplate(new FieldTemplate("FileRunDate", FieldType.Date, 105, 7));
        lineTemplate.AddFieldTemplate(new FieldTemplate("RentalDays", FieldType.Packed, 112, 2));
        lineTemplate.AddFieldTemplate(new FieldTemplate("TheWholeLine", 
          FieldType.AlphaNum, 0, 200)); //Try to parse the whole line as alpha-numerical

        EbcdicParser parser = new EbcdicParser(@"C:\temp\sourceFile.dat", lineTemplate);

        foreach (ParsedLine line in parser.Lines)
        {
            Console.WriteLine("{0} {1} {2}", line["CustomerNumber"], 
              line["FirstName"], line["DateOfBirth"]);
        }
    }
} 
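
The template above declares BalanceOutstanding and RentalDays as FieldType.Packed with a length of 2 bytes. The ParsedField code that decodes such fields is not shown in this section, so the following is only a rough, stand-alone sketch of how a packed (COMP-3) integer is commonly decoded, not the library's implementation. PackedDecimalSketch and Unpack are hypothetical names, and implied decimal places are ignored. Every half-byte holds one decimal digit, except the last half-byte, which holds the sign (0xD or 0xB for negative; 0xC, 0xF, 0xA or 0xE for positive or unsigned):

```csharp
using System;

// Hypothetical helper, not part of the Ebcdic2Ascii library.
public static class PackedDecimalSketch
{
    // Decodes a packed decimal field into a whole number.
    // Assumes the value fits into a long (at most 18 digits).
    public static long Unpack(byte[] packed)
    {
        if (packed == null || packed.Length == 0)
        {
            throw new ArgumentException("Packed field must contain at least one byte", "packed");
        }

        long value = 0;
        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;    // upper half-byte
            int low = packed[i] & 0x0F;   // lower half-byte
            if (high > 9)
            {
                throw new FormatException("Invalid digit in packed field");
            }
            value = value * 10 + high;

            if (i < packed.Length - 1)
            {
                if (low > 9)
                {
                    throw new FormatException("Invalid digit in packed field");
                }
                value = value * 10 + low;
            }
            else if (low == 0x0D || low == 0x0B)
            {
                value = -value;           // negative sign
            }
            else if (low < 0x0A)
            {
                throw new FormatException("Invalid sign in packed field");
            }
        }
        return value;
    }
}
```

Under this layout, a 2-byte packed field such as 0x12 0x3C carries three digits plus a sign and decodes to 123, while 0x12 0x3D decodes to -123.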

I hope you'll find it useful.

Points of Interest

The parser is rather quick: it takes a few seconds to parse source files that may be hundreds of megabytes in size.

History

  • 2013-10-28: Uploaded source code, added more "background" and "using the code" information
  • 2015-03-12: Refactored the source code. Added an IBM935 custom decoder for Simplified Chinese characters. Added the ability to export a template to XML and to initialize templates from XML files.