Sunday, January 6, 2019
Data Compression and Decompression Algorithms
Table of Contents

Introduction
1. Data Compression
    1.1 Classification of Compression
    1.2 Data Compression Methods
2. Lossless Compression Algorithms
    2.1 Run-Length Encoding
        2.1.1 Algorithm
        2.1.2 Complexity
        2.1.3 Advantages and Disadvantages
3. Huffman Coding
    3.1 Huffman Encoding
    3.2 Algorithm
4. Lempel-Ziv Algorithm
    4.1 Lempel-Ziv 78
    4.2 Encoding Algorithm
    4.3 Decoding Algorithm
5. Lempel-Ziv Welch
    5.1 Encoding Algorithm
    5.2 Decoding Algorithm

INTRODUCTION

Data compression is a common requirement for most computerized applications. There are a number of data compression algorithms, each dedicated to compressing a different data format. Even for a single data type there are a number of different compression algorithms, which use different approaches. This paper examines lossless data compression algorithms.

1. DATA COMPRESSION

In computer science, data compression involves encoding information using fewer bits than the original representation. Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. Because compressed data must be decompressed to be used, this extra processing imposes computational or other costs through decompression.

1.1 Classification of Compression

a) Static/non-adaptive compression.
b) Dynamic/adaptive compression.

a) Static/Non-adaptive Compression

A static method is one in which the mapping from the set of messages to the set of codewords is fixed before transmission begins, so that a given message is represented by the same codeword every time it appears in the message ensemble. The classic static defined-word scheme is Huffman coding.
b) Dynamic/adaptive compression

A code is dynamic if the mapping from the set of messages to the set of codewords changes over time.

1.2 Data Compression Methods

1) Lossless Compression

Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. It is possible because most real-world data has statistical redundancy. For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ...", the data may be encoded as "279 red pixels". Lossless compression is used in cases where it is important that the original and the decompressed data be identical, or where deviations from the original data could be deleterious. Typical examples are executable programs, text documents, and source code. Some image file formats, like PNG or GIF, use only lossless compression.

2) Lossy Compression

In information technology, lossy compression is a data encoding method that compresses data by discarding (losing) some of it. The procedure aims to minimize the amount of data that needs to be held, handled, and/or transmitted by a computer. Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony. If we take a photo of a sunset over the sea, for example, there are going to be groups of pixels with the same colour value, which can be reduced. Lossy algorithms tend to be more complex; as a result they achieve better results for bitmaps and can accommodate the loss of data. The compressed file is an estimation of the original data. One of the disadvantages of lossy compression is that if the compressed file keeps being compressed, the quality will degrade drastically.

2. Lossless Compression Algorithms

2.1 Run-Length Encoding (RLE)

RLE stands for Run Length Encoding.
It is a lossless algorithm that only offers decent compression ratios on specific types of data.

How RLE works

RLE is probably the easiest compression algorithm. It replaces sequences of the same data value within a file by a count number and a single value. Suppose the following string of data (17 bytes) has to be compressed:

    ABBBBBBBBBCDEEEEF

Using RLE compression, the compressed file takes up 10 bytes and could look like this:

    A *9B C D *4E F

Here * stands for a control character that marks the start of a run, so each run costs three bytes: the control character, the count, and the value.

2.1.1 Algorithm

The pseudocode below is a simplified variant that writes only the count and the value, without a control character.

    for (i = 0; i < length; i += count) {
        count = 1;
        // extend the run while the next character equals str[i]
        while (i + count < length && str[i + count] == str[i])
            count++;
        if (count == 1)
            cout << str[i];
        else
            cout << count << str[i];
    }

As you can see, RLE encoding is only effective if there are sequences of 4 or more repeating characters: because three characters are used to encode a run, coding two repeating characters would even lead to an increase in file size.

It is important to know that there are many different run-length encoding schemes. The above example has just been used to demonstrate the basic principle of RLE encoding. Sometimes the implementation of RLE is adapted to the type of data that is being compressed.

2.1.2 Complexity and Data Compression

We are used to talking about the complexity of an algorithm in terms of time, and we usually try to find the fastest implementation, as with search algorithms. Here it is not so important to compress data quickly, but to compress as much as possible, so that the output is as small as possible without losing data. A great feature of run-length encoding is that the algorithm is easy to implement.

2.1.3 Advantages and disadvantages

This algorithm is very easy to implement and does not require much CPU horsepower. RLE compression is only efficient with files that contain lots of repetitive data. These can be text files if they contain lots of spaces for indenting, but line-art images that contain large white or black areas are far more suitable. Computer-generated colour images (e.g. architectural drawings) can also give fair compression ratios.

Where is RLE compression used?

RLE compression can be used in the following file formats:

    PDF files

3. HUFFMAN CODING

Huffman coding is a popular method for compressing data with variable-length codes. Given a set of data symbols (an alphabet) and their frequencies of occurrence (or, equivalently, their probabilities), the method constructs a set of variable-length codewords with the shortest average length and assigns them to the symbols. Huffman coding serves as the basis for several applications implemented on popular platforms. Some programs use just the Huffman method, while others use it as one step in a multistep compression process.

3.1 Huffman Encoding

The Huffman encoding algorithm starts by constructing a list of all the alphabet symbols in descending order of their probabilities. It then constructs, from the bottom up, a binary tree with a symbol at every leaf. This is done in steps, where at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list, and replaced with an auxiliary symbol representing the two original symbols. When the list is reduced to just one auxiliary symbol (representing the entire alphabet), the tree is complete. The tree is then traversed to determine the codewords of the symbols.

3.2 Algorithm

    Huffman(A)
        n = |A|
        Q = A                          // priority queue keyed on frequency f
        for (i = 1 to n-1)
            z = new node
            left[z] = Extract_min(Q)
            right[z] = Extract_min(Q)
            f[z] = f[left[z]] + f[right[z]]
            Insert(Q, z)
        return Extract_min(Q)          // return root

4. The Lempel-Ziv Algorithms

The Lempel-Ziv algorithm is an algorithm for lossless data compression. It is not a single algorithm, but a whole family of algorithms, stemming from the two algorithms proposed by Jacob Ziv and Abraham Lempel in their landmark papers in 1977 and 1978.
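Looking back at the Huffman procedure of section 3.2 before the dictionary-based methods are described in detail, the tree construction and the traversal that yields the codewords can be sketched in Python. This is only an illustrative sketch under a few assumptions that are not in the original text: the function name `huffman_codes` is made up, symbols are assumed to be strings, the priority queue Q is played by Python's `heapq` min-heap, and an internal node is represented as a `(left, right)` tuple.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code from a {symbol: frequency} mapping.

    Assumption of this sketch: symbols are strings, so any tuple
    found in the tree is an internal (left, right) node.
    """
    tie = count()  # unique tie-breaker so the heap never compares trees
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)

    if len(heap) == 1:
        # A one-symbol alphabet still needs a non-empty codeword.
        return {heap[0][2]: "0"}

    # n-1 merge steps: take the two least frequent nodes and join them.
    while len(heap) > 1:
        f_left, _, left = heapq.heappop(heap)
        f_right, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f_left + f_right, next(tie), (left, right)))

    root = heap[0][2]
    codes = {}

    def walk(node, prefix):
        # Left edges contribute "0", right edges contribute "1".
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(root, "")
    return codes


if __name__ == "__main__":
    example = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    for symbol, code in sorted(huffman_codes(example).items()):
        print(symbol, code)
```

The exact bit patterns depend on how ties are broken, but the total encoded length (the sum of frequency times codeword length) is the same for every valid Huffman tree over the same frequencies.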
Lempel-Ziv algorithms are widely used in compression utilities such as gzip and in GIF image compression.

The following are the variants of the Lempel-Ziv algorithms:

    LZ77 variants: LZR, LZSS, LZB, LZH
    LZ78 variants: LZW, LZC, LZT, LZMW

4.1 Lempel-Ziv 78

LZ78 is a dictionary-based compression algorithm. The codewords output by the algorithm consist of two elements: an index referring to the longest matching dictionary entry and the first non-matching symbol. In addition to outputting the codeword for storage/transmission, the algorithm also adds the index and symbol pair to the dictionary. When a symbol that is not yet in the dictionary is encountered, the codeword has the index value 0 and the symbol is added to the dictionary as well. With this method, the algorithm gradually builds up a dictionary.

4.2 Algorithm

    Dictionary := empty
    Prefix := empty
    DictionaryIndex := 1
    while (characterStream is not empty)
        Char := next character in characterStream
        if (Prefix + Char exists in the Dictionary)
            Prefix := Prefix + Char
        else
            if (Prefix is empty)
                CodeWordForPrefix := 0
            else
                CodeWordForPrefix := DictionaryIndex for Prefix
            Output: (CodeWordForPrefix, Char)
            insertInDictionary( (DictionaryIndex, Prefix + Char) )
            DictionaryIndex++
            Prefix := empty
    // if the stream ends while Prefix is not empty,
    // the index of Prefix still has to be output
    if (Prefix is not empty)
        Output: (DictionaryIndex for Prefix, no character)

Example 1: LZ78 Compression

Encode (i.e. compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm.

Compressed message: the compressed message is (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)

Note: the above is just a representation; the commas and parentheses are not transmitted.

Steps:

1. A is not in the Dictionary; insert it.
2. B is not in the Dictionary; insert it.
3. B is in the Dictionary. BC is not in the Dictionary; insert it.
4. B is in the Dictionary. BC is in the Dictionary. BCA is not in the Dictionary; insert it.
5. B is in the Dictionary. BA is not in the Dictionary; insert it.
6. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is not in the Dictionary; insert it.
7. B is in the Dictionary. BC is in the Dictionary. BCA is in the Dictionary. BCAA is in the Dictionary. BCAAB is not in the Dictionary; insert it.

LZ78 Compression: number of bits transmitted

Uncompressed string: ABBCBCABABCAABCAAB
Number of bits = total number of characters * 8 = 18 * 8 = 144 bits

Suppose the codewords are indexed starting from 1:

    Compressed string (codewords):  (0,A)  (0,B)  (2,C)  (3,A)  (2,A)  (4,A)  (6,B)
    Codeword index:                   1      2      3      4      5      6      7

Each codeword consists of an integer and a character. The character is represented by 8 bits. The number of bits n required to represent the integer part of the codeword with index i is given by:

    n = 1               if i = 1
    n = ceil(log2(i))   if i > 1

    Codeword:  (0,A)   (0,B)   (2,C)   (3,A)   (2,A)   (4,A)   (6,B)
    Index:       1       2       3       4       5       6       7
    Bits:      (1+8) + (1+8) + (2+8) + (2+8) + (3+8) + (3+8) + (3+8) = 71 bits

The actual compressed message is 0A0B10C11A010A100A110B, where each index is written in binary using n bits.

4.3 Decompression Algorithm

    Dictionary := empty
    DictionaryIndex := 1
    while (there are more (CodeWord, Char) pairs in codestream)
        CodeWord := next CodeWord in codestream
        Char := character corresponding to CodeWord
        if (CodeWord == 0)
            String := empty
        else
            String := string at index CodeWord in Dictionary
        Output: String + Char
        insertInDictionary( (DictionaryIndex, String + Char) )
        DictionaryIndex++

Example: LZ78 Decompression

Decompressing (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B) gives back the original message: ABBCBCABABCAABCAAB

5. Lempel-Ziv Welch

This improved version of the original LZ78 algorithm is perhaps the most famous modification and is sometimes even mistakenly referred to as the Lempel-Ziv algorithm. Published by Terry Welch in 1984, it basically applies the LZSS principle of not explicitly transmitting the next non-matching symbol to the LZ78 algorithm. The only remaining output of this improved algorithm is a sequence of fixed-length references to the dictionary (indexes). If the message to be encoded consists of only one character, LZW outputs the code for this character; otherwise it inserts two- or multi-character, overlapping, distinct patterns of the message to be encoded into a Dictionary.
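For comparison with LZW, the LZ78 encoding, decoding, and bit-count procedures of section 4 can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function names `lz78_encode`, `lz78_decode`, and `lz78_bits` are made up for illustration, codewords are kept as Python tuples rather than packed bits, and emitting a final pair with an empty character when the input ends inside a known phrase is an assumption of this sketch.

```python
from math import ceil, log2


def lz78_encode(text):
    """Return a list of (index, char) pairs; index 0 means empty prefix."""
    dictionary = {}  # phrase -> index, with indices starting at 1
    prefix = ""
    output = []
    for ch in text:
        if prefix + ch in dictionary:
            prefix += ch  # keep extending the longest match
        else:
            # An empty prefix is never stored, so it maps to index 0.
            output.append((dictionary.get(prefix, 0), ch))
            dictionary[prefix + ch] = len(dictionary) + 1
            prefix = ""
    if prefix:
        # Input ended inside a known phrase (assumed convention:
        # emit its index together with an empty character).
        output.append((dictionary[prefix], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text from (index, char) pairs."""
    phrases = [""]  # phrases[0] is the empty string for index 0
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


def lz78_bits(pairs, char_bits=8):
    """Count transmitted bits: the i-th index uses ceil(log2(i)) bits, at least 1."""
    total = 0
    for i in range(1, len(pairs) + 1):
        index_bits = 1 if i == 1 else ceil(log2(i))
        total += index_bits + char_bits
    return total


if __name__ == "__main__":
    message = "ABBCBCABABCAABCAAB"
    pairs = lz78_encode(message)
    print(pairs)
    print(lz78_decode(pairs) == message)
    print(lz78_bits(pairs), "bits versus", len(message) * 8, "bits uncompressed")
```

Note that the decoder never needs the encoder's dictionary: it rebuilds the same phrase list from the pairs it receives, which is why only the (index, character) pairs have to be transmitted.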
Overlapping: the last character of a pattern is the first character of the next pattern.

5.1 Algorithm

    Initialize Dictionary with 256 single-character strings and their corresponding ASCII codes
    Prefix := first input character
    CodeWord := 256
    while (not end of character stream)
        Char := next input character
        if (Prefix + Char exists in the Dictionary)
            Prefix := Prefix + Char
        else
            Output: the code for Prefix
            insertInDictionary( (CodeWord, Prefix + Char) )
            CodeWord++
            Prefix := Char
    Output: the code for Prefix

Example: Compression using LZW

Encode the string BABAABAAA by the LZW encoding algorithm.

1. BA is not in the Dictionary; insert BA, output the code for its prefix: code(B)
2. AB is not in the Dictionary; insert AB, output the code for its prefix: code(A)
3. BA is in the Dictionary. BAA is not in the Dictionary; insert BAA, output the code for its prefix: code(BA)
4. AB is in the Dictionary. ABA is not in the Dictionary; insert ABA, output the code for its prefix: code(AB)
5. AA is not in the Dictionary; insert AA, output the code for its prefix: code(A)
6. AA is in the Dictionary and it is the last pattern; output its code: code(AA)

Compressed message: the compressed message is <66><65><256><257><65><260>

LZW: number of bits transmitted

Example: uncompressed string aaabbbbbbaabaaba
Number of bits = total number of characters * 8 = 16 * 8 = 128 bits

Compressed string (codewords): <97><256><98><258><259><257><261>
Number of bits = total number of codewords * 12 = 7 * 12 = 84 bits

Note: each codeword is 12 bits because the minimum Dictionary size is taken as 4096, and 2^12 = 4096.

5.2 Decoding algorithm

    Initialize Dictionary with 256 ASCII codes and corresponding single-character strings as their translations
    PreviousCodeWord := first input code
    Output: string(PreviousCodeWord)
    Char := character(first input code)
    CodeWord := 256
    while (not end of code stream)
        CurrentCodeWord := next input code
        if (CurrentCodeWord exists in the Dictionary)
            String := string(CurrentCodeWord)
        else
            String := string(PreviousCodeWord) + Char
        Output: String
        Char := first character of String
        insertInDictionary( (CodeWord, string(PreviousCodeWord) + Char) )
        PreviousCodeWord := CurrentCodeWord
        CodeWord++

Summary of the LZW decoding algorithm:

    output: string(first CodeWord)
    while (there are more CodeWords)
        if (CurrentCodeWord is in the Dictionary)
            output: string(CurrentCodeWord)
        else
            output: PreviousOutput + PreviousOutput first character
        insert in the Dictionary: PreviousOutput + CurrentOutput first character

Example: LZW Decompression

Use LZW to decompress the output sequence <66> <65> <256> <257> <65> <260>

1. 66 is in the Dictionary; output string(66), i.e. B
2. 65 is in the Dictionary; output string(65), i.e. A; insert BA
3. 256 is in the Dictionary; output string(256), i.e. BA; insert AB
4. 257 is in the Dictionary; output string(257), i.e. AB; insert BAA
5. 65 is in the Dictionary; output string(65), i.e. A; insert ABA
6. 260 is not in the Dictionary; output previous output + previous output first character, i.e. AA; insert AA

The decompressed message is BABAABAAA.
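The LZW encoding and decoding procedures of sections 5.1 and 5.2 can be sketched in Python along the following lines. It is a minimal sketch under stated assumptions: the names `lzw_encode` and `lzw_decode` are illustrative, every input character is assumed to have a code below 256, the dictionary is allowed to grow without the 4096-entry limit, and codes are returned as plain integers rather than packed into 12-bit fields.

```python
def lzw_encode(text):
    """Return the list of LZW codes for text (characters must be < 256)."""
    if not text:
        return []
    # Start with the 256 single-character strings and their codes.
    dictionary = {chr(i): i for i in range(256)}
    prefix = text[0]
    codes = []
    for ch in text[1:]:
        if prefix + ch in dictionary:
            prefix += ch
        else:
            codes.append(dictionary[prefix])
            dictionary[prefix + ch] = len(dictionary)  # next free code: 256, 257, ...
            prefix = ch
    codes.append(dictionary[prefix])  # flush the last pattern
    return codes


def lzw_decode(codes):
    """Rebuild the text from a list of LZW codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            current = dictionary[code]
        else:
            # The code was created by the encoder in its most recent step:
            # it must be the previous output plus its own first character.
            current = previous + previous[0]
        out.append(current)
        dictionary[len(dictionary)] = previous + current[0]
        previous = current
    return "".join(out)


if __name__ == "__main__":
    for message in ("BABAABAAA", "aaabbbbbbaabaaba"):
        codes = lzw_encode(message)
        print(message, codes, len(codes) * 12, "bits at 12 bits per code")
        print(lzw_decode(codes) == message)
```

The `else` branch in the decoder is the case from step 6 of the decompression example: the decoder receives a code that it has not yet inserted, and reconstructs it from the previous output.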