Bytehist

A tool for generating byte-usage-histograms for all types of files with a special focus on binary executables in PE-format (Windows).


Author

Christian Wojner

Language

English

License

ISCL

Releases

Changes

1.0 (Build 102): Revised commandline interface
1.0 (Build 101): -
1.0 beta 1: -

Features

  • Makes byte-usage-histograms of any file of any size
  • Histograms are generated as sorted and unsorted diagrams
  • Sub-histograms for each section of binary executables (PE)
  • Quick overview with GUI navigation in case of sub-histograms
  • Each sub-histogram shows its percentage share of the total filesize
  • Source-related names for sub-histograms (= section names in case of PEs)
  • Results can be saved as .jpg, .bmp and .png files
  • Works as GUI and also as commandline tool (for scripting purposes)

Syntax

bytehist.exe [-n] [-s _savefile_] [_inputfile_]

Parameters (mandatory):

Parameters (optional):
  -h, --help
          Shows this help.

  -n, --nogui
          Don't bring up any GUI

  -s, --save _savefile_
          Save histogram to given file (e.g. test.bmp, test.png or test.jpg)

  _inputfile_
          File to analyze

Note: Executing bytehist without any parameters activates full GUI-mode.
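For scripting, the commandline form above can be driven from another program. The following is a minimal sketch, assuming bytehist.exe is reachable on the PATH; the helper names `bytehist_command` and `save_histogram` are hypothetical, and only the documented `-n` and `-s` options are used.

```python
import subprocess


def bytehist_command(input_file, save_file, exe="bytehist.exe"):
    """Build the argument list for a GUI-less run that saves the histogram."""
    # -n suppresses the GUI, -s names the image file to write
    # (both taken from the syntax section above).
    return [exe, "-n", "-s", save_file, input_file]


def save_histogram(input_file, save_file):
    """Run bytehist once and return its exit code.

    Assumption: bytehist.exe is on the PATH. The meaning of the exit code
    is not documented here, so it is returned without interpretation.
    """
    return subprocess.run(bytehist_command(input_file, save_file)).returncode
```

A batch job could call `save_histogram("sample.exe", "sample.png")` for each file it wants to inspect.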

Old syntax (deprecated, but kept so that nobody is forced to change existing scripts unnecessarily)

bytehist [options file]

 

Executing bytehist without any parameters activates full GUI-mode.

 

options:
  -nogui ... don't bring up any GUI
  -save file ... save histogram to given file (bmp, png or jpg)
  -h ... show a short help

Description

Statistics are a good way to detect encrypted or packed data. Data that has been transformed in this way usually shows a very even distribution of byte values. In contrast, ordinary data typically has some byte values that occur far more often than others, because of the structures it contains. So the byte distribution of unencrypted and unpacked data (clear text, database files, and even executable binaries) differs massively from that of encrypted and/or packed data. A histogram puts this "phenomenon" into a picture and makes the difference easy to see.
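The idea can be tried out in a few lines. The sketch below counts how often each of the 256 byte values occurs and condenses the evenness of that distribution into a single number. Shannon entropy is used here as one common evenness measure; it is an assumption of this example, not something bytehist is stated to compute, and the function names are hypothetical.

```python
import math
from collections import Counter


def byte_histogram(data):
    """Return a list of 256 counts, indexed by byte value."""
    counts = [0] * 256
    for value, n in Counter(data).items():
        counts[value] = n
    return counts


def entropy_bits_per_byte(data):
    """Shannon entropy of the byte distribution: 0.0 (one value only) to 8.0 (perfectly even)."""
    total = len(data)
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total)
        for n in byte_histogram(data)
        if n  # skip byte values that never occur
    )
```

Under this measure, structured data such as a log file should score well below 8 bits per byte, while packed or encrypted data should come close to 8.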

 

Examples:

 

The first example shows an unpacked file. The source of this histogram was a log file, that is, human-readable information.
The second example comes from an ordinary ZIP archive.
As said above, the difference between them is easy to see.

 

Let's take a closer look at these examples. Both of them have a green and a red section. In the green section, every pixel column corresponds to the byte value matching its position, and a vertical bar shows how often that value occurs. In other words, a tall green bar at the far left tells us that byte value 0h occurs many times, and at the far right you'll find byte value FFh.
The red section is built from the same data as the green section, but this time all possible byte values are arranged in descending order of their number of occurrences. This makes it much easier to see the evenness.
Besides these two sections, you'll also find the filename in the top right corner, and a percentage.
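The two views described above can be sketched as follows. This is an illustration only: scaling every bar relative to the most frequent byte value is an assumption (the scaling bytehist actually uses is not described here), and `histogram_views` is a hypothetical name.

```python
from collections import Counter


def histogram_views(data, height=100):
    """Return (unsorted_bars, sorted_bars), each a list of 256 bar heights."""
    counts = [0] * 256
    for value, n in Counter(data).items():
        counts[value] = n
    peak = max(counts) or 1  # avoid dividing by zero for empty input
    # "Green" view: bar i belongs to byte value i (0h on the left, FFh on the right).
    unsorted_bars = [n * height // peak for n in counts]
    # "Red" view: the same bars, arranged in descending order.
    sorted_bars = sorted(unsorted_bars, reverse=True)
    return unsorted_bars, sorted_bars
```

In the sorted view, evenly distributed data gives an almost flat line, while structured data drops off steeply.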

To understand what this percentage is telling us, let's look at what else bytehist can do. bytehist can split a histogram into sub-histograms. At the moment, sub-histograms make the most sense for binary executables. Binary executables are usually split internally into a number of sections: sections for data, for code, and so on. It is common for executables to be packed and/or encrypted before they are released. Especially in the malware sector, encryption and packing are used heavily as a hurdle to hinder deep analysis, for example through reverse engineering. So, for a binary executable in PE format (the one Microsoft Windows uses), bytehist produces an overall histogram as well as one histogram per section it finds, plus one for any data remaining behind the last section. The overall histogram still says "100%", while each of the others shows its share of the total filesize.
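How such per-section percentages can be derived is sketched below. The code reads the section table straight from the PE headers and reports each section's share of the file, plus a share for any data behind the last section. It is a minimal sketch and not bytehist's actual implementation; the function names and the "(rest)" label are hypothetical, and only basic sanity checks are made.

```python
import struct


def pe_sections(data):
    """Return (name, raw_offset, raw_size) for every section of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    table = pe_offset + 24 + opt_header_size  # first section header
    sections = []
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20.
        raw_size, raw_offset = struct.unpack_from("<II", data, entry + 16)
        sections.append((name, raw_offset, raw_size))
    return sections


def section_shares(data):
    """Return (name, percent of total filesize) per section, plus trailing data."""
    total = len(data)
    shares = []
    end_of_last = 0
    for name, offset, size in pe_sections(data):
        shares.append((name, 100.0 * size / total))
        end_of_last = max(end_of_last, offset + size)
    if end_of_last < total:
        shares.append(("(rest)", 100.0 * (total - end_of_last) / total))
    return shares
```

The bytes of each section, `data[offset:offset + size]`, could then be fed into a histogram routine to obtain one sub-histogram per section.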

 

Examples:

 

Both examples have a scroll area on the right side showing thumbnails of the related (sub-)histograms. Clicking a thumbnail with the left mouse button zooms it. Once again the first example is an unpacked file and the second a packed one, but this time both are binary executables.

 

This feature lets a reverse engineer instantly find the section, if any, that contains packed or encrypted data.

 

Full examples ...

 

Packed data behind sections:

 

A UPX-packed executable:

 

bytehist itself - unpacked: