Original post is here: eklausmeier.goip.de
When processing input files, I have to check whether they share a common record format. To do so, I compute the length of each line (record) in the input file.
1. Perl solution. The program below reads the input file and prints a histogram: each line length together with its frequency.
#!/bin/perl -W
# Histogram of line lengths

use strict;

my %H;	# line length => number of lines with that length

while (<>) {
	$H{length($_)} += 1;	# length includes the trailing newline
}

for (sort {$a <=> $b} keys %H) {
	printf("%5d\t%d\n",$_,$H{$_});
}
2. Perl one-liner. A simple Perl program can often be converted into a Perl one-liner. See, for example, Introduction to Perl one-liners by Peteris Krumins, and Useful One-Line Scripts for Perl.
perl -ne '$H{length($_)} += 1; END { printf("%5d\t%d\n",$_,$H{$_}) for (sort {$a <=> $b} keys %H); }' <yourFile>
Example usage:
printf "\n\na\nab\nabc\n" | perl -ne '$H{length($_)} += 1; END { printf("%5d\t%d\n",$_,$H{$_}) for (sort {$a <=> $b} keys %H); }'
gives
    1	2
    2	1
    3	1
    4	1
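3. Awk solution. If Perl is not available, then hopefully Awk is installed. The Awk program below accomplishes pretty much the same. Two differences: Awk's length($0) does not count the trailing newline, so every length is one less than in the Perl output, and lengths between 0 and the maximum that never occur are printed with an empty count.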
#!/bin/awk -f

function max(a,b) {
	return a>b ? a : b
}

	{ m = max(length($0),m); x[length($0)] += 1 }

END {
	for (i=0; i<=m; ++i)
		print i, x[i]
}
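As with Perl, the Awk program can be condensed for the command line. The following is a minimal sketch, not the author's version: the function name `linelen_hist` is made up for illustration, only lengths that actually occur are printed, and `sort -n` is used because the iteration order of `for (k in x)` is unspecified in Awk.

```shell
# linelen_hist is a hypothetical helper name, not part of the original post.
# Prints "length<TAB>count" for each line length that occurs, sorted numerically.
linelen_hist() {
    awk '{ x[length($0)] += 1 } END { for (k in x) printf("%5d\t%d\n", k, x[k]) }' "$@" | sort -n
}

# Five-line sample: two empty lines, then "a", "ab", "abc"
printf "\n\na\nab\nabc\n" | linelen_hist
```

For this sample the lengths reported are 0, 1, 2 and 3, with counts 2, 1, 1 and 1, because Awk does not count the newline.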