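I have been using TreePad: http://www.treepad.com/
I have merged many TreePad files and now need to remove duplicate articles.
I have a way of grouping articles and counting identical lines. The AWK script below prefixes every line with its article title, depth and a line number, so that the output can be sorted.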
#
# Lines.awk - used to analyse K1205 rec2Ascii ISUP decode and prepend lines with a sortable string.
#
# usage mawk -f linesPW.awk info.hjt
# usage mawk -f linesPW.awk info.hjt | sort | mawk -f unique.awk
#
# Description: try to analyse ISUP and collate messages to find a profile of usage.
# The BIG problem is that the optional parameters are in any order.
# It could be thought of as a Tree with leaves of variable size.
# The branches are variable as well.
#
# messages have mandatory, variable mandatory, optional parameters.
#
# parameters are multi-line.
#
# The objective is to try to totalize unique usages of lines of a parameter.
#
# find blocks of text, reset line number and prepend with block name and line number
#
# sort these lines and then use unique to count duplicate lines.
#
BEGIN {
    cm = ""
}
#
# define some rules to find start of blocks of text. Set / reset want
#
##============================================================
#
# Turn off want events
#
##============================================================
#
# Turn on want events
#
## find parameter name which has four spaces preceding.
#<Treepad version 3.0>
#dt=Text
#<node>
#Personal Notes-pruned
#0
#<end node> 5P9i0s8y19Z
#dt=Text
#<node>
#ADSL tests
#2
/<Treepad version 3.0>|<end node> 5P9i0s8y19Z/ {
    want = 0
    if ( (getline) <= 0 ) exit      # no further node: end of file
    getline                         # skip "dt=Text", now on "<node>"
    getline                         # article title
    msg = $0
    getline                         # tree depth
    depth = $0
    cm = ":" msg ":" depth ":"
    ln = 10000
    want = 1
    # $0 is still the depth line here, so the rule below prints it as line 10000
}
##============================================================
#
# While ( WANT ) print of line with a prefix with a line number.
#
( want ) {
    print cm ln " :" $0
    ln = ln + 1
}
The AWK script below finds duplicate adjacent lines and counts them.
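#
# unique.awk - count runs of identical adjacent lines (input must be sorted)
#
# usage: sort op.txt | mawk -f unique.awk >> op_unique.txt
#
BEGIN {
    lastline = ""
    linecnt = 0
}
( $0 != lastline ) {
    # print count of unique usage for the run that has just ended
    if ( linecnt > 0 )
        print linecnt "\t " lastline
    lastline = $0
    linecnt = 1
    next
}
{
    linecnt = linecnt + 1
}
END {
    # flush the final run, which the rules above never print
    if ( linecnt > 0 )
        print linecnt "\t " lastline
}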
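For comparison, the same per-line totals can be obtained without sorting first, by keeping one counter per distinct line in an AWK associative array. This is only a sketch and not one of the original scripts: the function name count_lines is made up for illustration, plain awk is assumed to be available in place of mawk, and because the order of "for (line in count)" is unspecified the result is passed through sort afterwards.

```shell
# count_lines: hypothetical helper, not one of the original scripts.
# Reads lines on stdin and prints "<count><TAB><line>" for each distinct line.
count_lines() {
    awk '
        { count[$0]++ }                     # one counter per distinct line
        END {
            for (line in count)             # iteration order is unspecified
                print count[line] "\t" line
        }
    ' | sort -k2                            # order the report by line text
}

# Possible use with the prefixing script (file names as in the usage comments):
#   mawk -f linesPW.awk info.hjt | count_lines > op_unique.txt
```

Any line with a count above 1 carries the same title, depth, line number and text as another line, which is the hint that an article was merged in more than once. The standard pipeline "sort | uniq -c" gives equivalent counts, with the count padded on the left.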