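I have been using TreePad (http://www.treepad.com/).
I have merged many TreePad files and need to remove duplicate articles.
I have a way of grouping articles and counting identical lines. The AWK script below prefixes every line with its article name, depth and a line number so that the output can be sorted.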
#
# Lines.awk - used to analyse K1205 rec2Ascii ISUP decode and prepend lines with a sortable string.
#
# usage mawk -f linesPW.awk info.hjt
# usage mawk -f linesPW.awk info.hjt | sort | mawk -f unique.awk
#
# Description: try to analyse ISUP and collate messages to find a profile of usage.
# The BIG problem is that the optional parameters are in any order.
# It could be thought of as a Tree with leaves of variable size.
# The branches are variable as well.
#
#
# messages have mandatory, variable mandatory, optional parameters.
#
# parameters are multi-line.
#
# The objective is to try to totalize unique usages of lines of a parameter.
#
# find blocks of text, reset line number and prepend with block name and line number
#
# sort these lines and then use unique to count duplicate lines.
#
#
BEGIN {
    cm = ""
}
#
# define some rules to find the start of blocks of text. Set / reset want
#
##============================================================
#
# Turn off want events
#
##============================================================
#
# Turn on want events
#
## Find the article title and depth that follow each <node> line.
# Sample of the Treepad .hjt layout the rule below expects:
#<Treepad version 3.0>
#dt=Text
#<node>
#Personal Notes-pruned
#0
#<end node> 5P9i0s8y19Z
#dt=Text
#<node>
#ADSL tests
#2
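/<Treepad version 3.0>|<end node> 5P9i0s8y19Z/{
    # File header or end-of-node marker: stop printing until the next header is read.
    want = 0
    # Skip "dt=Text" and "<node>", then read the article title.
    # Give up quietly if the file ends here (after the last <end node>).
    if ((getline) <= 0 || (getline) <= 0 || (getline) <= 0) next
    msg = $0
    # The line after the title is the tree depth.
    if ((getline) <= 0) next
    depth = $0
    # Sortable prefix: article title and depth.
    cm = ":" msg ":" depth ":"
    # Start at 10000 so line numbers sort correctly as text.
    ln = 10000
    want = 1
}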
##============================================================
#
# While want is set, print each line prefixed with the block name and a line number.
#
( want ) {
    print cm ln " :" $0
    ln = ln + 1
}
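To see what the prefixing produces, here is a minimal sketch that runs a condensed copy of the rules above on a small sample. The two-article sample text, the `prefix_lines` function name, the end-of-file guard on `getline` and the use of plain `awk` in place of `mawk` are assumptions made for illustration; the marker strings come from the sample layout in the comments.

```shell
# Sketch only: condensed copy of the rules above, wrapped in a
# hypothetical shell function so it can be tried on a small sample.
prefix_lines() {
    awk '
    /<Treepad version 3.0>|<end node> 5P9i0s8y19Z/ {
        want = 0
        # skip "dt=Text" and "<node>", then read title and depth;
        # stop quietly if the input ends first (guard added in this sketch)
        if ((getline) <= 0 || (getline) <= 0 || (getline) <= 0) next
        msg = $0
        if ((getline) <= 0) next
        depth = $0
        cm = ":" msg ":" depth ":"
        ln = 10000
        want = 1
    }
    ( want ) {
        print cm ln " :" $0
        ln = ln + 1
    }'
}

# Made-up two-article sample in the .hjt layout.
sample='<Treepad version 3.0>
dt=Text
<node>
Notes
0
first line
<end node> 5P9i0s8y19Z
dt=Text
<node>
ADSL tests
1
first line
<end node> 5P9i0s8y19Z'

printf '%s\n' "$sample" | prefix_lines
```

Each output line carries the article title, its depth and a running line number, so lines from the same article stay together after sorting. Note that the depth line itself is printed as line 10000 of every article.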
The AWK script below finds duplicate adjacent lines and counts them.
#
#
# unique.awk
# sort op.txt | mawk -f unique.awk >> op_unique.txt
#
#
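BEGIN {
    lastline = ""
    linecnt = 0
}
( NR == 1 || $0 != lastline ) {
    # print count of the previous run (there is none before the first line)
    if ( NR > 1 ) print linecnt "\t " lastline
    lastline = $0
    linecnt = 1
    next
}
{
    linecnt = linecnt + 1
}
END {
    # print the count for the final run, which no later line would trigger
    if ( NR > 0 ) print linecnt "\t " lastline
}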
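As a quick check of the counting step, the sketch below sorts a small sample and feeds it through a self-contained run counter. The `count_runs` function name and the sample lines are assumptions for illustration; this version prints each run when the next different line arrives and flushes the last run in an `END` block, so it may differ in detail from the script above.

```shell
# Sketch only: count runs of identical adjacent lines.
count_runs() {
    awk '
    NR > 1 && $0 != lastline {
        # a new run starts: report the run that just ended
        print linecnt "\t " lastline
        linecnt = 0
    }
    {
        lastline = $0
        linecnt = linecnt + 1
    }
    END {
        # report the final run, if there was any input at all
        if (NR > 0) print linecnt "\t " lastline
    }'
}

# Made-up prefixed lines: the "Notes" article was merged in twice.
sample=':Notes:0:10001 :first line
:ADSL tests:1:10001 :first line
:Notes:0:10001 :first line'

# LC_ALL=C keeps the sort order independent of the local collation rules.
printf '%s\n' "$sample" | LC_ALL=C sort | count_runs
```

A count above 1 marks a line that occurs in more than one copy of the same article, which is the hint that the article is duplicated. On sorted input the standard `uniq -c` utility reports the same counts in a slightly different layout.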