Nov 20, 2013

sort & uniq to remove duplicated lines

sort lines                  : sort file_name
remove duplicated lines     : sort file_name | uniq
duplicated lines only       : sort file_name | uniq -d
unique lines only           : sort file_name | uniq -u
count occurrences per line  : sort file_name | uniq -c


example

[sbm@NGS life101]$ more test.txt
aaa
bbb
bbb
bbb
ccc
ddd
[sbm@NGS life101]$ sort test.txt | uniq
aaa
bbb
ccc
ddd
[sbm@NGS life101]$ sort test.txt | uniq -d
bbb
[sbm@NGS life101]$ sort test.txt | uniq -u
aaa
ccc
ddd
[sbm@NGS life101]$ sort test.txt | uniq -c
      1 aaa
      3 bbb
      1 ccc
      1 ddd
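
The sort step matters because uniq only compares adjacent lines. A minimal sketch of what happens on unsorted input, assuming a hypothetical file sample.txt created in a temporary directory (the file name and variable names are illustrative, not part of the original notes):

```shell
# Build a small unsorted sample file (hypothetical name: sample.txt).
tmpdir=$(mktemp -d)
printf 'bbb\naaa\nbbb\nccc\nbbb\n' > "$tmpdir/sample.txt"

# Without sorting, the bbb lines are not adjacent, so uniq keeps all of them.
unsorted=$(uniq "$tmpdir/sample.txt")

# After sorting, the duplicates sit next to each other and are collapsed.
sorted=$(sort "$tmpdir/sample.txt" | uniq)

# sort -u gives the same result in a single command.
short=$(sort -u "$tmpdir/sample.txt")

# Most frequent line: count with uniq -c, then sort the counts numerically
# in descending order. awk normalizes the leading spaces uniq -c prints.
top=$(sort "$tmpdir/sample.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $1, $2}')

echo "uniq on unsorted input:"; echo "$unsorted"
echo "sort | uniq:";            echo "$sorted"
echo "most frequent:";          echo "$top"

rm -r "$tmpdir"
```

For plain de-duplication, sort -u is usually enough. The sort | uniq form is still needed for -d, -u and -c, which sort has no equivalent for.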

Linux version check

lsb_release -a


LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 5.7 (Final)
Release: 5.7
Codename: Final
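
lsb_release is not installed on every system. A rough fallback sketch, assuming the release files /etc/os-release (newer distributions) or /etc/redhat-release (Red Hat family such as CentOS) may be present; which of them exists depends on the distribution, and the variable names are only illustrative:

```shell
# Try lsb_release first, then fall back to the release files if it is
# missing or prints nothing.
distro_info=""
if command -v lsb_release >/dev/null 2>&1; then
    distro_info=$(lsb_release -a 2>/dev/null || true)
fi
if [ -z "$distro_info" ] && [ -r /etc/os-release ]; then
    distro_info=$(cat /etc/os-release)
fi
if [ -z "$distro_info" ] && [ -r /etc/redhat-release ]; then
    distro_info=$(cat /etc/redhat-release)
fi
if [ -z "$distro_info" ]; then
    distro_info="unknown distribution"
fi

# The kernel release is separate from the distribution release.
kernel_release=$(uname -r)

echo "$distro_info"
echo "Kernel: $kernel_release"
```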