
Nov 7, 2021

[awk] Calculate avg of a row using awk

 Calculate avg of a row using awk

Q. How can I calculate the average of each row of the following file using awk?

input.txt

157361 155687 156158 156830
149610 151824 152353 152027
159195 158490 159030 159243
153222 154227 154578 154390
168761 170078 170044 170107
147166 146477 146735 147678
155745 152142 155141 154140
148860 150040 149223 148246
147239 149693 148144 147990
148045 147987 149466 149535
146945 146206 145681 145852
156559 155188 156274 154962
143169 143798 142753 144045
153814 153320 153732 156621

 

A. Field numbers in awk start at 1, not 0; $0 refers to the entire line rather than to a field. So your for loop needs to start at i = 1.

For example:

awk '{sum = 0; for (i = 1; i <= NF; i++) sum += $i; sum /= NF; print sum}' input.txt
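To see why the starting index matters, here is a small sketch that assumes the original loop started at i = 0 (the question's own code is not shown here). It uses only the first row of input.txt as sample data. When $0 is used as a number, awk converts the whole line using its leading numeric prefix, so the first field ends up counted twice. printf is used instead of print so the fractional part is not rounded away by awk's default output format.

```shell
# First row of input.txt, used as a one-line sample
line='157361 155687 156158 156830'

# Assumed buggy version: starting at 0 adds $0 (the whole line), which awk
# coerces to 157361, so the first field is effectively counted twice.
wrong=$(printf '%s\n' "$line" |
  awk '{sum = 0; for (i = 0; i <= NF; i++) sum += $i; printf "%.2f\n", sum / NF}')

# Corrected version: fields run from 1 to NF.
right=$(printf '%s\n' "$line" |
  awk '{sum = 0; for (i = 1; i <= NF; i++) sum += $i; printf "%.2f\n", sum / NF}')

echo "starting at 0: $wrong"   # starting at 0: 195849.25
echo "starting at 1: $right"   # starting at 1: 156509.00
```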

 

[awk] Sum the values of a column, based on the values of another column

 Using awk to sum the values of a column, based on the values of another column

 

Q. I am trying to sum certain numbers in a column using awk. I would like to sum just column 3 of the "smiths" to get a total of 212. I can sum the whole column using awk but not just the "smiths". I have:

awk 'BEGIN {FS = "|"} ; {sum+=$3} END {print sum}' filename.txt

Also, I am using PuTTY. Thank you for any help.

smiths|Login|2
olivert|Login|10
denniss|Payroll|100
smiths|Time|200
smiths|Logout|10

A.
awk -F '|' '$1 ~ /smiths/ {sum += $3} END {print sum}' inputfilename
  • The -F flag sets the field separator; I put the | in single quotes because it is a special shell character.
  • Then $1 ~ /smiths/ applies the following {code block} only to lines where the first field matches the regex /smiths/.
  • The rest is the same as your code.

Note that since you're not really using a regex here, just a specific value, you could just as easily use:

awk -F '|' '$1 == "smiths" {sum += $3} END {print sum}' inputfilename

This checks string equality. It is equivalent to using the regex /^smiths$/, as mentioned in another answer, where the ^ anchor matches only at the start of the string (the start of field 1) and the $ anchor matches only at the end of the string. Not sure how familiar you are with regexes. They are very powerful, but for this case a string equality check works just as well.
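The two commands only differ when some other name contains "smiths". As a rough illustration, the sketch below appends a hypothetical extra row, smithson|Login|5, which is not in the original data, and runs both versions. The unanchored regex picks up the extra row, while the equality test does not.

```shell
# Sample data from the question plus one hypothetical row (smithson) added
# purely to show the difference between the two tests.
data='smiths|Login|2
olivert|Login|10
denniss|Payroll|100
smiths|Time|200
smiths|Logout|10
smithson|Login|5'

# Unanchored regex: matches any first field that contains "smiths".
regex_sum=$(printf '%s\n' "$data" |
  awk -F '|' '$1 ~ /smiths/ {sum += $3} END {print sum}')

# String equality: matches only when the first field is exactly "smiths".
exact_sum=$(printf '%s\n' "$data" |
  awk -F '|' '$1 == "smiths" {sum += $3} END {print sum}')

echo "regex match: $regex_sum"   # regex match: 217
echo "exact match: $exact_sum"   # exact match: 212
```

With the original five rows alone, both commands print 212.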

 

NumPy Tutorial

NumPy Tutorial:

 https://www.i2tutorials.com/numpy-tutorial/

 

NumPy is a library for the Python programming language. It provides pre-compiled numerical and mathematical functions.

NumPy is designed for memory-efficient multidimensional arrays and for scientific computing.

Two packages are relevant here:

1. NumPy – Provides basic calculations on multi-dimensional arrays and matrices of numeric data.

2. SciPy – Builds on NumPy and adds algorithms such as regression, minimization, Fourier transforms, statistical operations, random simulation, and applied mathematical techniques.

 

Advantage of saving `.npz` files instead of `.npy`

What is the advantage of saving `.npz` files instead of `.npy` in Python, regarding speed, memory and look-up?

The .npy format is:

the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. ... The format is designed to be as simple as possible while achieving its limited goals. (sources)

And .npz is only a

simple way to combine multiple arrays into a single file, one can use ZipFile to contain multiple “.npy” files. We recommend using the file extension “.npz” for these archives. (sources)

 

  • If you only use np.savez, there is no compression on top of the .npy format; you just get a single archive file for the convenience of managing multiple related arrays.
  • If you use np.savez_compressed, the file takes less space on disk, at the cost of more CPU time for the compression (i.e. it is a bit slower).