
Dec 8, 2021

[Seurat] install error : object ‘markvario’ is not exported by 'namespace:spatstat'

> library("Seurat")
Error: package or namespace load failed for ‘Seurat’:
 object ‘markvario’ is not exported by 'namespace:spatstat'
In addition: Warning message:
package ‘Seurat’ was built under R version 3.6.3

 

Solution: downgrade "spatstat"

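install.packages('devtools')
# Match package names only; grep on the whole installed.packages() matrix also hits other columns
remove.packages(grep("spatstat", rownames(installed.packages()), value = TRUE))
.rs.restartR()  # RStudio only; otherwise restart the R session manually
devtools::install_version("spatstat", version = "1.64-1")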


For details, see:

https://github.com/mojaveazure/seurat-disk/issues/56

The {spatstat} package made updates that moved some of their functions to other packages. This broke {Seurat}, which, in turn, breaks {SeuratDisk}. We are working on a fix for {Seurat} to solve this; in the meantime, you should be able to downgrade your {spatstat} installation with:

remotes::install_version("spatstat", version = "1.64-1")

Once {Seurat} is updated to support the latest version of {spatstat}, this issue should be resolved.

Nov 7, 2021

[awk] Calculate avg of a row using awk

 Calculate avg of a row using awk

Q.

input.txt

157361 155687 156158 156830
149610 151824 152353 152027
159195 158490 159030 159243
153222 154227 154578 154390
168761 170078 170044 170107
147166 146477 146735 147678
155745 152142 155141 154140
148860 150040 149223 148246
147239 149693 148144 147990
148045 147987 149466 149535
146945 146206 145681 145852
156559 155188 156274 154962
143169 143798 142753 144045
153814 153320 153732 156621

 

A. Field numbers in AWK start at 1, not 0 ($0 is the whole line), so the for loop needs to start at i = 1.

For example:

awk '{sum = 0; for (i = 1; i <= NF; i++) sum += $i; sum /= NF; print sum}' input.txt
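For comparison, the same per-row average can be sketched in Python. This is only an illustration of the logic in the awk one-liner; the function name row_averages is made up here, and the sample uses the first two rows of input.txt. Unlike the awk version, it skips blank lines instead of dividing by zero.

```python
def row_averages(text):
    """Return the average of the whitespace-separated numbers on each line."""
    averages = []
    for line in text.splitlines():
        fields = line.split()      # like awk's default field splitting
        if not fields:             # skip blank lines (NF == 0)
            continue
        total = sum(float(field) for field in fields)
        averages.append(total / len(fields))
    return averages


# First two rows of input.txt
sample = "157361 155687 156158 156830\n149610 151824 152353 152027\n"
for avg in row_averages(sample):
    print(avg)
```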

 

[awk] Sum the values of a column, based on the values of another column

 Using awk to sum the values of a column, based on the values of another column

 

Q. I am trying to sum certain numbers in a column using awk. I would like to sum just column 3 of the "smiths" to get a total of 212. I can sum the whole column using awk but not just the "smiths". I have:

awk 'BEGIN {FS = "|"} ; {sum+=$3} END {print sum}' filename.txt

Also I am using putty. Thank you for any help.

smiths|Login|2
olivert|Login|10
denniss|Payroll|100
smiths|Time|200
smiths|Logout|10

A.
awk -F '|' '$1 ~ /smiths/ {sum += $3} END {print sum}' inputfilename
  • The -F flag sets the field separator; the | is in single quotes because it is a special shell character.
  • Then $1 ~ /smiths/ applies the following {code block} only to lines where the first field matches the regex /smiths/.
  • The rest is the same as your code.

Note that since you're not really using a regex here, just a specific value, you could just as easily use:

awk -F '|' '$1 == "smiths" {sum += $3} END {print sum}' inputfilename

This checks string equality. It is equivalent to using the regex /^smiths$/, as mentioned in another answer, which includes the ^ anchor to only match the start of the string (the start of field 1) and the $ anchor to only match the end of the string. Not sure how familiar you are with regexes. They are very powerful, but for this case you could use a string equality check just as easily.
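The string-equality version translates almost line for line into Python. A minimal sketch, assuming the same pipe-separated layout as the sample data; the helper name sum_by_key is hypothetical.

```python
def sum_by_key(lines, key, sep="|"):
    """Sum the third field of every line whose first field equals key."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Exact match on field 1, like awk's $1 == "smiths"
        if len(fields) >= 3 and fields[0] == key:
            total += int(fields[2])
    return total


records = [
    "smiths|Login|2",
    "olivert|Login|10",
    "denniss|Payroll|100",
    "smiths|Time|200",
    "smiths|Logout|10",
]
total = sum_by_key(records, "smiths")
print(total)
```

With the sample rows from the question this gives the expected total of 212.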

 

NumPy Tutorial

NumPy Tutorial:

 https://www.i2tutorials.com/numpy-tutorial/

 

NumPy is one of the libraries available for the Python programming language. This library (or module) provides precompiled numerical and mathematical functions.

NumPy is designed for memory-efficient multidimensional arrays and for scientific computing.

Here we have two packages:

1. NumPy – This provides basic calculations with multi-dimensional arrays and matrices of numeric data.

2. SciPy – This package builds on NumPy and adds algorithms such as regression, minimization, Fourier transforms, statistical operations, random simulation and applied mathematical techniques.

 

Advantage of saving `.npz` files instead of `.npy`

What is the advantage of saving `.npz` files instead of `.npy` in python, regarding speed, memory and look-up?

The .npy format is:

the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. ... The format is designed to be as simple as possible while achieving its limited goals. (sources)

And .npz is only a

simple way to combine multiple arrays into a single file, one can use ZipFile to contain multiple “.npy” files. We recommend using the file extension “.npz” for these archives. (sources)

 

  • If you use np.savez, no compression is applied on top of the .npy format; you just get a single archive file for the convenience of managing multiple related arrays.
  • If you use np.savez_compressed, the file takes less space on disk, at the cost of more CPU time for the compression (i.e. it is a bit slower).
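The difference between the three save functions can be checked directly. The sketch below assumes NumPy is installed; the array contents and file names are made up for illustration, and the actual size savings depend heavily on how compressible the data is.

```python
import os
import tempfile
import zipfile

import numpy as np

a = np.arange(1000, dtype=np.int64)
b = np.zeros((100, 100))  # highly compressible on purpose

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")
    npz_path = os.path.join(tmp, "arrays.npz")
    npz_compressed_path = os.path.join(tmp, "arrays_compressed.npz")

    np.save(npy_path, a)                                  # one array, one .npy file
    np.savez(npz_path, a=a, b=b)                          # uncompressed zip of .npy files
    np.savez_compressed(npz_compressed_path, a=a, b=b)    # same archive, compressed

    # An .npz file is an ordinary zip archive holding one .npy member per array
    with zipfile.ZipFile(npz_path) as archive:
        members = sorted(archive.namelist())

    # Arrays are looked up by the keyword names used when saving
    with np.load(npz_path) as data:
        loaded_a = data["a"]

    size_plain = os.path.getsize(npz_path)
    size_compressed = os.path.getsize(npz_compressed_path)

print(members)
print(size_plain, size_compressed)
```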

 

Oct 21, 2021

Get-FileHash - md5sum in Windows

 https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/get-filehash?view=powershell-7.1&viewFallbackFrom=powershell-6

Get-FileHash - compute hash value of a file

On Linux, you can check a file's hash value with md5sum.

e.g.: md5sum file

 

On Windows, you can use 'Get-FileHash'.

Note: you need Microsoft PowerShell to run Get-FileHash. You can launch a PowerShell terminal from VS Code.

After launching a PowerShell terminal in VS Code:

C:\Documents and Settings> Get-FileHash -Algorithm MD5 test.zip   


Algorithm       Hash                                   Path
---------       ----                                   ----
MD5             6CDDB6DBD3DC341AF7FCD56CA8D2B4B8       C:\Documents and Settings\test.zip
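As a cross-platform alternative, the same MD5 digest can be computed with Python's standard hashlib module. This is a minimal sketch: the function name md5_of_file, the chunk size, and the temporary sample file are illustrative choices, not part of md5sum or Get-FileHash.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file containing the bytes b"hello"
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.bin")
    with open(sample_path, "wb") as handle:
        handle.write(b"hello")
    sample_digest = md5_of_file(sample_path)

print(sample_digest.upper())  # Get-FileHash prints upper-case hex
```

md5sum prints the digest in lower case and Get-FileHash in upper case, so compare them case-insensitively.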

Oct 5, 2021

Embedding fonts into a PDF file

Sometimes PDF files show scrambled fonts. To prevent this, embed the fonts into the PDF file.

How to embed fonts using Adobe Acrobat XI Pro (not available in Standard)?

 https://answers.acrobatusers.com/How-I-embed-fonts-Adobe-XI-please-q149712.aspx

-> Preflight (Shift+Ctrl+X)

View -> Tools -> Print Production -> Preflight -> PDF fixups -> Embed fonts

Jul 23, 2021

xargs command in linux ; xargs vs exec

 Linux and Unix xargs command tutorial with examples

https://shapeshed.com/unix-xargs/

 

xargs
 - a command-line tool for building and executing commands from standard input.

 - reads items from standard input, separated by blanks or newlines, and executes a command one or more times with those items as arguments.

echo 'one two three' | xargs mkdir
ls
one two three
 

xargs vs exec 

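The find command supports the -exec option that allows arbitrary commands to be performed on found files. The following have the same effect, as long as the file names contain no whitespace or quotes (plain xargs would split on those).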

find ./foo -type f -name "*.txt" -exec rm {} \; 

find ./foo -type f -name "*.txt" | xargs rm

So which one is faster? Let’s compare a folder with 1000 files in it.

time find . -type f -name "*.txt" -exec rm {} \;
0.35s user 0.11s system 99% cpu 0.467 total

time find ./foo -type f -name "*.txt" | xargs rm
0.00s user 0.01s system 75% cpu 0.016 total

Clearly, using xargs is far more efficient. In fact, several benchmarks suggest that using xargs instead of -exec {} \; is six times more efficient.
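Much of the gap comes from how many processes get started: -exec rm {} \; launches one rm per file, while xargs packs many file names into each rm invocation. The Python sketch below only builds the argument lists to count invocations; it runs nothing. The function names and the max_args limit are assumptions for illustration, and real xargs limits batches mainly by total command-line length rather than a fixed item count.

```python
def exec_style(command, items):
    # One command line per item, like: find ... -exec rm {} \;
    return [command + [item] for item in items]


def xargs_style(command, items, max_args=5000):
    # Items are packed into as few command lines as possible, like: ... | xargs rm
    return [
        command + items[start:start + max_args]
        for start in range(0, len(items), max_args)
    ]


# Hypothetical file names standing in for the 1000-file folder
files = [f"file{i}.txt" for i in range(1000)]

per_file = exec_style(["rm"], files)
batched = xargs_style(["rm"], files)

print(len(per_file), "invocations with -exec ... \\;")
print(len(batched), "invocation(s) with xargs")
```

find can also batch on its own with -exec rm {} +, which avoids the pipe and handles unusual file names safely.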

 

SuperExactTest: R software package for multi-set intersection test & visualization

Efficient Test and Visualization of Multi-Set Intersections

Scientific Reports volume 5, Article number: 16923 (2015)

https://www.nature.com/articles/srep16923