Friday, June 25, 2021

Remove the BOM from a UTF-8 file that has one

 find . -name "*.java" | xargs dos2unix -r   # -r is short for --remove-bom

On MSYS2, install dos2unix from the remote public repository: pacman -S dos2unix

Or install it from a local package file: pacman -U dos2unix-7.4.2-1-x86_64.pkg.tar.zst

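sed -i $'1s/^\uFEFF//' file.txt   # bash expands \uFEFF inside $'...' to the BOM character (UTF-8 locale assumed)
path=/home/$(whoami)/java
find "$path" -name "*.java" | xargs sed -i $'1s/^\uFEFF//' # delete BOM
find "$path" -name "*.java" | xargs sed -i 's/\r$//'       # \r\n -> \n (only a CR at end of line is removed)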
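
The find | xargs pipelines above split file names on whitespace, so a path such as "my dir/A B.java" would be passed to sed in pieces. A minimal sketch of a more defensive variant follows, assuming GNU find, xargs, and sed. The function name clean_java_tree is hypothetical, and the BOM is matched as the raw bytes EF BB BF under LC_ALL=C rather than through bash's \uFEFF expansion.

```shell
# Hypothetical helper: strip the UTF-8 BOM and trailing CRs from every
# *.java file under the directory given as $1.
# Assumes GNU find/xargs/sed (-print0, -0, -r, sed -i, \xHH escapes).
clean_java_tree() {
    # -print0 / -0 pass file names NUL-separated, so spaces are safe.
    # LC_ALL=C makes sed treat the file as bytes, so \xEF\xBB\xBF
    # matches the three BOM bytes at the start of line 1 of each file.
    find "$1" -name "*.java" -print0 |
        LC_ALL=C xargs -0 -r sed -i -e '1s/^\xEF\xBB\xBF//' -e 's/\r$//'
}
```

Usage: clean_java_tree "$HOME/java"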

tail --bytes=+4 withBOM.txt > withoutBOM.txt
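
The tail command above always drops the first three bytes, so running it on a file without a BOM would cut off real content. One possible guard is sketched below: it inspects the first three bytes and only skips them when they are EF BB BF. The function name strip_bom and the file names are hypothetical.

```shell
# Hypothetical helper: copy $1 to $2, dropping a leading UTF-8 BOM
# (bytes EF BB BF) only when one is actually present.
strip_bom() {
    src=$1
    dst=$2
    # Dump the first three bytes as hex, e.g. "efbbbf" for a BOM.
    magic=$(head -c 3 "$src" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "efbbbf" ]; then
        tail -c +4 "$src" > "$dst"   # start output at byte 4
    else
        cat "$src" > "$dst"          # no BOM: copy unchanged
    fi
}
```

Usage: strip_bom withBOM.txt withoutBOM.txt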

You can check whether a file has a UTF-8 BOM with the file command.

$ file sample.txt
sample.txt: ASCII text # UTF-8 without BOM
$ file sample.txt
sample.txt: UTF-8 Unicode (with BOM) text # UTF-8 with BOM
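
The exact wording printed by file can differ between versions, so for scripting it may be more dependable to look at the bytes directly. The sketch below lists the files under a directory whose first three bytes are EF BB BF. The function names has_bom and list_bom_files are hypothetical, and file names containing newlines are not handled.

```shell
# Hypothetical helper: succeed if the file named by $1 starts with EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: print every file under directory $1 matching
# the name pattern $2 (for example '*.java') that starts with a BOM.
list_bom_files() {
    find "$1" -type f -name "$2" | while IFS= read -r f; do
        if has_bom "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage: list_bom_files . '*.java'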

You can add or remove a UTF-8 BOM with the nkf command.

$ nkf --overwrite --oc=UTF-8 sample.txt # Remove BOM
$ file sample.txt
sample.txt: ASCII text
$ nkf --overwrite --oc=UTF-8-BOM sample.txt # Add BOM
$ file sample.txt
sample.txt: UTF-8 Unicode (with BOM) text
