complete revamps

2022-12-24 03:20:02 +09:00
parent aca29317c6
commit 9a9c2343d1
66 changed files with 13682 additions and 4599 deletions
--- a/source/_posts/2021/affinity-thumbnail.md
+++ b/source/_posts/2021/affinity-thumbnail.md
@@ -3,45 +3,55 @@ title: Distill Thumbnail from .afphoto and .afdesign
 date: 2021-02-14T13:30:00
 ---

-Nextcloud does not have support for generating thumbnails from Affinity Photo and Affinity Design. Fine, I'll do it myself.
+[Nextcloud](https://en.wikipedia.org/wiki/Nextcloud) does not support generating thumbnails from [Affinity Photo](https://en.wikipedia.org/wiki/Affinity_Photo) and [Affinity Designer](https://en.wikipedia.org/wiki/Affinity_Designer). Fine, I'll do it myself!

 # Digging Binary

-Glancing at `.afphoto` and `.afdesign` in Finder, I noticed that it has a QuickLook support and an ability to show the thumbnail image. So these files should have thumbnail image somewhere inside its binary.
+Glancing at `.afphoto` and `.afdesign` files in Finder, I noticed that they support Quick Look and can show a thumbnail image. That means there's a chance these files contain **pre-generated thumbnail images** somewhere inside their binaries, in which case I don't have to reverse-engineer the format from the ground up.

-I wrote a piece of Node.js script to seek for [PNG signature](https://www.w3.org/TR/PNG/) inside a binary and save it as an image file.
+To verify this, I wrote a small Node.js script that searches for a [PNG blob](https://www.w3.org/TR/PNG/) inside an `.afphoto`/`.afdesign` file and saves it as a normal PNG file.

-```js afthumb.js
+Section [11.2.1 General](https://www.w3.org/TR/PNG/#11Chunks) of the PNG spec states that a valid PNG datastream begins with a PNG signature and ends with an `IEND` chunk.
+
+> A valid PNG datastream shall begin with a PNG signature, immediately followed by an `IHDR` chunk, then one or more `IDAT` chunks, and shall end with an `IEND` chunk. Only one `IHDR` chunk and one `IEND` chunk are allowed in a PNG datastream.
+
+Conveniently, the spec also guarantees that there is **only one `IEND` chunk** in a PNG file, so simply searching for the first `IEND` after the signature just works.
+
+```js gen_thumbnail.js
 const fs = require("fs");

 // png spec: https://www.w3.org/TR/PNG/
 const PNG_SIG = Buffer.from([137, 80, 78, 71, 13, 10, 26, 10]);
 const IEND_SIG = Buffer.from([73, 69, 78, 68]);

-function extractThumbnail(buf) {
+function extractPngBlob(buf) {
  const start = buf.indexOf(PNG_SIG);
  const end = buf.indexOf(IEND_SIG, start) + IEND_SIG.length * 2; // IEND + CRC
  return buf.subarray(start, end);
 }

-function generateThumbnail(input, output) {
+function extractThumbnail(input, output) {
  const buf = fs.readFileSync(input);
-  const thumbBuf = extractThumbnail(buf);
-  fs.writeFileSync(output, thumbBuf);
+  const pngBlob = extractPngBlob(buf);
+  fs.writeFileSync(output, pngBlob);
 }

-generateThumbnail(process.argv[2], process.argv[3] || "output.png");
+extractThumbnail(process.argv[2], process.argv[3] || "output.png");
 ```

-That's right. This script just scrapes a binary file and distill a portion of which starts with `PNG` signature and ends with `IEND`.
+That's right. This script just calls `indexOf` on a `Buffer` and extracts the portion that starts with the PNG signature and ends with `IEND` (plus its CRC checksum).
+
+# CRC (Cyclic Redundancy Code)
+
+You may have wondered about the `IEND_SIG.length * 2` part. It is there to include the [32-bit CRC](https://en.wikipedia.org/wiki/Cyclic_redundancy_check#CRC-32_algorithm) (Cyclic Redundancy Code) that follows `IEND` in the resulting blob.
+
+Here, the byte lengths of the `IEND` chunk type and its CRC checksum happen to be the same (4 bytes), so I just went with that code.
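
The `IEND` + CRC arithmetic can also be written more defensively. What follows is a minimal sketch, not the script from this post: `crc32`, `IEND_TYPE`, `IEND_CRC`, `CRC_LENGTH`, and the retry loop are illustrative names and additions that do not appear in the original. It assumes the file embeds at least one complete PNG, and it relies on the fact that `IEND` carries no data, so its CRC covers only the four type bytes.

```js
// Illustrative sketch: names below are assumptions, not part of the original script.
const PNG_SIG = Buffer.from([137, 80, 78, 71, 13, 10, 26, 10]);
const IEND_TYPE = Buffer.from("IEND", "ascii");
const CRC_LENGTH = 4; // a PNG chunk CRC is a 32-bit big-endian integer

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant PNG uses.
function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0xedb88320 : crc >>> 1;
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// IEND has an empty data field, so its CRC is computed over the type bytes only.
const IEND_CRC = crc32(IEND_TYPE);

// Returns the first embedded PNG as a Buffer view, or null if none is found.
function extractPngBlob(buf) {
  const start = buf.indexOf(PNG_SIG);
  if (start === -1) return null;

  let iend = buf.indexOf(IEND_TYPE, start + PNG_SIG.length);
  while (iend !== -1) {
    const crcOffset = iend + IEND_TYPE.length;
    const end = crcOffset + CRC_LENGTH;
    // Accept this candidate only if the CRC that follows matches; otherwise the
    // bytes "IEND" merely occurred by chance inside other data, so keep searching.
    if (end <= buf.length && buf.readUInt32BE(crcOffset) === IEND_CRC) {
      return buf.subarray(start, end);
    }
    iend = buf.indexOf(IEND_TYPE, iend + 1);
  }
  return null;
}
```

Compared with the one-liner, this version returns `null` instead of a bogus slice when `indexOf` yields `-1`, and it names the CRC length explicitly rather than reusing `IEND_SIG.length`.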

 Now I can generate a thumbnail image from any `.afphoto` or `.afdesign` file. Let's move on and delve into the Nextcloud source code.

 # Tweaking Nextcloud

-I have a bit of experience in tweaking Nextcloud source code before, where I implemented thumbnail generator for PDFs, so it should be easier this time, hopefully.
-
-Long story short, I got Nextcloud generates thumbnail images for Affinity files by implementing `ProviderV2` class.
+At this point, all I have to do is rewrite the above code in PHP and make it behave as a Nextcloud Preview Provider.

 ```php lib/private/Preview/Affinity.php
 <?php
@@ -79,6 +89,8 @@ class Affinity extends ProviderV2 {
 }
 ```

+I also need to make sure my provider is loaded automatically on startup.
+
 ```patch lib/private/PreviewManager.php
@@ -363,6 +365,8 @@
 		$this->registerCoreProvider(Preview\Krita::class, '/application\/x-krita/');
@@ -114,7 +126,13 @@ class Affinity extends ProviderV2 {

 ![](afphoto.png)

-Easy-peasy!
+Voilà! Now I can see beautiful thumbnails for my drawings in the Nextcloud web interface.
+
+This is exactly why I love FOSS. It lets me build **any niche feature** I want into the software without bothering its developers. That not only gives me confidence that I control what the software does, but also makes me trust the developers more for giving me the freedom to change it.
+
+# Finalized Solution
+
+Enough talking. I've pushed my Nextcloud Docker setup, with the above patches included, to [GitHub](https://github.com/uetchy/docker-nextcloud). You can see the actual patch [here](https://github.com/uetchy/docker-nextcloud/blob/master/patches/lib.patch). Note that it also contains the patches for the PDF thumbnail generator described below, and that particular patch _may_ have security implications because it runs Ghostscript on PDFs.

 # Bonus: PDF thumbnail generator

--- a/source/_posts/2021/installing-arch-linux.md
+++ b/source/_posts/2021/installing-arch-linux.md
--- a/source/_posts/2021/oauth-jwt-rfcs.md
+++ b/source/_posts/2021/oauth-jwt-rfcs.md
@@ -44,11 +44,11 @@ OAuth 2.0 において、`access_token`をリソースサーバーに渡す手
 2. Form Encoded Parameters (SHOULD NOT)
 3. URI Query Parameters (SHOULD NOT)

-## [OICD](https://openid.net/specs/openid-connect-core-1_0.html) — OpenID Connect Core 1.0
+## [OIDC](https://openid.net/specs/openid-connect-core-1_0.html) — OpenID Connect Core 1.0

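 November 2014

-An extension that adds several specifications on top of OAuth 2.0.
+An extension that adds several specifications on top of OAuth 2.0; a groundbreaking protocol that gives OAuth (authorization) the ability to do authentication.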

 ## [RFC7515](https://tools.ietf.org/html/rfc7515) — JSON Web Signature (JWS)

--- a/source/_posts/2021/server-2020.md
+++ b/source/_posts/2021/server-2020.md
@@ -8,26 +8,39 @@ date: 2021-02-13T00:00:00
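 # Purpose

 - Various self-hosted services (Docker)
- Docker Swarm / K8s master
+- Master node of a Docker Swarm cluster
 - Computational experiments
 - Host machine for VS Code Remote SSH
- VPN and more

 # Specs

 I want to run heavy tasks in parallel, so CPU and memory are the top priorities. For memory I chose [DDR4-3200 32GBx2](https://shop.tsukumo.co.jp/goods/4582353591719/), and for the CPU, considering the multi-core support in recent libraries, I chose the [Ryzen 9 3950X](https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x). The CPU cooler is Noctua's [NH-D15 Black](https://noctua.at/en/nh-d15), chosen for quietness.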

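-> In the end, 64GB of memory was not enough. Every time I processed a huge Pandas DataFrame in parallel or offloaded model weights with DeepSpeed, an OOM was triggered. I ended up having to increase it to 128GB.

 > Addendum: A system malfunction caused by a memory fault led to a disaster in which everything under `/sbin` was overwritten with zeros and the kernel would no longer boot. Since I later swapped in ECC memory, no memory-related faults have occurred to date. For an always-on server, choose ECC memory from the start.

-> Addendum: Eventually even 128GB led to OOM in some situations, but I can't add any more. I should have chosen a motherboard that takes 8 DIMMs from the start.
+> Addendum 2: In the end, 64GB of memory was not enough. Every time I manipulated a huge Pandas DataFrame or scanned a MongoDB with over a billion records, an OOM was triggered. I ended up having to increase it to 128GB.

-The GPU is an NVIDIA GeForce GTX TITAN X (Maxwell) reused from the old server. It only has a little over 12GB of graphics memory, but about 5GB remains free even at peak workload, so it is enough for now.
+> Addendum 3: Eventually even 128GB led to OOM in some situations, but the slots are full so I can't add any more, and even if I could, the Ryzen series only supports up to 128GB, so I'm stuck either way.
+>
+> Readers planning to build a machine learning/OLAP server should consider, from the start, a dual-socket motherboard (one that takes two CPUs) with plenty of DIMM slots, combined with server CPUs (EPYC or Xeon). As for server CPUs, models from about five years ago that reached EOL and were released from data centers should be available cheaply on eBay.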

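-> In the end, training and running inference on models on the scale of hundreds of millions of parameters and up, such as GPT-J and Megatron-LM, required at least 16GB of VRAM, even with the help of DeepSpeed.
+> Addendum 4: Tips for finding good parts on eBay
+>
+> 1. The shipping time is not unusually long (the seller is listing something they actually own)
+> 2. There are photos of the actual item showing the serial number (less likely to be a fake or an Engineering Sample)
+> 3. The seller has verified normal operation, not just power-on (if anything is unclear, message the seller and get a clear commitment in writing)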

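-Storage consists of two WD 3TB HDDs, a Samsung 970 EVO Plus 500GB M.2 PCIe, and a Samsung 870 EVO Plus 500GB SSD pulled from the old server. The NVMe drive is for the OS, and the SSD/HDDs are for data and backups.
+The GPU is an NVIDIA GeForce GTX TITAN X (Maxwell) reused from the old server. It only has a little over 12GB of VRAM, but about 5GB remains free even at peak workload, so it is enough for now.

+> Addendum: In the end, training and running inference on models on the scale of hundreds of millions of parameters and up, such as GPT-J and Megatron-LM, required at least 16GB of VRAM, even with the help of DeepSpeed. To give other examples, fine-tuning Stable Diffusion needs at least around 30GB of VRAM, and you should budget 13GB to run the large model of OpenAI Whisper.
+>
+> If you are buying a GPU now, a sound strategy would be to buy two RTX 3090s (24GB), which are hovering around ¥100,000 on the used market as of October 2022. If you are rich, just buy an A100.

+> Addendum 2: I managed to get an RTX 3090 cheaply, but it was so large that it interfered with the HDD bays of my Meshify 2 case. I got it to fit by relocating some HDDs, but three bays became unusable in exchange.
+>
+> Readers about to buy a case should skip ATX cases from the start and choose one that supports E-ATX/EE-ATX, such as the Meshify 2 XL. Either way, the dual-socket motherboards mentioned above won't fit without such a case, nor can you secure adequate cooling.

+Storage consists of two WD 3TB HDDs, a Samsung 970 EVO Plus 500GB M.2 PCIe, and a Samsung 870 EVO Plus 500GB SSD pulled from the old server. The NVMe drive is for the OS and cache, and the SSD/HDDs are for data.

 For the motherboard, I went with the [ASRock B550 Taichi](https://www.asrock.com/mb/AMD/B550%20Taichi/), whose capacitors and components struck me as more server-oriented than those on X570 boards.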

@@ -43,6 +56,7 @@ Arch Linux のセットアップは[個別に記事](https://uechi.io/blog/insta

 # Tips for choosing parts

+- Check disk prices on [Disk Prices](https://diskprices.com/?locale=us)
 - Look up CPU models and specs on [WikiChip](https://en.wikichip.org/wiki/WikiChip)
 - Estimate the cost of parts with [PCPartPicker](https://jp.pcpartpicker.com/)
 - Use [Bottleneck Calculator](https://pc-builds.com/calculator/) to pick a CPU and GPU combination and find out which of the two becomes the performance bottleneck
@@ -60,7 +74,7 @@ Arch Linux のセットアップは[個別に記事](https://uechi.io/blog/insta

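 - Keep all boxes and paperwork for at least one year (for memory in particular, the box itself sometimes serves as the warranty card)
 - Ignore the case at first: connect the motherboard, CPU, cooler, graphics card (if the CPU has no integrated graphics), and power supply, then run a power-on and operation test
-  - Run the memory operation test in [MemTest86](https://www.memtest86.com/) to completion (request a replacement if errors appear)
+  - Run the memory test in [MemTest86](https://www.memtest86.com/) to completion (request a replacement if errors appear)
  - Confirm that the OS boots from USB
 - If Ethernet is dead, first secure network access with a USB-Ethernet adapter
  - In most cases, upgrading the Linux kernel version (which also brings newer device drivers) fixes it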
@@ -69,4 +83,4 @@ Arch Linux のセットアップは[個別に記事](https://uechi.io/blog/insta
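  - If that fails, look for updates from the motherboard or adapter manufacturer
 - Screws on cheap cases can be soft, so turn them little by little while pressing firmly
  - If the head starts to strip, put a rubber sheet in between
- Once everything works, [send a probe to the Linux Hardware Database](https://linux-hardware.org/index.php?view=howto) as a contribution
+- Once everything works, contribute by [sending a probe to the Linux Hardware Database](https://linux-hardware.org/index.php?view=howto)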
--- a/source/_posts/2021/split-bill.md
+++ b/source/_posts/2021/split-bill.md
@@ -12,7 +12,7 @@ date: 2021-02-14T00:00:00

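 1. Compute each person's balance (positive if they overpaid, negative if they underpaid)
 2. Sort in descending order (the biggest overpayer comes first)
-3. The last person in the list (the largest debtor, balance = L) pays $\min(F, |L|)$ to the first person in the list (the largest creditor, F), then the balances are recomputed
+3. The last person in the list (the largest debtor, balance = L) pays $\min(F, |L|)$ to the first person in the list (the largest creditor, F), then the balances (debts) are recomputed
 4. Repeat steps 2-3 until everyone's balance is 0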
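
The four steps above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the program used in this post: the function name `settle`, the input shape (an object mapping names to balances), and the sample numbers are all assumptions made for the example. It also assumes integer amounts whose balances sum to zero.

```js
// Illustrative sketch of the greedy settlement; `settle` and its input
// format are hypothetical, chosen only for this example.
function settle(balances) {
  // Positive balance = paid too much (creditor), negative = paid too little (debtor).
  const entries = Object.entries(balances).map(([name, balance]) => ({ name, balance }));
  const transfers = [];

  while (entries.length > 1) {
    // Step 2: sort in descending order so the largest creditor comes first.
    entries.sort((x, y) => y.balance - x.balance);
    const creditor = entries[0];
    const debtor = entries[entries.length - 1];

    // Step 4: stop once nobody is owed money or nobody owes money.
    if (creditor.balance <= 0 || debtor.balance >= 0) break;

    // Step 3: the largest debtor pays min(F, |L|) to the largest creditor.
    const amount = Math.min(creditor.balance, -debtor.balance);
    creditor.balance -= amount;
    debtor.balance += amount;
    transfers.push({ from: debtor.name, to: creditor.name, amount });
  }
  return transfers;
}

// Hypothetical balances: A overpaid by 50, B and C underpaid by 20 and 30.
const example = settle({ A: 50, B: -20, C: -30 });
// example is [{ from: "C", to: "A", amount: 30 }, { from: "B", to: "A", amount: 20 }]
```

Each iteration brings at least one person's balance to zero, so with balances that sum to zero the loop ends after at most one transfer fewer than the number of people.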

 # Experiment
@@ -145,4 +145,8 @@ B virtually paid ¥81 in total
 C virtually paid ¥76 in total
 ```

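-Once you have turned it into a program, you are free to make it a spreadsheet macro or whatever you like. Let the computer do all the tedious work!
+During the trip, A and B each paid on behalf of the group once, and C did so three times. Three of those payments were ordinary even splits, but the other two were, respectively, "C covered A's and B's shares" and "C covered A's share (= C lent money to A)".
+
+Settling a case like this naively, one payment at a time, would require a total of 12 money transfers. But by repeatedly cancelling each debt against a credit of the same amount, the optimization showed that just 2 transfers are enough to settle up for everyone.
+
+Once you have turned it into a program, you are free to make it a spreadsheet macro, a smartphone app, or whatever you like. Let the computer do all the tedious work!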