醉裡挑燈看Code: NTFS


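Tuesday, April 22, 2025

Back in the world of Golang

I spent two days rewriting a tool in Golang that was previously built from Tcl + bash script + grep + sed.

Over the past year I had gotten used to the free-form coding style of scripts, so suddenly going back to a compiled language felt rather awkward XD

It must be about three years since my last project developed in Golang?

Still, a compiled language has its advantages. For example, a netlist-related report that used to take 54 seconds now finishes in 15 seconds (the old version was slow because of the bash script, not Tcl).

Another bonus of Golang is that cross-compiling is easy: on Windows I can build an ELF executable that runs on Linux.

Today I wanted to verify the tool in the WSL2 environment on my PC, and suddenly found that the same example took 200 seconds?

I even brought out pprof, but all I could see was that the time went into system calls. At one point I suspected it was because a very large struct variable was being passed by value instead of by pointer in a recursive function!

I also suspected the heavy use of fmt.Sprintf!

In the end I came across an article and realized it was a file system problem, though I am not sure whether to blame NTFS itself or the WSL2 + NTFS combination?

In short, when running programs on Linux it is safer to use a Linux-native file system, especially since my tool writes out a large number of cell-related reports.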

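Thursday, May 26, 2022

Lost memories - NTFS

Only just now did I finally find the time to take out the book "File System Forensic Analysis" and refresh my lost memories!

Although NTFS is one of Microsoft's most widely used file systems, there is no public specification, so the material found online tends to be incomplete.

I remember that when I answered a colleague's questions back then, I also searched the web for a long time before finding that this book had a fairly detailed description.

But as I recall, I did not understand about half of it when I first read it?

Let's see whether I can figure it out on this re-read?

---------------------------------------------------------------------

Q1:

Why do some MFT Entries (FILE RECORDs) have a $BITMAP (0xB0) attribute while others do not? When does it appear? Isn't the actual data already available from $DATA (0x80)?

Ans:

Q2:

Among the metadata files, roughly 12.5, 25, 37.5, or 50% of the disk space is set aside for $MFT. Some $MFTs occupy contiguous space, while for others you have to follow the DATA RUNs in $DATA to collect the pieces yourself.

MFT Zone Reservation IS NOT STORED ON DISK

MFT Zone (reserved space for MFT)

1 = 12.5%

2 = 25.0%

3 = 37.5%

4 = 50.0%

Where is this stored on disk?

Ans:

My guess is as follows: the $BITMAP bits corresponding to the MFT Zone are set to 1, meaning in use, so only the Microsoft operating system knows where it is. I think that if you jump to an Entry marked as in use there, it should turn out to be empty.

If all the MFT Entries are used up, does that mean the disk is full? I once experimented with a 10 MB disk and used small files to fill up every MFT Entry; after that no more files or directories could be created. However, as long as the 1 KB of space (the MFT Entry) still has room, an ADS can still be created.

Ans: I think so, but when I tried to fill it up (with Active Disk Editor), the write failed. By the way, what I wrote to was the Data Run location pointed to by the $BITMAP metadata file, but I am not sure that is right? Or should I also modify the $BITMAP attribute of the $MFT metadata file (the locations I read from the two are different; I am not sure whether I miscalculated, they were off by 1 KB)?

Addendum: I mixed them up. The $BITMAP attribute of $MFT tracks MFT Entries, while the Data Run of the $BITMAP file tracks Clusters. For some reason I confused the two that day.

Another way for the disk to become full is a single file with very large content. Taking a 10 MB HD as an example, with a file larger than 7~8 MB, the Cluster space tracked in $BITMAP would all be used up.

Is the above correct, or are there other situations in which NTFS blows up?

Ans: Those are the only two I can think of, but my intuition is that the MFT Entries would run out first, after which no new files can be added.

Q3:

Every directory has $INDEX_ROOT (0x90); when a directory holds many files it also gets $INDEX_ALLOCATION (0xA0). 0x90 is resident and 0xA0 is non-resident. The space allocated for the 0xA0 INDX records is not inside $MFT, so if all you have is $MFT and you want to process directories, you cannot enumerate the B+Tree through 0x90/0xA0 and can only rely on the Parent directory file record number inside 0x30 in $MFT, right?

Ans:

I only know how to parse starting from $. and I do not know of a way to find the root directory from $MFT. Also, I thought 0x30 meant the $FILE_NAME attribute? I am not sure what the Parent directory file record number refers to here?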

Monday, October 24, 2016

NTFS B+Tree parsing concept diagram

Suppose we want to find the file g:\123\456.txt through the B+Tree; the concept is roughly as shown in the diagram below

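Friday, October 21, 2016

NTFS B+Tree parsing

Suppose we want to find the file g:\123\456.txt through the B+Tree; the steps are as follows

1. From the first sector of Partition G, locate the NTFS Boot Sector and find the start of the MFT

2. Find the Entry with index 0x05, the dot file

From the $INDEX_ROOT Attribute (0x90) we can see that
this Attribute is resident and named, and its name is $I30
the INDEX Records it stores are of type $FILE_NAME (0x30), with a size of 1K
Because there is not enough room to hold them, we have to get the DataRun from $INDEX_ALLOCATION (0xA0)
which points to LCN 0x2C

3. Go to the location of the INDEX Record and parse the Index Entries one by one

Find the Entry named 123 and learn that its index in the MFT is 0x2C and that this Index Entry is 0x58 bytes long,
and that the directory was created at 2016-10-19 15:25:51 +0800 CST (1D229DA060F6283)

4. Go back to the MFT Entry with index 0x2C; this directory contains only one file, 456.txt

so there is no $INDEX_ALLOCATION Attribute and all the information is stored in $INDEX_ROOT.
The file's index in the MFT is 0x2D and its last modification time is 2016-10-21 14:48:40 +0800 CST

5. Go back to the MFT Entry with index 0x2D; because the file is too large,
its $DATA is stored outside the MFT, at LCN 0x7F

6. Finally we arrive at LCN 0x7F, whose content is the content of 456.txt, and the parsing is complete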
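Each hop in the steps above is either "jump to MFT entry N" or "jump to LCN N", and both reduce to simple offset arithmetic. The following is only a sketch under stated assumptions: 4096-byte clusters, 1 KB MFT entries, an MFT stored as one contiguous run, and a hypothetical MFT start LCN of 0x04BD (the real value must be read from the boot sector of the volume being parsed). The function names are made up for illustration.

```go
package main

import "fmt"

const (
	clusterSize  = 4096 // assumption: default cluster size
	mftEntrySize = 1024 // assumption: 1 KB per MFT entry
)

// lcnToOffset converts a Logical Cluster Number into a byte offset
// from the start of the volume.
func lcnToOffset(lcn uint64) uint64 {
	return lcn * clusterSize
}

// mftEntryOffset returns the byte offset of MFT entry `index`,
// assuming the MFT is one contiguous run starting at mftStartLCN.
func mftEntryOffset(mftStartLCN, index uint64) uint64 {
	return lcnToOffset(mftStartLCN) + index*mftEntrySize
}

func main() {
	const mftStartLCN = 0x04BD // hypothetical value, normally taken from the boot sector

	fmt.Printf("entry 0x05 (root directory): 0x%X\n", mftEntryOffset(mftStartLCN, 0x05))
	fmt.Printf("INDX record at LCN 0x2C:     0x%X\n", lcnToOffset(0x2C))
	fmt.Printf("entry 0x2C (directory 123):  0x%X\n", mftEntryOffset(mftStartLCN, 0x2C))
	fmt.Printf("entry 0x2D (456.txt):        0x%X\n", mftEntryOffset(mftStartLCN, 0x2D))
	fmt.Printf("file data at LCN 0x7F:       0x%X\n", lcnToOffset(0x7F))
}
```

If the MFT is fragmented, the entry offset has to be resolved through the data runs of $MFT instead of a single multiplication.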

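Monday, October 17, 2016

Read disk sector on Windows

The key is the device name of the drive letter; for drive D, the name is "\\.\D:"
Whether I used the Win32 API or plain C code, the experimental results were the same
so with slight modification this can be used to read a specified NTFS Entry

Of course we can also read from the beginning of the physical disk, as long as we follow the rules below

Name	Meaning
\\.\PhysicalDrive0	The first hard disk of the computer
\\.\PhysicalDrive1	The second hard disk of the computer
\\.\c:	Drive C of the computer
\\.\c:\	The file system of drive C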

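#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
//-----------------------------------------------------------------------------
#define SECTOR_SIZE         512
//---------------------------------------------------------------------------
// Read one sector from dsk into buf (buf must hold SECTOR_SIZE bytes).
// num is sector number, it starts with 0
bool ReadSect(const char *dsk, char *buf, int num)
{
    if (dsk == NULL || buf == NULL || strlen(dsk) == 0) {
        return false;
    }

    // fseek takes a long offset, so reject sectors it cannot address
    if (num < 0 || num > LONG_MAX / SECTOR_SIZE) {
        return false;
    }

    FILE *f = fopen(dsk, "rb");
    if (!f) {
        return false;
    }

    // Succeed only if both the seek and the full-sector read succeed
    bool ok = fseek(f, (long)num * SECTOR_SIZE, SEEK_SET) == 0 &&
              fread(buf, SECTOR_SIZE, 1, f) == 1;

    fclose(f);

    return ok;
}
//---------------------------------------------------------------------------
int main(void)
{
    char drv[64];
    memset(drv, 0x00, sizeof(drv));

    char disk;
    printf("Which disk do you want to read ?   ");
    if (scanf(" %c", &disk) != 1) {
        return 1;
    }

    int sector;
    printf("Which sector do you want to read ? ");
    if (scanf("%d", &sector) != 1) {
        return 1;
    }
    printf("\r\n");

    // Build the volume name, e.g. "\\.\D:"
    // (use "\\.\PhysicalDriveN" instead to read a whole physical disk)
    snprintf(drv, sizeof(drv), "\\\\.\\%c:", disk);

    char buf[SECTOR_SIZE];
    if (!ReadSect(drv, buf, sector)) {
        printf("failed to read sector %d from %s\n", sector, drv);
        return 1;
    }

    // Hex dump: 16 bytes per line, each line prefixed with its byte offset
    int line = 0;
    for (int i = 0; i < SECTOR_SIZE; i++) {
        if (line == 0) {
            printf("0x%04X  ", (unsigned int)sector * SECTOR_SIZE + i);
        }

        printf("%02X ", (unsigned char)buf[i]);

        line++;

        if (line == 16) {
            printf("\n");
            line = 0;
        }
    }

    printf("\n");
    return 0;
}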

NTFS Entry concept diagram

Tuesday, October 11, 2016

NTFS $Secure parsing

Blue marks the MFT Entry Header
Green marks the Attribute Header
Pink marks the Attribute Name or the Attribute content

Below is the result of parsing by hand

[Entry Header]

Length 0x02F8
The Entry End marker is 0xFFFFFFFF (to reach a multiple of 8, 0x00000000 is padded after it, so the length is 0x02F8)

The next attribute id is 0x000F

0x10 -- 0x0000
0x30 -- 0x0007
0x80 -- 0x0008
0x90 -- 0x000B
0x90 -- 0x000E
0xA0 -- 0x0009
0xA0 -- 0x000C
0xB0 -- 0x000A
0xB0 -- 0x000D

[0x10] -- $STANDARD_INFORMATION

The file creation time is "1601-01-01, 00:00 UTC" + (0x01D21B378809B277 / pow(10,7)) (seconds)

[0x30] -- $FILE_NAME

The name of this entry is $Secure; it has 7 characters, so its length is 2 x 7 = 14 bytes.
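
The "2 x 7 = 14" arithmetic comes from names being stored as UTF-16LE, two bytes per character. A minimal sketch of decoding such a name follows; `decodeName` is a hypothetical helper, and the byte slice is a hand-built example rather than a dump from the volume analysed here.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeName converts a name stored as UTF-16LE into a Go string.
// charCount is the length in characters (UTF-16 code units),
// so the name occupies 2*charCount bytes.
func decodeName(raw []byte, charCount int) (string, error) {
	if charCount < 0 || len(raw) < 2*charCount {
		return "", fmt.Errorf("need %d bytes, have %d", 2*charCount, len(raw))
	}
	units := make([]uint16, charCount)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(raw[2*i:])
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	// "$Secure" written out as UTF-16LE: 7 characters, 14 bytes.
	raw := []byte{0x24, 0, 0x53, 0, 0x65, 0, 0x63, 0, 0x75, 0, 0x72, 0, 0x65, 0}

	name, err := decodeName(raw, 7)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q occupies %d bytes\n", name, len(raw))
}
```

The same helper applies to attribute names such as $SDS or $I30, whose character count is the one-byte name length in the attribute header.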

[0x80] -- $DATA, non-resident, named

starting VCN 0x00
last VCN 0x40

attribute name is $SDS, it has 4 characters (Name length on offset 0x09, one byte).

offset to the Data Runs 0x48

Data Runs

11 41 2D 00 00 00 00 00

11 41 2D - 00 00 00 00 00 (group)

first one is header, it means one byte length, one byte offset.

length 0x41
offset 0x2D

Move to next group

11 41 2D 00 00 00 00 00 -> 11 41 2D - 00 00 00 00 00

Because header is 0x00, it only has one data run.

[0x90] -- $INDEX_ROOT, resident, named

attribute name is $SDH, it has 4 characters (Name length on offset 0x09, one byte).

[0x90] -- $INDEX_ROOT, resident, named

attribute name is $SII, it has 4 characters (Name length on offset 0x09, one byte).

[0xA0] -- $INDEX_ALLOCATION, non-resident, named

attribute name is $SDH, it has 4 characters (Name length on offset 0x09, one byte).

[0xA0] -- $INDEX_ALLOCATION, non-resident, named

attribute name is $SII, it has 4 characters (Name length on offset 0x09, one byte).

[0xB0] -- $BITMAP, resident, named

attribute name is $SDH, it has 4 characters (Name length on offset 0x09, one byte).

[0xB0] -- $BITMAP, resident, named

attribute name is $SII, it has 4 characters (Name length on offset 0x09, one byte).

Friday, October 7, 2016

NTFS Entry timestamp

Time stamps are stored in 64-bit integer values:
Number of 100 ns (0.1 μs) intervals since 1601-01-01, 00:00 UTC.

Below is a timestamp conversion program written in Go


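package main

import (
    "fmt"
    "strconv"
    "time"
)

func main() {
    // Expect the hex digits in on-disk (little endian) byte order, without spaces.
    fmt.Printf("timestamp (little endian case): ")
    var input string
    if _, err := fmt.Scanln(&input); err != nil {
        fmt.Println("error input")
        return
    }

    // A 64-bit value is at most 16 hex digits.
    count := len(input)
    if count > 16 {
        fmt.Println("error input")
        return
    }

    // Pad to a whole number of bytes (two hex digits per byte).
    if count%2 != 0 {
        input = "0" + input
        count++
    }

    // Reverse the byte order to get the big endian hex string.
    var s string
    for i := count - 1; i >= 0; i -= 2 {
        s += input[i-1 : i+1]
    }

    n, err := strconv.ParseUint(s, 16, 64)
    if err != nil {
        fmt.Println("error input")
        return
    }

    fmt.Println("")

    // n counts 100 ns ticks since 1601-01-01 00:00 UTC; keep whole seconds only.
    base := time.Date(1601, time.January, 1, 0, 0, 0, 0, time.UTC)
    sec := base.Unix() + int64(n/10000000)

    // Printed in the local time zone.
    fmt.Println(time.Unix(sec, 0))
}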

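Thursday, October 6, 2016

NTFS basic concepts

Volume

Can be one partition, one disk, or even several disks
For simplicity, just think of it as one partition

Sector

The smallest unit of physical disk access, usually 512 bytes

Cluster

The smallest logical unit the OS uses to access the disk
Its size is decided when the partition is formatted; the default is 4096 bytes

NTFS Boot Sector

Located in the first cluster at the very beginning of the partition
It records the starting location of the MFT, how many sectors make up one cluster, and how many bytes make up one sector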

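MFT (Master File Table)

The core concept of the NTFS file system
Every file or directory is stored in this area as one or more Entries
If the data exceeds the size of one Entry, it may be stored outside the MFT
You can think of the MFT as a relational database, where an Entry is a row and the Attributes are the columns

MFT Entry

Each Record in the MFT

The first 16 are system-reserved Entries; their names start with '$' and the first letter is capitalized
They describe the MFT and NTFS itself, and are also called File System Metadata Files

$MFT
$MFTMirr
$LogFile, etc.

MFT Entry Attribute

Describes an Entry; one Entry may have multiple Attributes
For example $STANDARD_INFORMATION, $FILE_NAME, etc.

MFT Entry resident Attribute

The data of this Attribute is stored inside the Entry; a flag in the Attribute tells whether it is resident

MFT Entry non-resident Attribute

The data of this Attribute is stored outside the Entry; a flag in the Attribute tells whether it is resident

LCN (Logical Cluster Number)

The actual Cluster position (number), counted from the start of the volume; it works like an index

VCN (Virtual Cluster Number)

The virtual Cluster position (number), counted from the start of a file's data; it works like an index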

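Wednesday, October 5, 2016

NTFS references

Since Microsoft, annoyingly, has not published a spec,
NTFS parsing is all the result of experts' painstaking reverse engineering
Below is the suggested reading order

Start with Microsoft's article to get the basic information
https://technet.microsoft.com/en-us/library/cc781134(v=ws.10).aspx

You can also take a look at the wiki
https://zh.wikipedia.org/wiki/NTFS

Next, read the book "File System Forensic Analysis" by Brian Carrier
Many of the slide decks found online reference it

Finally there are the research results of the Linux community, which I currently consider the most accurate spec
https://sourceforge.net/projects/linux-ntfs/files/NTFS%20Documentation/0.6/

NTFS $MFT parsing

$MFT is the first Entry in the MFT; it describes the MFT itself and has 4 attributes in total.

Blue marks the MFT Entry Header
Green marks the Attribute Header
Pink marks the Attribute content

Below is the result of parsing by hand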

[Entry Header]

Length 0x0198
The Entry End marker is 0xFFFFFFFF (to reach a multiple of 8, 0x00000000 is padded after it, so the length is 0x0198)

The next attribute id is 0x0007

0x10 -- 0x0000
0x30 -- 0x0003
0x80 -- 0x0006
0xB0 -- 0x0005
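
A list of attribute types and IDs like the one above can be produced by walking the attribute headers until the end marker. The sketch below assumes the commonly documented layout (first-attribute offset at 0x14 of the entry header; type at 0x00, length at 0x04, non-resident flag at 0x08, name length at 0x09, and attribute ID at 0x0E of each attribute header) and ignores the fixup array that a real on-disk record needs applied first. `listAttributes` and `fakeEntry` are hypothetical helpers, and the record is synthetic.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const endMarker = 0xFFFFFFFF

// attrInfo is the subset of the attribute header used in this sketch.
type attrInfo struct {
	Type        uint32
	ID          uint16
	NonResident bool
	NameLen     uint8
}

// listAttributes walks the attribute headers of one MFT entry.
func listAttributes(entry []byte) ([]attrInfo, error) {
	if len(entry) < 0x30 || string(entry[:4]) != "FILE" {
		return nil, errors.New("not a FILE record")
	}
	var attrs []attrInfo
	off := int(binary.LittleEndian.Uint16(entry[0x14:]))
	for off+4 <= len(entry) {
		typ := binary.LittleEndian.Uint32(entry[off:])
		if typ == endMarker {
			return attrs, nil
		}
		if off+0x10 > len(entry) {
			break
		}
		length := int(binary.LittleEndian.Uint32(entry[off+4:]))
		if length < 0x10 || off+length > len(entry) {
			return attrs, errors.New("corrupt attribute length")
		}
		attrs = append(attrs, attrInfo{
			Type:        typ,
			ID:          binary.LittleEndian.Uint16(entry[off+0x0E:]),
			NonResident: entry[off+8] != 0,
			NameLen:     entry[off+9],
		})
		off += length
	}
	return attrs, errors.New("end marker not found")
}

// fakeEntry builds a synthetic 1 KB record that carries only attribute headers.
func fakeEntry(attrs []attrInfo) []byte {
	e := make([]byte, 1024)
	copy(e, "FILE")
	off := 0x38
	binary.LittleEndian.PutUint16(e[0x14:], uint16(off))
	for _, a := range attrs {
		binary.LittleEndian.PutUint32(e[off:], a.Type)
		binary.LittleEndian.PutUint32(e[off+4:], 0x18) // header-only attribute
		if a.NonResident {
			e[off+8] = 1
		}
		e[off+9] = a.NameLen
		binary.LittleEndian.PutUint16(e[off+0x0E:], a.ID)
		off += 0x18
	}
	binary.LittleEndian.PutUint32(e[off:], endMarker)
	return e
}

func main() {
	entry := fakeEntry([]attrInfo{
		{Type: 0x10, ID: 0x0000},
		{Type: 0x30, ID: 0x0003},
		{Type: 0x80, ID: 0x0006, NonResident: true},
		{Type: 0xB0, ID: 0x0005, NonResident: true},
	})
	attrs, err := listAttributes(entry)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range attrs {
		fmt.Printf("0x%02X -- 0x%04X non-resident=%v\n", a.Type, a.ID, a.NonResident)
	}
}
```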

[0x10] -- $STANDARD_INFORMATION

The file creation time is "1601-01-01, 00:00 UTC" + (0x01D21B378809B277 / pow(10,7)) (seconds)

[0x30] -- $FILE_NAME

The name of this entry is $MFT; it has 4 characters, so its length is 2 x 4 = 8 bytes.

[0x80] -- $DATA, non-resident, no name

Note: not every 0x80 attribute is non-resident, we need to check non-resident flag.

starting VCN 0x00
last VCN 0x3F

offset to the Data Runs 0x40

Data Runs

21 40 BD 04 00 00 00 00

21 40 BD 04 - 00 00 00 00 (group)

first one is header, it means one byte length, two byte offset.

length 0x40
offset 0x04BD

Because $MFT describes the MFT itself, this 0x80 attribute tells us that
the MFT is at 0x4BD000 (0x04BD x 4096) and its length is 262144 bytes (0x40 x 4096).
We can also use the starting VCN and last VCN to check this length (0x00 - 0x3F, length 0x40).

Move to next group

21 40 BD 04 00 00 00 00 -> 21 40 BD 04 - 00 00 00 00

Because header is 0x00, it only has one data run.
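
The manual decoding above can be sketched as a small loop. In each header byte the low nibble gives the size of the length field and the high nibble the size of the offset field. One detail this sketch relies on, as described in the Linux NTFS documentation, is that the offset is a signed value relative to the previous run's LCN, so a byte such as 0xFF means -1. `decodeDataRuns` is a hypothetical helper, and sparse runs (offset size 0) are not treated specially here.

```go
package main

import (
	"errors"
	"fmt"
)

// dataRun is one decoded run: Length clusters starting at absolute cluster LCN.
type dataRun struct {
	Length uint64
	LCN    int64
}

// decodeDataRuns decodes a run list up to the terminating 0x00 header.
func decodeDataRuns(b []byte) ([]dataRun, error) {
	var runs []dataRun
	var lcn int64
	i := 0
	for i < len(b) && b[i] != 0 {
		lenSize := int(b[i] & 0x0F) // low nibble: bytes in the length field
		offSize := int(b[i] >> 4)   // high nibble: bytes in the offset field
		i++
		if lenSize == 0 || lenSize > 8 || offSize > 8 || i+lenSize+offSize > len(b) {
			return nil, errors.New("malformed data run")
		}

		var length uint64
		for k := 0; k < lenSize; k++ {
			length |= uint64(b[i+k]) << uint(8*k)
		}
		i += lenSize

		var delta int64
		for k := 0; k < offSize; k++ {
			delta |= int64(b[i+k]) << uint(8*k)
		}
		// Sign-extend when the top bit of the last offset byte is set.
		if offSize > 0 && offSize < 8 && b[i+offSize-1]&0x80 != 0 {
			delta -= int64(1) << uint(8*offSize)
		}
		i += offSize

		lcn += delta // offsets are relative to the previous run
		runs = append(runs, dataRun{Length: length, LCN: lcn})
	}
	return runs, nil
}

func main() {
	runs, err := decodeDataRuns([]byte{0x21, 0x40, 0xBD, 0x04, 0x00, 0x00, 0x00, 0x00})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range runs {
		fmt.Printf("length 0x%X clusters at LCN 0x%X (byte offset 0x%X)\n",
			r.Length, r.LCN, r.LCN*4096)
	}
}
```

For a two-run list such as `21 01 BC 04 11 01 FF 00`, this decoder yields LCN 0x04BC for the first run and 0x04BB for the second, because the 0xFF offset is interpreted as -1.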

[0xB0] -- $BITMAP, non-resident, no name

starting VCN 0x00
last VCN 0x01

offset to the Data Runs 0x40

Data Runs

21 01 BC 04 11 01 FF 00

21 01 BC 04 - 11 01 FF - 00 (group)

Run1 21 01 BC 04

length 0x01
offset 0x04BC

Run2 11 01 FF

length 0x01
offset 0x04BB (0x04BC - 1; the offset field is a signed value relative to the previous run, so 0xFF means -1 rather than +0xFF)

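From Run1 we know the data is located at 0x4BC000 (0x04BC x 4096)
Each bit represents one Entry: 1 means in use, 0 means unused
The dumped data is as follows (apparently listed with the highest byte first, so Entry 0 is the rightmost bit)

01 9F FF 00 FF FF

The first 16 Entries are metafiles
The files and directories of the file system start from the 25th Entry
Of these 17 consecutive Entries, 2 are unused
If you jump to such an Entry, you can confirm this from the Entry Header Flag, whose value should be 0x00

As for Run2, the dumped values are all 0x00
so only the first 41 Entries of the MFT hold data (including the reserved ones and the empty unused ones)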
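
The counts above can be checked with a few lines of Go. This sketch assumes the dump is printed with the highest byte first, so that the on-disk order is FF FF 00 FF 9F 01 with bit 0 of byte 0 standing for Entry 0; that reading is an inference from the counts quoted in the post, not something the dump itself states. `entryInUse` is a hypothetical helper.

```go
package main

import "fmt"

// entryInUse reports whether MFT entry `index` is marked as used.
// Bit 0 of byte 0 is entry 0 (least significant bit first).
func entryInUse(bitmap []byte, index int) bool {
	if index < 0 || index/8 >= len(bitmap) {
		return false
	}
	return bitmap[index/8]&(1<<uint(index%8)) != 0
}

func main() {
	// Assumed on-disk order, i.e. the dump read from right to left.
	bitmap := []byte{0xFF, 0xFF, 0x00, 0xFF, 0x9F, 0x01}

	used, last := 0, -1
	for i := 0; i < len(bitmap)*8; i++ {
		if entryInUse(bitmap, i) {
			used++
			last = i
		}
	}
	fmt.Printf("entries in use: %d, highest used index: %d\n", used, last)

	// Entries 24 onward hold ordinary files and directories.
	for i := 24; i <= last; i++ {
		if !entryInUse(bitmap, i) {
			fmt.Printf("entry %d is unused\n", i)
		}
	}
}
```

Under that assumption, entries 0 to 15 are set, 16 to 23 are clear, and the range 24 to 40 contains exactly two clear bits, which matches the 17 entries with 2 unused and the total of 41 entries described above.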
