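Why DROP TABLE and TRUNCATE TABLE are essentially unrecoverable in PostgreSQL without a backup

I have recently been looking into special-case recovery techniques for PostgreSQL. PostgreSQL stores each table's data in one or more files that belong exclusively to that table, which makes its overall recovery picture fairly simple.

Some basics about PostgreSQL: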

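Every table and every index is a separate file; when a table or index grows too large, it is extended into multiple files.
Every database has its own set of catalog tables such as pg_class; the file for pg_class is named 1259 by default.
The pg_global tablespace holds the cluster-wide catalog information, namely which databases exist and what their OIDs are.
The block header in PostgreSQL is the PageHeaderData structure. After its 24 bytes comes the ItemIdData array (the line pointers), then the free space, then the heap tuples that hold the data.
The page header contains no information about where the page belongs. The article at https://www.jianshu.com/p/375e2b9fd079 says the header has pd_type and pd_oid fields in the last 4 of its 24 bytes, but I cannot find them in the source code, where the last 4 bytes are TransactionId (uint32) pd_prune_xid; see https://doxygen.postgresql.org/bufpage_8h_source.html . Those fields may have been added by some special distribution.
The ctid in each tuple contains a block number and a line number but no file number, so a sequence such as (0,1), (0,2), (0,3) says nothing about which file the rows came from.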
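
To make the header layout above concrete, here is a minimal Python sketch that decodes the 24 bytes. It assumes the field order in bufpage.h for PostgreSQL 9.3 and later and a little-endian machine; the sample bytes are made up for illustration and do not come from a real data file. Note that none of the decoded fields names a file, table or object.

```python
import struct

# Assumed layout (bufpage.h, PostgreSQL 9.3+, little-endian):
# pd_lsn (2 x uint32), pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (6 x uint16), pd_prune_xid (uint32)
PAGE_HEADER = struct.Struct("<IIHHHHHHI")

FIELDS = (
    "pd_lsn_xlogid", "pd_lsn_xrecoff", "pd_checksum", "pd_flags",
    "pd_lower", "pd_upper", "pd_special",
    "pd_pagesize_version", "pd_prune_xid",
)

def parse_page_header(page: bytes) -> dict:
    """Decode the first 24 bytes of a page into named fields."""
    return dict(zip(FIELDS, PAGE_HEADER.unpack_from(page, 0)))

# Hypothetical header of an empty 8 kB heap page (layout version 4).
sample = PAGE_HEADER.pack(0, 0x16B3D60, 0, 0, 24, 8192, 8192, 8192 | 4, 0)
header = parse_page_header(sample)

for name, value in header.items():
    print(f"{name:20} {value}")
```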

Some experiments I ran based on the facts above:

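truncate table xx: the table's file is emptied straight away, as if it had been vacuumed. Its size drops to zero, but the file itself is not deleted.
delete from xx (deleting every row): in my test the table's file also ended up with size zero and was not deleted. Note that DELETE by itself only marks rows as dead; the file shrinks once a vacuum (presumably autovacuum in my case) has processed the table.
drop table xx: the table's file is emptied straight away in the same manner. Its size drops to zero, and at the time I checked the file had not been deleted.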

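In all three cases the problem is the same: a PostgreSQL page carries no in-page information that identifies what it belongs to (nothing like Oracle's rdba). The blocks can still be found by scanning the disk, but it is very hard to work out which file (which table or index) each block came from.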

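I ran one more experiment. On PostgreSQL 12 I created two tables with identical structure but different data, then swapped their underlying files so that each file posed as the other. Both tables could still be queried through the instance after the swap. This shows that PostgreSQL does not verify, and cannot verify, that the contents of a file really belong to the table it is mapped to. The page lacks the crucial OID information, and that leaves PostgreSQL pages with no realistic prospect of being scanned as fragments and merged back together.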
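
The swap works because a table's identity lives only in the file name. The sketch below shows how the file names are formed, under these assumptions: the table is in the default tablespace, the segment size is the default 1 GB, and the OIDs are hypothetical. The function name segment_paths is mine, not a PostgreSQL API.

```python
import math

RELSEG_SIZE_BYTES = 1 << 30  # assumed default segment size of 1 GB

def segment_paths(db_oid: int, relfilenode: int, size_bytes: int) -> list:
    """Segment file paths of a relation, relative to the data directory.

    The first segment is base/<db_oid>/<relfilenode>; later segments
    get the suffixes .1, .2, and so on.
    """
    n_segments = max(1, math.ceil(size_bytes / RELSEG_SIZE_BYTES))
    base = f"base/{db_oid}/{relfilenode}"
    return [base if i == 0 else f"{base}.{i}" for i in range(n_segments)]

# Hypothetical OIDs: database 16384, table file 16385, about 2.5 GB of data.
paths = segment_paths(16384, 16385, int(2.5 * RELSEG_SIZE_BYTES))
print(paths)
```

Nothing inside the file repeats 16384 or 16385, so once the file name is gone the link between the pages and the table is gone too.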

However, because drop table and truncate table also shrink the data file, and because the page structure contains no suitable identifying information, recovering the data without a backup is essentially impossible.

For truncate, the official documentation says: TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.

https://www.postgresql.org/docs/9.1/sql-truncate.html

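Both drop table and truncate table cause the data file to shrink. As the documentation puts it, "it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation", so there is no need to vacuum the table yourself.

Neither drop nor truncate deliberately overwrites the reclaimed space with zeros, but that space is handed back to the file system, which is free to reuse it.

With Oracle data files, even after a file has been deleted from the file system or from ASM, its data blocks are still on disk, and the PRMSCAN tool can scan those blocks and merge them back into a data file. This works because every Oracle data block carries its own identity in rdba_kcbh: the rdba encodes the block's file number and block number, so the data file can be reassembled from it.

A PostgreSQL page has nothing that works like the rdba. The closest thing is the ctid on each row, which resembles Oracle's rowid, but the ctid contains only a block number and a line number and lacks a file number. User tables also do not get OIDs by default (and from PostgreSQL 12 on, WITH OIDS is no longer supported at all):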

OIDs are not added to user-created tables, unless WITH OIDS is specified when the table is created, or the default_with_oids configuration variable is enabled.
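
As an illustration of how little a ctid carries, the following sketch decodes the 6-byte ItemPointerData. It assumes the layout in the PostgreSQL source (a block number split into two uint16 halves, followed by a uint16 offset number) and a little-endian machine; the sample values are invented.

```python
import struct

# Assumed layout of ItemPointerData: bi_hi, bi_lo (block number halves)
# and ip_posid (line pointer number within the block), 6 bytes total.
ITEM_POINTER = struct.Struct("<HHH")

def decode_ctid(raw: bytes) -> tuple:
    """Return (block_number, offset_number) from 6 raw ctid bytes."""
    bi_hi, bi_lo, ip_posid = ITEM_POINTER.unpack(raw)
    return ((bi_hi << 16) | bi_lo, ip_posid)

# Hypothetical ctid bytes for the row at block 7, line pointer 3.
raw = ITEM_POINTER.pack(0, 7, 3)
ctid = decode_ctid(raw)
print(ctid)
```

All 48 bits are used up by the block number and the offset, so there is no room left for a file or relation number.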

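Because this information is missing, you can scan the disk for PostgreSQL pages, but it is very hard to merge them back together in any meaningful way.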
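
The scanning half of that statement is easy to sketch. The code below walks a raw image in 8 kB steps and flags blocks whose header passes a few sanity checks. It assumes the default 8 kB page size, layout version 4, a little-endian machine, and pages aligned to 8 kB boundaries in the image; the image is synthetic, and the function names are mine. It is a heuristic for illustration, not a recovery tool.

```python
import struct

PAGE_SIZE = 8192                        # assumed default block size
HEADER = struct.Struct("<IIHHHHHHI")    # assumed 24-byte PageHeaderData

def looks_like_page(block: bytes) -> bool:
    """Heuristic: do the header fields look like a PostgreSQL page?"""
    if len(block) < PAGE_SIZE:
        return False
    _, _, _, _, lower, upper, special, size_ver, _ = HEADER.unpack_from(block)
    return (size_ver == (PAGE_SIZE | 4)
            and HEADER.size <= lower <= upper <= special <= PAGE_SIZE)

def scan_image(image: bytes) -> list:
    """Byte offsets of blocks in the image that look like pages."""
    return [off for off in range(0, len(image) - PAGE_SIZE + 1, PAGE_SIZE)
            if looks_like_page(image[off:off + PAGE_SIZE])]

def fake_page(lower: int, upper: int) -> bytes:
    """Build a synthetic page with only the header filled in."""
    head = HEADER.pack(0, 0, 0, 0, lower, upper, PAGE_SIZE, PAGE_SIZE | 4, 0)
    return head.ljust(PAGE_SIZE, b"\0")

# Synthetic image: one all-zero block followed by two page-like blocks.
image = bytes(PAGE_SIZE) + fake_page(28, 8160) + fake_page(24, 8192)
hits = scan_image(image)
print(hits)
```

The scan returns offsets only. Nothing in the two hits says whether they came from the same table, from which table, or in what order they sat in the file, and that is exactly the part that cannot be reconstructed.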

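The tables below are quoted from an older edition of the PostgreSQL documentation, which is why the page header is listed as 20 bytes. Since 8.3 the header has been 24 bytes (pd_flags and pd_prune_xid were added, and pd_tli was later replaced by pd_checksum), and the tuple header has also changed. Neither the old nor the current layout contains a file or object identifier.

Overall Page Layout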

Item	Description
PageHeaderData	20 bytes long. Contains general information about the page, including free space pointers.
ItemIdData	Array of (offset,length) pairs pointing to the actual items. 4 bytes per item.
Free space	The unallocated space. New item pointers are allocated from the start of this area, new items from the end.
Items	The actual items themselves.
Special space	Index access method specific data. Different methods store different data. Empty in ordinary tables.
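
For reference, the 4-byte ItemIdData entry from the table above can be decoded as follows. This is a sketch that assumes the bit-field order used on common little-endian builds (15 bits of offset, 2 bits of flags, 15 bits of length, starting from the least significant bit); the sample value is invented.

```python
import struct

def decode_item_id(raw: bytes) -> dict:
    """Split a 4-byte line pointer into lp_off, lp_flags and lp_len."""
    (word,) = struct.unpack("<I", raw)
    return {
        "lp_off": word & 0x7FFF,           # byte offset of the tuple in the page
        "lp_flags": (word >> 15) & 0x3,    # 1 means LP_NORMAL (in use)
        "lp_len": (word >> 17) & 0x7FFF,   # tuple length in bytes
    }

# Hypothetical line pointer: a 32-byte tuple stored at offset 8160, in use.
raw = struct.pack("<I", 8160 | (1 << 15) | (32 << 17))
item = decode_item_id(raw)
print(item)
```

The line pointer only locates a tuple inside its own page, so it does not help with assigning the page to a table either.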

PageHeaderData Layout

Field	Type	Length	Description
pd_lsn	XLogRecPtr	8 bytes	LSN: next byte after last byte of xlog record for last change to this page
pd_tli	TimeLineID	4 bytes	TLI of last change
pd_lower	LocationIndex	2 bytes	Offset to start of free space
pd_upper	LocationIndex	2 bytes	Offset to end of free space
pd_special	LocationIndex	2 bytes	Offset to start of special space
pd_pagesize_version	uint16	2 bytes	Page size and layout version number information

HeapTupleHeaderData Layout

Field	Type	Length	Description
t_xmin	TransactionId	4 bytes	insert XID stamp
t_cmin	CommandId	4 bytes	insert CID stamp
t_xmax	TransactionId	4 bytes	delete XID stamp
t_cmax	CommandId	4 bytes	delete CID stamp (overlays with t_xvac)
t_xvac	TransactionId	4 bytes	XID for VACUUM operation moving a row version
t_ctid	ItemPointerData	6 bytes	current TID of this or newer row version
t_natts	int16	2 bytes	number of attributes
t_infomask	uint16	2 bytes	various flag bits
t_hoff	uint8	1 byte	offset to user data

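Based on all of the above, my current conclusion for drop and truncate in PostgreSQL is this: because PostgreSQL by design does not store a file number, table number or object number in the page, merging scanned fragments is essentially impossible, and so recovering the data by purely software means is very hard as well.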
