Redis源码阅读(六) 压缩列表

压缩列表是Redis为了节省内存而设计的特殊编码的双端链表，可以存储字符串和整数，整数值保存为实际的整数而不是字符数组，它允许在链表两端进行push和pop操作，时间复杂度为O(1)，很多的操作需要对内存进行重新分配，所以实际情况还与内存使用的大小有关。

1. 定义

1.1 压缩列表

从本质上来说，ziplist就是一串按照某种固定方式存储的字节数组，不依赖c语言的strcut，完全是内存的一些约束和操作，它包括5个组成部分zlbytes, zltail, zllen, entries, zlend。

属性	类型	长度	作用
zlbytes	uint32_t	4字节	整个压缩列表使用的总字节数
zltail	uint32_t	4字节	尾结点到压缩列表头部有多少字节
zllen	uint16_t	2字节	结点的数量，超过UINT16_MAX则需要变量列表计算数量
entries	entry	长度不定	压缩列表的结点，多个结点间分割边界，通过结点属性值区分
zlend	uint8_t	1字节	结束标识符0XFF（255），表示压缩列表的结尾

1.2 压缩列表结点

结点可以存储字符串或在整数。

字符串
- 长度(1 ~ 2^6-1) 的字节数组
- 长度(1 ~ 2^14-1) 的字节数组
- 长度(1 ~ 2^32-1) 的字节数组
整数值
- 长度4bit，介于0 ~ 12的无符号整数
- 长度1字节，有符号整数
- 长度3字节，有符号整数
- int16_t类型整数
- int32_t类型整数
- int64_t类型整数

每个结点由三部分组成，prevlen，encoding，entry-data

prevlen

该属性表示当前结点的前驱结点的长度。
- 若前驱结点长度小于254字节，使用1byte存储prevlen
- 若前驱结点长度大于等于254字节，使用5byte存储prevlen，其中第一字节被设置为0XFE，表示长度由后4字节表示。
因为每个结点都保存了前一个结点的长度，所以可以通过地址计算出上一个结点的地址。
encoding

该属性表示当前结点保存数据所需要的类型和长度，有时候encoding也表示数据本身（小的整数值）

编码	描述
00pppppp（1字节）	使用6bit存储字符串长度
01pppppp qqqqqqqq （2字节）	使用14bit存储字符串长度
10000000 xxxxx……..（5字节）	使用32bit存储字符串长度
11000000（1字节）	使用2字节表示int16_t
11010000（1字节）	使用4字节表示int32_t
11100000（1字节）	使用8字节表示int64_t
11110000（1字节）	使用3字节表示24位有符号整数
11111110（1字节）	使用1字节表示8位有符号整数
1111xxxx（1字节）	使用4bit表示 0 ~ 12无符号整数
11111111（1字节）	0xFF表示ziplist结束标识

entry-data

结点的数据值可以是一个字节数组或者整数，类型有encoding决定。

2. 连锁更新

压缩列表中使用prevlen保存前驱结点的长度，长度小于254需要一个字节存储prevlen，但若长度大于254则需要5个字节进行存储，那么可能会存在这样的一种情况：

在一个ziplist中，存在多个连续的结点，大小介于250~253字节，如结点 A,B,C,D,E,F,…。这时，要插入一个大于254结点的O到B结点之前，由于B结点之前的prevlen保存的是A结点的长度(小于254)，所以只需要1个字节，但要保存结点O的话需要扩展prevlen为5个字节，这样导致B结点的大小大于了254，而C结点也需要扩展，如此产生连锁更新的效应。

添加结点可能触发压缩列表的连锁更新操作，这个操作的开销还是比较大的，在最坏的情况下需要执行N次重新分配内存操作。

删除结点也可能导致这种情况发生，但为了防止一会扩展，一会缩小的抖动发生，当删除结点时出现这种情况并不进行处理。

实际情况中，这种极端情况比较少见，所以并不会对性能造成很大影响。

3. 源码剖析

连锁更新

/**
 * 当一个结点插入时，我们需要设置下一个结点的prevlen等于插入结点的大小。
 * 这样会导致1个字节无法存储插入结点的长度，所以需要扩展到5个字节存储prevlen。
 * 但是，当多个连续结点的大小接近ZIP_BIG_PREVLEN时，当插入结点长度大于ZIP_BIG_PREVLEN，
 * 就需要扩展后一个结点的prevlen为5字节，但后一个结点的长度本来是253，加4字节后大于了254,
 * 这就导致后一个字节也需要扩展，如此进行下去，连续接近254字节的结点都需要更新
 *
 * 反之来说，长度变小引起的连续缩小也是有可能的，但为了避免反复扩展-缩小，
 * 我们并不处理缩小的情况。
 */
unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize;
    size_t offset, noffset, extra;
    unsigned char *np;
    zlentry cur, next;

    // 遍历ziplist
    while (p[0] != ZIP_END) {

        // 获取一个结点
        zipEntry(p, &cur);

        // 结点长度
        rawlen = cur.headersize + cur.len;

        // 存储结点长度所需字节数
        rawlensize = zipStorePrevEntryLength(NULL,rawlen);

        /* Abort if there is no next entry. */
        // 结尾标识则退出
        if (p[rawlen] == ZIP_END) break;

        // 下一个结点，存储在next中
        zipEntry(p+rawlen, &next);

        /* Abort when "prevlen" has not changed. */
        // 结点不需要改变prevlen，表示当前结点空间足够
        if (next.prevrawlen == rawlen) break;

        if (next.prevrawlensize < rawlensize) {
            /* The "prevlen" field of "next" needs more bytes to hold
             * the raw length of "cur". */

            // 表示next的prevlen存储不了cur的结点长度
            // 所以程序需要对next结点的header部分进行扩展

            // 记录偏移量
            offset = p-zl;

            // 计算需要扩展的字节大小
            extra = rawlensize-next.prevrawlensize;

            // 扩展zl的空间大小
            zl = ziplistResize(zl,curlen+extra);

            // 还原p指针的位置
            p = zl+offset;

            /* Current pointer and offset for next element. */
            // 记录下个结点的偏移量
            np = p+rawlen;
            noffset = np-zl;

            /* Update tail offset when next element is not the tail element. */
            // 更新ziplist的tail结点，如果next是tail则不用更新
            if ((zl+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np) {
                ZIPLIST_TAIL_OFFSET(zl) =
                    intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+extra);
            }

            /* Move the tail to the back. */
            // 移动插入位置之后的数据到尾部，为cur留出空间
            memmove(np+rawlensize,
                np+next.prevrawlensize,
                curlen-noffset-next.prevrawlensize-1);
            zipStorePrevEntryLength(np,rawlen);

            /* Advance the cursor */
            // p指针指向下一个结点，当前ziplist的总字节数
            p += rawlen;
            curlen += extra;
        } else {
            // next结点的prevlen存储大于cur结点的大小，但不需要缩小，以防止抖动的发生
            if (next.prevrawlensize > rawlensize) {
                /* This would result in shrinking, which we want to avoid.
                 * So, set "rawlen" in the available bytes. */
                // 此处表示存储rawlen需要1字节，而next的prevlen有5字节
                // 但为防止抖动发生，不进行缩小操作，只把rawlen存储到next的prevlen中即可
                zipStorePrevEntryLengthLarge(p+rawlen,rawlen);
            } else {
                // cur的rawlen刚好可以存储到next的prevlen中
                zipStorePrevEntryLength(p+rawlen,rawlen);
            }

            /* Stop here, as the raw length of "next" has not changed. */
            break;
        }
    }
    return zl;
}

插入结点

unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted. */
    if (p[0] != ZIP_END) { // 插入在表中间
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else { // 插入表尾

        // 获取表尾结点地址
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);

        // 获取一个结点的长度，用于插入结点的prevlen
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    // 转换字符串s为整数值，如可以，则存储在value中，编码方式存储在encoding中
    if (zipTryEncoding(s,slen,&value,&encoding)) {

        /* 'encoding' is set to the appropriate integer encoding */
        // 保存encoding所需的字节数
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        // 不能转换为整数，则长度为slen
        reqlen = slen;
    }

    /* We need space for both the length of the previous entry and
     * the length of the payload. */

    // 计算存储前驱结点长度所需要的字节数
    reqlen += zipStorePrevEntryLength(NULL,prevlen);

    // 计算存储当前结点长度所需要的字节数
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    // 当插入的位置不是尾部，我们需要检查下个结点的prevlen是否足够
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    // 计算偏移offset，因为realloc可能改变ziplist的地址
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    // 如果移动了tail偏移量，则需要移动内存
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        // 为新结点留出空间
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        // 下个结点的prevlen扩展并存储
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        // 更新尾部偏移
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        // 当尾部包括超过1个结点时，我们需要计算nextdiff差值，否则就不够存储
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        // 插入在尾部，新结点就是尾结点
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    // 若nextdiff不为0，可能发生连锁更新
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    // 前驱结点的长度写入到插入结点的prevlen
    p += zipStorePrevEntryLength(p,prevlen);
    // 插入接的的长度写入到插入节点的encoding
    p += zipStoreEntryEncoding(p,encoding,slen);

    // 根据编码方式，将结点值写入value
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }

    // ziplist结点个数+1
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

删除结点

unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
    unsigned int i, totlen, deleted = 0;
    size_t offset;
    int nextdiff = 0;
    zlentry first, tail;

    // 计算要删除的结点个数和占用内存字节数
    zipEntry(p, &first);
    for (i = 0; p[0] != ZIP_END && i < num; i++) {
        p += zipRawEntryLength(p);
        deleted++;
    }

    // 需要删除的字节数
    totlen = p-first.p; /* Bytes taken by the element(s) to delete. */
    if (totlen > 0) {
        if (p[0] != ZIP_END) { // p不是结束符

            /* Storing `prevrawlen` in this entry may increase or decrease the
             * number of bytes required compare to the current `prevrawlen`.
             * There always is room to store this, because it was previously
             * stored by an entry that is now being deleted. */
            // 由于删除结点后，下个结点的prevlen可能无法表示前驱结点的长度
            // 所以需要提前计算出其差值
            nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);

            /* Note that there is always space when p jumps backward: if
             * the new previous entry is large, one of the deleted elements
             * had a 5 bytes prevlen header, so there is for sure at least
             * 5 bytes free and we need just 4. */
            // p指针后移nextdiff差值，减少删除的字节数来存放多余的差值
            p -= nextdiff;

            // 将first的prevlen扩展到p
            zipStorePrevEntryLength(p,first.prevrawlen);

            /* Update offset for tail */
            // 更新表头偏移量tail_offset成员
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);

            /* When the tail contains more than one entry, we need to take
             * "nextdiff" in account as well. Otherwise, a change in the
             * size of prevlen doesn't have an effect on the *tail* offset. */
            // 记录当前p执行的结点，存储到tail中
            zipEntry(p, &tail);

            // 若删除的不是尾结点，表尾的偏移量需要加上nextdiff这个差值
            if (p[tail.headersize+tail.len] != ZIP_END) {
                ZIPLIST_TAIL_OFFSET(zl) =
                   intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
            }

            /* Move tail to the front of the ziplist */
            // 移动内存
            memmove(first.p,p,
                intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
        } else {
            /* The entire tail was deleted. No need to move memory. */
            // 由于是表尾结点，所以直接更新tail_offset
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe((first.p-zl)-first.prevrawlen);
        }

        /* Resize and update length */
        // 删除多余的内存空间
        offset = first.p-zl;
        zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
        ZIPLIST_INCR_LENGTH(zl,-deleted);
        p = zl+offset;

        /* When nextdiff != 0, the raw length of the next entry has changed, so
         * we need to cascade the update throughout the ziplist */
        if (nextdiff != 0) // 考虑连锁更新
            zl = __ziplistCascadeUpdate(zl,p);
    }
    return zl;
}