iOS Class Structure Analysis: cache_t

Author: xxxxxxxx_123 | Published 2020-01-03 18:08

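Why cache_t exists

cache_t caches the methods that have been called on a class. It stores each method's IMP in a hash table keyed by selector, which makes method lookup faster.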

 Cache readers (PC-checked by collecting_in_critical())
 * objc_msgSend*
 * cache_getImp
 * 
 * Cache writers (hold cacheUpdateLock while reading or writing; not PC-checked)
 * cache_fill         (acquires lock)
 * cache_expand       (only called from cache_fill)
 * cache_create       (only called from cache_expand)
 * bcopy               (only called from instrumented cache_expand)
 * flush_caches        (acquires lock)
 * cache_flush        (only called from cache_fill and flush_caches)
 * cache_collect_free (only called from cache_expand and cache_flush)

The functions that operate on the cache are as follows:

Cache reads:
objc_msgSend
cache_getImp
Cache writes:
cache_fill
cache_expand
cache_create
bcopy
flush_caches
cache_flush
cache_collect_free

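When a method is called, the runtime first looks in the class's cache. On a miss it searches the class's method list, then the superclass's, and so on up the chain. Once the method is found it is stored in the cache, so the next call to the same method finds it in the class's cache and invokes it directly. So how does cache_t actually do the caching?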
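The lookup order described above can be sketched as a toy model. This is not the runtime's code: ToyClass and lookUp are hypothetical names, strings stand in for SELs and ints for IMPs, and the sketch leaves out superclass caches, method resolution and message forwarding.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy model only: real classes store SELs and IMPs, not strings and ints.
struct ToyClass {
    ToyClass *superclass = nullptr;
    std::unordered_map<std::string, int> methods;  // stands in for the method list
    std::unordered_map<std::string, int> cache;    // stands in for cache_t
};

// Returns the "IMP" for sel, or 0 if no class in the chain implements it.
// If cacheHit is non-null, it reports whether the receiver's cache answered.
int lookUp(ToyClass &cls, const std::string &sel, bool *cacheHit = nullptr) {
    auto cached = cls.cache.find(sel);
    if (cached != cls.cache.end()) {
        if (cacheHit) *cacheHit = true;
        return cached->second;
    }
    if (cacheHit) *cacheHit = false;

    // Cache miss: walk the class and then its superclasses.
    for (ToyClass *c = &cls; c != nullptr; c = c->superclass) {
        auto found = c->methods.find(sel);
        if (found != c->methods.end()) {
            cls.cache[sel] = found->second;  // fill the receiver class's cache
            return found->second;
        }
    }
    return 0;
}
```

Note that the entry is written into the cache of the class the lookup started from, even when the implementation comes from a superclass, so the second call is a direct hit.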

The structure of cache_t

typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits

struct cache_t {
    struct bucket_t *_buckets; // 8
    mask_t _mask;  // 4
    mask_t _occupied; // 4

public:
    struct bucket_t *buckets();
    mask_t mask();
    mask_t occupied();
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void initializeToEmpty();

    mask_t capacity();
    bool isConstantEmptyCache();
    bool canBeFreed();

    static size_t bytesForCapacity(uint32_t cap);
    static struct bucket_t * endMarker(struct bucket_t *b, uint32_t cap);

    void expand();
    void reallocate(mask_t oldCapacity, mask_t newCapacity);
    struct bucket_t * find(cache_key_t key, id receiver);

    static void bad_cache(id receiver, SEL sel, Class isa) __attribute__((noreturn));
};

typedef uintptr_t cache_key_t;

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    MethodCacheIMP _imp;
    cache_key_t _key;
#else
    cache_key_t _key;
    MethodCacheIMP _imp;
#endif

public:
    inline cache_key_t key() const { return _key; }
    inline IMP imp() const { return (IMP)_imp; }
    inline void setKey(cache_key_t newKey) { _key = newKey; }
    inline void setImp(IMP newImp) { _imp = newImp; }

    void set(cache_key_t newKey, IMP newImp);
};

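Looking at the structure, cache_t holds a pointer to an array of bucket_t structs plus two uint32_t fields, _mask and _occupied.

struct bucket_t *_buckets: the array of cached entries (each entry's key is the SEL, i.e. the method name, and its value is the method's IMP)
mask_t _mask: the array length minus 1
mask_t _occupied: the number of methods currently cached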
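The relationship between the three fields can be made concrete with a small stand-in struct. ToyBucket and ToyCache are hypothetical names; the capacity() body follows our reading of the objc4 version quoted in this article (mask + 1, or 0 for an empty cache) and should be treated as an assumption.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t mask_t;

// Stand-in for bucket_t: one key/value pair.
struct ToyBucket {
    uintptr_t key;  // the SEL, cast to an integer
    void *imp;      // the method implementation
};

// Stand-in for cache_t with the same three fields.
struct ToyCache {
    ToyBucket *buckets;  // pointer to the bucket array
    mask_t mask;         // capacity - 1, or 0 when there is no table yet
    mask_t occupied;     // number of buckets in use

    // Capacity is derived from the mask instead of being stored separately.
    mask_t capacity() const { return mask ? mask + 1 : 0; }
};
```

On a 64-bit target the pointer takes 8 bytes and each mask_t takes 4, which matches the `// 8`, `// 4`, `// 4` annotations in the struct above.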

How the cache is written

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();

    // Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;

    cache_t *cache = getCache(cls);
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
}

From the code above we can draw the following flowchart:

[Figure: flowchart of cache_fill_nolock]

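When writing to the cache, the runtime first checks whether the class passed in has been initialized. If it has not, the function returns immediately.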

bool isInitialized() {
    return getMeta()->data()->flags & RW_INITIALIZED;
}
Class getMeta() {
    if (isMetaClass()) return (Class)this;
    else return this->ISA();
}
bool isMetaClass() {
    assert(this);
    assert(isRealized());
    return data()->ro->flags & RO_META;
}
inline Class objc_object::ISA() {
    assert(!isTaggedPointer()); 
#if SUPPORT_INDEXED_ISA
    if (isa.nonpointer) {
        uintptr_t slot = isa.indexcls;
        return classForIndex((unsigned)slot);
    }
    return (Class)isa.bits;
#else
    return (Class)(isa.bits & ISA_MASK);
#endif
}

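Next, cache_getImp(cls, sel) checks whether an IMP is already cached for this selector (another thread may have added it before the lock was taken). If one is found, the function returns.
Then getCache fetches the class's cache: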

cache_t *getCache(Class cls) 
{
    assert(cls);
    return &cls->cache;
}

Then getKey converts the sel into a cache_key_t:

cache_key_t key = getKey(sel);
cache_key_t getKey(SEL sel) 
{
    assert(sel);
    return (cache_key_t)sel;
}

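Next, newOccupied is computed as the current occupied count plus 1, and the current capacity is read. If the cache is still the constant empty cache, storage is allocated for it. The first allocation has a capacity of 4: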

INIT_CACHE_SIZE 4

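If the new occupied count is no more than 3/4 of the capacity, the existing storage is used as-is.
If it is more than 3/4 of the capacity, the storage has to be expanded: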

cache->expand();

// Expansion rule: read the old capacity. If it is nonzero, double it;
// otherwise use 4 (INIT_CACHE_SIZE). Then call reallocate to create the new storage.
void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    
    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}
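// Reallocate the storage:
// take the old hash table, allocate a new one, and set the new buckets and mask.
// Then free the old hash table's memory. The old entries are discarded, not copied.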
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    bool freeOld = canBeFreed();
    bucket_t *oldBuckets = buckets();
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}
void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    mega_barrier();
    _buckets = newBuckets;
    mega_barrier();
    _mask = newMask;
    _occupied = 0;
}
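To see how the 3/4 rule and the doubling interact, here is a minimal simulation of the sizing decisions only. ToyFillPolicy is a hypothetical name; it tracks counts rather than real buckets, treats capacity 0 as the constant empty cache, and omits the mask overflow check from expand().

```cpp
#include <cassert>
#include <cstdint>

enum { INIT_CACHE_SIZE = 4 };

// Tracks only the counters that the sizing rules depend on.
struct ToyFillPolicy {
    uint32_t capacity = 0;   // 0 stands in for the constant empty cache
    uint32_t occupied = 0;   // entries in the current table
    uint32_t discarded = 0;  // entries thrown away by reallocation so far

    // A new table starts empty: old entries are dropped, not copied over.
    void reallocate(uint32_t newCapacity) {
        discarded += occupied;
        capacity = newCapacity;
        occupied = 0;
    }

    // Applies the same three-way branch as cache_fill_nolock
    // for a selector that is not cached yet.
    void insertNew() {
        uint32_t newOccupied = occupied + 1;
        if (capacity == 0) {
            reallocate(INIT_CACHE_SIZE);
        } else if (newOccupied <= capacity / 4 * 3) {
            // Less than 3/4 full: keep the current table.
        } else {
            reallocate(capacity * 2);
        }
        occupied++;
    }
};
```

With these rules a capacity-4 table accepts 3 entries. The 4th insert doubles the table to 8 and starts it over with just that one entry, and the same happens again at the 10th insert, when the 7th entry would exceed 8 / 4 * 3 = 6.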

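Next, find uses the key to locate a bucket in the hash table. If the bucket's key is 0, this key has not been stored before, so the occupied count is incremented.
Finally the key and imp are stored in the bucket: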

void bucket_t::set(cache_key_t newKey, IMP newImp)
{
    assert(_key == 0  ||  _key == newKey);
    _imp = newImp;
    
    if (_key != newKey) {
        mega_barrier();
        _key = newKey;
    }
}

bucket_t * cache_t::find(cache_key_t k, id receiver)
{
    assert(k != 0);

    bucket_t *b = buckets();
    mask_t m = mask();
    // cache_hash computes the starting index for key k (begin = k & m)
    mask_t begin = cache_hash(k, m);
    // i starts at begin and walks the table
    mask_t i = begin;
    do {
        if (b[i].key() == 0  ||  b[i].key() == k) {
            // If the bucket at index i holds key k, the lookup succeeded: return it.
            // If its key is 0, nothing has been cached at index i yet. Return this empty bucket too, which ends the scan.
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin);
    
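    // cache_next in the loop condition above steps to the neighbouring index. On arm64 it is i ? i-1 : mask,
    // so each pass moves to the previous bucket and compares its key with k again (on x86_64 the scan steps forward instead).
    // When i reaches 0, the first bucket, it wraps around to mask, the last bucket, so the table is scanned as a ring starting from begin.
    // If the scan comes back to begin without finding key k or an empty bucket, the lookup has failed and bad_cache is called.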
 
    // hack
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)k, cls);
}
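The probing in find can be tried out on a small table. The sketch below assumes the arm64 variant of cache_next (step backwards, wrap from 0 to mask) and the cache_hash body key & mask, as described in the comments above. ToyBucket and findSlot are hypothetical names, and where the real code calls bad_cache the sketch simply returns -1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint32_t mask_t;
typedef uintptr_t cache_key_t;

struct ToyBucket {
    cache_key_t key = 0;  // 0 means the bucket is empty
    int imp = 0;          // stands in for the IMP
};

static inline mask_t cache_hash(cache_key_t key, mask_t mask) {
    return (mask_t)(key & mask);
}

// arm64-style step: previous index, wrapping from 0 to mask.
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i - 1 : mask;
}

// Returns the index of the bucket that holds k, or of the first empty
// bucket on k's probe path. Returns -1 if the whole ring was scanned.
int findSlot(const std::vector<ToyBucket> &b, cache_key_t k) {
    assert(k != 0);
    assert(!b.empty() && (b.size() & (b.size() - 1)) == 0);  // power of two
    mask_t m = (mask_t)(b.size() - 1);
    mask_t begin = cache_hash(k, m);
    mask_t i = begin;
    do {
        if (b[i].key == 0 || b[i].key == k) return (int)i;
    } while ((i = cache_next(i, m)) != begin);
    return -1;
}
```

For example, in a 4-slot table the keys 0x10 and 0x20 both hash to index 0. Once 0x10 occupies index 0, a search for 0x20 steps back from 0, wraps to index 3 and settles there.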

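Summary:

The first time a method is used, once it has been found through isa, it is cached in the cache's _buckets as a bucket_t holding MethodCacheIMP _imp and cache_key_t _key. On the very first write the runtime allocates a hash table with 4 slots and sets _mask to the table length minus 1. It then computes the slot index as SEL & mask and stores the method in that slot.

When storing one more method would fill the hash table beyond 3/4 of its length, the table is expanded: a new table with twice the capacity is created, _mask is reset, the entries of the old table are discarded, and finally the old table's memory is freed.

If a class has many methods, several of them may well produce the same index from SEL & mask. In that case cache_next is called to try the slot at index - 1 (on arm64; on x86_64 the scan steps forward instead). If that slot already holds a method whose key differs from the key being stored, the scan moves on to the slot before it, and so on until it reaches a slot that is empty or whose key equals the key being stored. When the scan reaches index 0, it continues at index _mask, the last slot.

Looking up a method does not require traversing the hash table. The index is again computed as SEL & mask and the value is read directly from that slot. As above, if the key stored there differs from the key being looked up, the scan moves on to the previous slot.

Once the cache has grown to a reasonable capacity and settled there, method calls become very fast. In essence this trades a modest amount of space for time.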

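tips:

Why is a Class passed in when the runtime writes to the cache for an instance method call?

See cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver): instance methods are stored in the class's memory and are managed by the class.

tips:

What mask is for

It is used to generate the slot index into the hash table.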
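Why a mask works as an index generator is easy to check by hand: the capacity is always a power of two, so key & (capacity - 1) keeps only the low bits of the key and always lands inside the table. A minimal sketch, where indexFor is a hypothetical helper and the key value is made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Index for key in a table whose capacity is a power of two.
uint32_t indexFor(uintptr_t key, uint32_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    uint32_t mask = capacity - 1;  // e.g. capacity 8 -> mask 0b111
    return (uint32_t)(key & mask);
}
```

For the made-up key 0x1000F3A6 the low bits are ...0110, so the index is 2 with capacity 4 and 6 with capacity 8. The result is the same as key % capacity, computed with a single AND, and it changes when the table grows, which fits with the old entries being discarded on expansion.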

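tips: about methods

method_t wraps a method/function. Each method is represented by one method_t.

SEL name is the name of the method/function
const char *types is the method signature (return type and parameter types)
MethodListIMP imp is the pointer to the function implementation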

struct method_t {
    SEL name;
    const char *types;
    MethodListIMP imp;

    struct SortBySELAddress :
        public std::binary_function<const method_t&,
                                    const method_t&, bool>
    {
        bool operator() (const method_t& lhs,
                         const method_t& rhs)
        { return lhs.name < rhs.name; }
    };
};
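SortBySELAddress orders methods by the address of their selector. As a rough illustration of what such a comparator makes possible, the sketch below sorts a list by name and then finds an entry with a binary search. ToyMethod, sortMethods and findMethod are hypothetical, integers stand in for SEL addresses and IMPs, and the comparator is a plain functor because std::binary_function was removed in C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyMethod {
    uintptr_t name;     // stands in for SEL, compared by address
    const char *types;  // type encoding string
    int imp;            // stands in for the function pointer
};

// Same ordering as SortBySELAddress: ascending selector address.
struct SortByName {
    bool operator()(const ToyMethod &lhs, const ToyMethod &rhs) const {
        return lhs.name < rhs.name;
    }
};

void sortMethods(std::vector<ToyMethod> &list) {
    std::sort(list.begin(), list.end(), SortByName());
}

// Binary search in a list already sorted by name. Returns nullptr on a miss.
const ToyMethod *findMethod(const std::vector<ToyMethod> &sorted, uintptr_t name) {
    ToyMethod probe = {name, nullptr, 0};
    auto it = std::lower_bound(sorted.begin(), sorted.end(), probe, SortByName());
    if (it != sorted.end() && it->name == name) return &*it;
    return nullptr;
}
```

This is the slower path that the cache is meant to avoid: a sorted method list can be searched in logarithmic time, while a cache hit costs one mask operation and usually a single comparison.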