I glanced through the paper and didn't see a comparison to TinyLFU. Unfortunately the name is a bit misleading, since it suggests a pure LFU algorithm. The description in Caffeine indicates that it is closer to an LRU algorithm with an LFU admission heuristic.
If the overhead of the cache itself is significant, as in kernel virtual memory management, a clock-based algorithm is usually preferable: it uses a circular array to approximate the node-based structures. ARC and LIRS have CAR and CLOCK-Pro, respectively, to simulate them. CAR is vulnerable to re-use distances near the boundary of the cache size [1]. CLOCK-Pro is widely used, notably in the Linux VM subsystem.
[1] https://www.slideshare.net/huliang64/clockpro
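To make the clock idea concrete, here is a minimal sketch of the basic CLOCK (second-chance) algorithm. This is my own toy illustration, not CLOCK-Pro or the Linux implementation: a circular array with one reference bit per slot approximates LRU, and a hit only flips a bit instead of relinking a list node.

```python
class Clock:
    """Toy second-chance CLOCK cache: an array + reference bits approximating LRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity   # each slot holds (key, value) or None
        self.ref = [False] * capacity    # one reference bit per slot
        self.hand = 0                    # the clock hand
        self.index = {}                  # key -> slot position

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.ref[pos] = True             # a hit only sets a bit; no list maintenance
        return self.slots[pos][1]

    def put(self, key, value):
        pos = self.index.get(key)
        if pos is not None:
            self.slots[pos] = (key, value)
            self.ref[pos] = True
            return
        # Sweep the hand, clearing reference bits, until an unreferenced
        # slot is found; that victim is evicted and replaced.
        while self.slots[self.hand] is not None and self.ref[self.hand]:
            self.ref[self.hand] = False
            self.hand = (self.hand + 1) % self.capacity
        victim = self.slots[self.hand]
        if victim is not None:
            del self.index[victim[0]]
        self.slots[self.hand] = (key, value)
        self.ref[self.hand] = True
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

The point is the cost model: hits touch only a bit in a flat array, which is why clock variants are attractive when the cache's own bookkeeping overhead matters.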
Naming aside, the structure is <LRU | LFU filter | LRU>. The latest version is adaptive, where the front LRU may grow or shrink based on the workload. This allows the cache to mimic LRU for recency-skewed workloads (e.g. job queues) and LFU for frequency-skewed ones (e.g. loops). In a stress test, ARC and LIRS had a 20% hit rate, Caffeine had 39.6%, and Optimal was 40.3%.
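A toy sketch of that <LRU | LFU filter | LRU> shape may help. To be clear about assumptions: the real design uses a CountMin-style frequency sketch with aging and an SLRU main region, and the adaptive version resizes the window; here a plain `Counter` and plain LRUs stand in, just to show the admission step.

```python
from collections import Counter, OrderedDict

class WTinyLfuSketch:
    """Toy <window LRU | frequency-based admission | main LRU> cache."""

    def __init__(self, window_size, main_size):
        self.window = OrderedDict()   # small front LRU; admits everything
        self.main = OrderedDict()     # large main LRU; guarded by admission
        self.window_size = window_size
        self.main_size = main_size
        self.freq = Counter()         # stand-in for the frequency sketch

    def get(self, key):
        self.freq[key] += 1
        for region in (self.window, self.main):
            if key in region:
                region.move_to_end(key)   # LRU touch
                return region[key]
        return None

    def put(self, key, value):
        self.freq[key] += 1
        self.window[key] = value
        self.window.move_to_end(key)
        if len(self.window) > self.window_size:
            # The window's LRU victim becomes a candidate for the main region.
            cand_key, cand_val = self.window.popitem(last=False)
            self._admit(cand_key, cand_val)

    def _admit(self, key, value):
        if len(self.main) < self.main_size:
            self.main[key] = value
            return
        victim_key = next(iter(self.main))  # main region's LRU victim
        # TinyLFU admission: keep whichever of candidate/victim has the
        # higher estimated frequency; otherwise reject the candidate.
        if self.freq[key] > self.freq[victim_key]:
            self.main.pop(victim_key)
            self.main[key] = value
```

The front LRU absorbs recency bursts, while the admission filter keeps one-hit wonders from displacing frequently used entries in the main region.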
The LHD paper lacks a robust analysis against different workloads. The few traces they use are either private or not considered representative workloads. For MSR, the provider (SNIA) specifically states:
"WARNING: These traces are over 10 years old! They should not be used for modern research!".
It would be much more interesting if they used the traces from ARC and LIRS, perhaps also UMass and Wikipedia. Those traces are publicly available and used by multiple papers from a variety of groups.
I have not seen a sampling-based policy that outperforms a policy that tracks history outside of its working set. Since the evaluations specifically avoided that comparison, I suspect that it underperforms in comparison. That isn't to say it isn't a useful advancement, but a more robust analysis is needed to validate their claims.