Ring: Advanced cache interface for Python (ring-cache.readthedocs.io)
174 points by youknowone on May 28, 2019 | hide | past | favorite | 45 comments


Every example seems to follow this pattern:

  client = pymemcache.client.Client(('127.0.0.1', 11211))  # create a client

  # save to memcache client, expire in 60 seconds.
  @ring.memcache(client, expire=60)  # lru -> memcache
  def get_url(url):
      return requests.get(url).content

How are you supposed to configure the client at 'runtime' instead of 'compile time' (when the code is executed and not when it's imported)?

Careful placement of imports in order to correctly configure something just introduces delicate pain points. It'll work now, but an absent-minded import somewhere else later can easily lead to hours of debugging.


   @ring.memcache(client, expire=60)   
   def get_url(url):
       return requests.get(url).content
can be written:

    def get_url(url):
        return requests.get(url).content

    get_url = ring.memcache(client, expire=60)(get_url)
Decorators are just syntactic sugar for that pattern.

You are then welcome to instantiate your ring.memcache object and bind it wherever it pleases you.

I would have provided a different API though:

   cache = pymemcache.client.Client(('127.0.0.1', 11211))

   @cache.lru(expire=60) # wrapper of ring.cache(client)
   def get_url(url):
       return requests.get(url).content
And accepted the alternative:

   cache = pymemcache.client.Client(conf_factory)

   def get_url(url):
       return requests.get(url).content

   get_url = cache.wraps.lru(get_url, expire=60)
  
It's better not to expect everyone to know the details of decorators just to use your API, and a factory is a nice hook to have anyway: it says where the code for that dynamic configuration should live, and code as documentation is the best documentation.

Also a patch() context manager would be nice for temporary caching:

   with cache.patch('module.lru', expire=60):
        get_url()
But it's hard to do in a thread-safe way, so compromises would have to be made.
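A rough sketch of what such a patch() helper could look like. Everything here is hypothetical, not part of ring's API: `memoize` stands in for a real cache backend, and `patch_cached` swaps a module (or namespace) attribute for a cached version for the duration of the block. As noted, this version is not thread-safe: any other thread calling the function during the block sees the cached version too.

```python
import contextlib
import functools

def memoize(func):
    # Minimal dict-backed cache, standing in for a real backend.
    store = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]
    return wrapper

@contextlib.contextmanager
def patch_cached(namespace, name):
    # Temporarily replace `namespace.name` with a cached version,
    # restoring the original on exit (even on exceptions).
    original = getattr(namespace, name)
    setattr(namespace, name, memoize(original))
    try:
        yield
    finally:
        setattr(namespace, name, original)
```

Inside `with patch_cached(mymodule, 'get_url'):`, repeated calls hit the cache; after the block, calls go back to the real function.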


Although this is true, it is terrible from a developer UX perspective.

Yes, you can "dynamically"-decorate your functions at run-time using whatever global conditionals.

Yes, you can re-decorate the ring decorators.

But you shouldn't have to.

This design is guilty of the cardinal sin of being un-pythonic.


That's what I said. Read again.


You can use a closure to pass in the configuration.

    def configure_memcache(client_ip, port):
        client = pymemcache.client.Client((client_ip, port))
        @ring.memcache(client, expire=60)
        def get_url(url):
            return requests.get(url).content

        return get_url
Then in your code which imports the above library:

    get_url = configure_memcache('127.0.0.1', 11211)
    result = get_url('https://www.google.com')


I'd rather have a sane API.

    def configure_ring():
        if DEBUG:
            return Ring(backend='debug')
        else:
            return Ring(backend='memcache', ...)

    ring = configure_ring()
    
    @ring.cache(expire=60)
    def get_url(...):
        ...
Tons of other libraries out there that implement this exact pattern.
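A toy, self-contained version of that pattern (none of these names are ring's actual API): the decorator stays the same at module level, and only the cache object's backend changes at runtime.

```python
import functools

class Ring:
    # Toy cache facade: backend chosen at runtime, decorator unchanged.
    def __init__(self, backend='memory'):
        # 'debug' disables caching entirely; 'memory' uses a plain dict.
        self.backend = backend
        self._store = {}

    def cache(self, expire=None):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args):
                if self.backend == 'debug':
                    return func(*args)          # no caching in debug mode
                key = (func.__name__,) + args   # expire ignored in this toy
                if key not in self._store:
                    self._store[key] = func(*args)
                return self._store[key]
            return wrapper
        return decorator

DEBUG = False
ring = Ring(backend='debug' if DEBUG else 'memory')

@ring.cache(expire=60)
def get_double(x):
    return x * 2
```

The decorated functions never mention memcache, redis, or any other implementation detail; swapping backends touches one line.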


agree


This assumes you define get_url().


This is a good point. asyncio backends now partially take an initializer function, because calling await at import time is a kind of nonsense.

I think it also needs to take a client configuration or a client initializer. Any advice from your use case?


I have been thinking about setup and teardown for asyncio apps in Python lately.

The async with block is a nice idea but doesn't deal with the reality that often a resource has multiple consumers. For instance, there might be several components of an application that use a database connection -- I really want to make the connection once and tear it down only after all of the clients of that connection have themselves been torn down.
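A minimal sketch of that idea, with hypothetical names: a reference-counted async resource that is opened on the first acquire and closed only after the last consumer releases it.

```python
import asyncio

class SharedResource:
    # Reference-counted async resource: opened on first acquire,
    # closed only when the last consumer has released it.
    def __init__(self, open_coro, close_coro):
        self._open = open_coro
        self._close = close_coro
        self._count = 0
        self._value = None
        self._lock = asyncio.Lock()

    async def __aenter__(self):
        async with self._lock:
            if self._count == 0:
                self._value = await self._open()
            self._count += 1
            return self._value

    async def __aexit__(self, *exc):
        async with self._lock:
            self._count -= 1
            if self._count == 0:
                await self._close(self._value)
                self._value = None

# Usage: several consumers each write `async with shared as conn: ...`
# and the underlying connection is opened once and closed once.
```

This is the shape of a solution, not a framework; a real version would also have to handle open failures and cancellation during teardown.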

What I'm imagining the answer to be is something a little bit like the Spring Framework but fundamentally centered around asyncio.


I think there are two common situations that a 'compile time' configuration would not support.

- Loading configuration from `main()`, e.g. a configuration passed in via sys.argv and processed by argparse.

- Setting configuration within tests. Unless explicitly told otherwise, I'd expect all tests to be performed against an empty cache. Not to mention, there's no guarantee that I'll have access to a server to use during tests.


dogpile.cache author here.

The way dogpile does this is that your decorator is configured in terms of a cache region object, which you configure with backend and options at runtime.

https://dogpilecache.sqlalchemy.org/en/latest/usage.html#reg...

I got this general architectural concept from my Java days, observing what EHCache did (that's where the word "region" comes from).
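A stdlib-only sketch of the region idea (this is the concept, not dogpile's actual implementation): the decorator is applied at import time, but the backend is attached later, at runtime.

```python
import functools

class CacheRegion:
    # Decorate now, configure the backend later (the "region" pattern).
    def __init__(self):
        self._backend = None   # no backend until configure() is called

    def configure(self, backend):
        # e.g. a dict for an in-memory backend, or a client wrapper.
        self._backend = backend

    def cache_on_arguments(self):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args):
                if self._backend is None:
                    raise RuntimeError("region is not configured yet")
                key = (func.__name__,) + args
                if key not in self._backend:
                    self._backend[key] = func(*args)
                return self._backend[key]
            return wrapper
        return decorator

region = CacheRegion()        # created at module level, import time

@region.cache_on_arguments()  # decoration happens at import time...
def square(x):
    return x * x

region.configure({})          # ...configuration happens at runtime
```

Calling a decorated function before `configure()` fails loudly instead of silently caching to the wrong place, which is exactly the separation the parent comments are asking for.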


Surely it's just:

   client = pymemcache.client.Client(('127.0.0.1', 11211))
   cache_wrapper = ring.memcache if some_condition else ring.whatever

   @cache_wrapper(...)
   def ...


Extremely poor design:

* Not DRY. What if I want to use a cache for production but disable caching in development? And I have 10s or even 100s of functions that rely on the cache? Because the decorators contain implementation/client-specific parameters, I now have to add another entire layer of abstraction over this.

* Implementation is tied to the decorator, e.g. `ring.memcache` -- seriously? Why does it matter?

* What about setting application defaults, such as an encoding scheme, a key prefix/namespace, a default timeout?

I'm sorry but this is over-engineered garbage and good luck to anyone who uses it.


I agree they are missing features, but they are easy goals with a small refactoring, so they will be solved soon. Issue #129 is about application defaults. After that, dry-run is just replacing the default action from 'get_or_update' to 'execute'.


> I'm sorry but this is over-engineered garbage and good luck to anyone who uses it.

I agree and wish people would speak up and share sentiment like this more often.


Is there a Python equivalent to PHP's APCu? APCu, in the PHP world, leverages mmap to provide a multi-process KV store with fast, built-in serialization. So it's simple and very fast for single-server, multi-process caching.


It's not necessary in Python (or many other server frameworks), because Python doesn't typically follow a process-per-request model. You can just stick it in memory available to all of your threads.


I imagine there are python users doing multi-process, aren't there? (since the GIL is limiting for some use cases).


Yeah - if necessary you can use something like uWSGI's caching interface (https://uwsgi-docs.readthedocs.io/en/latest/Caching.html). For most of these sorts of things, though, you'll typically be fine caching the object in question once per process, because the processes are still persistent. If you truly need shared memory between processes (rather than just an in-memory copy of an object used for many requests), there are other options, and it's infrequent that you need something shared between request processes but not shared between other web servers.


They do have the multiprocessing library, and you can use it to do some primitive sort of caching between processes, but it's not ideal. The pattern I've used before is to use something like Redis to do the caching, then use multiprocessing to create a coordinator process that coordinates all the worker processes. Multiprocessing is limited in what sort of things you can pass between processes, but you have just enough primitives to do some coarse-grained coordination.
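As a sketch of the "primitive caching between processes" part: a `multiprocessing.Manager` dict is one of those just-enough primitives. It's a proxy object backed by a server process, so it's far slower than APCu's mmap approach, but it does give every worker a shared view.

```python
import multiprocessing as mp

def worker(shared_cache, key, value):
    # Each worker process writes through the manager proxy.
    shared_cache[key] = value

# 'fork' keeps this snippet guard-free; on Windows you'd use 'spawn'
# plus an `if __name__ == "__main__":` guard instead.
ctx = mp.get_context('fork')
with ctx.Manager() as manager:
    shared_cache = manager.dict()     # dict proxy shared across processes
    procs = [ctx.Process(target=worker, args=(shared_cache, i, i * i))
             for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    result = dict(shared_cache)       # snapshot before the manager shuts down
```

Every proxy access is an IPC round-trip, which is why Redis (or uWSGI's cache) usually wins once the access pattern gets non-trivial.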


Great project. There is only one angle that I feel is missing: multiple requests for the same resource could cause duplicated work, especially if the value generating function is slow.

I wrote a sample solution to that problem, feel free to reach out if you ever consider adding a similar feature, I'd be happy to contribute. (fyi: the current implementation is in Go)

https://github.com/kristoff-it/redis-memolock
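The memolock idea above uses Redis for cross-process locking; within a single process the same dogpile-style deduplication can be sketched with a per-key lock (all names here are hypothetical): the first caller computes, concurrent callers for the same key block and reuse the result.

```python
import threading

class DedupCache:
    # Collapses concurrent requests for the same key:
    # only one caller computes, the others wait for its result.
    def __init__(self):
        self._results = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the two dicts

    def get_or_compute(self, key, compute):
        with self._guard:
            if key in self._results:
                return self._results[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                       # one computing thread per key
            with self._guard:
                if key in self._results:  # another thread finished first
                    return self._results[key]
            value = compute()             # slow work happens unlocked-from-_guard
            with self._guard:
                self._results[key] = value
            return value
```

A production version would also need expiry and error handling (a failed `compute` shouldn't poison the key), which is where the Redis-based memolock earns its complexity.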


Actually it's a common request from users, but it hasn't been solved yet. I will check the project, thanks!


Looks extensive and I'll likely try using the module at some point.

One thing, why not stash all the function methods under a "ring" or "cache" attribute, eg

  @ring.lru()
  def foo():
    ..

  foo.cache.update()
  foo.cache.delete()
  ..
This might be less likely to clash with any existing function attributes (if you're wrapping a 3rd party function say).
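The suggestion is easy to sketch with a toy decorator (the `.cache` namespace and its methods here are the commenter's proposal, not ring's current API): the helpers live on a single attribute instead of being splatted onto the function.

```python
import functools
import types

def cached(func):
    # Toy cache decorator that groups its helpers under a single
    # `.cache` attribute instead of adding update/delete/... directly.
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]

    wrapper.cache = types.SimpleNamespace(
        update=lambda *args: store.__setitem__(args, func(*args)),
        delete=lambda *args: store.pop(args, None),
        clear=store.clear,
    )
    return wrapper
```

With one reserved name (`cache`) the odds of clashing with an attribute on a wrapped third-party function drop considerably.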


Thanks for the great advice. I never thought about this problem.


Like this a lot.

How could one invalidate everything related to a specific client/customer/account?

I wonder how they cascade these invalidations in bigger and more complex systems.


It doesn't have any cascading feature for now. @ring.redis_hash can be helpful for certain cases, but it is not a generic solution.

In the future, there is a plan for indirect invalidation, which will use another key to decide expiration. Though it is not designed for cascading, it will probably work for some of these cases.


The API doesn't seem as fleshed out as dogpile.cache's yet.

Normally you don't want to pass a cache backend instance to decorators at module level.


How does this compare to dogpile?


I reviewed a few cache libraries, but this is a new one I hadn't checked.

Roughly, Ring consists of 6 key features: sub-functions, a universal decorator, data coders, asyncio support, consistent and readable key generation, and abstract-transparent back-end access. I will check dogpile soon, thanks.


Any blogpost or link to your review? Thanks


Because I didn't write one before, I made a new one with the projects I remember: https://github.com/youknowone/ring/tree/feature-table#featur...


no mutex dogpile lock or get_or_create functionality..


No dogpile lock, but get_or_create exists under a different name: get_or_update.


>Cache is a popular concept widely spread on the broad range of computer science but its interface is not well developed yet.

This sentence is grammatically incorrect. Replace "Cache" with "Caching".


Thanks, I will fix it


I needed something like this that allows access to and manual manipulation of the cache, and I ended up forking functools.lru_cache code. This library definitely fits the bill.


> Memcached itself is out of the Python world

Don't know why this bothers me so much... but it's actually from Perl. It was born at LiveJournal, a well-known Perl shop.


I actually read this as outside.


Will it be a correct expression if I fix it to "outside"? https://github.com/youknowone/ring/pull/133/files


Yeah, I commented on the PR as such.


To me, mocking of the caches for testing is super important and missing.

I searched the article, the linked "Why Ring?", and this page of responses for "mock", but no results.

Maybe it's just me!


Thanks. I didn't think of adding them to the why page. For now, the actual projects work like:

  if DEBUG:
      ring_cache = functools.partial(ring.dict, {}, default_action='execute')
  else:
      ring_cache = functools.partial(ring.redis, client)

  @ring_cache(...)
  def ...

Which is not a very good solution at all. I will fix the design and document it properly. Thanks for suggesting the why page and a mock section.


no dogpile lock support?


I want to say "not yet". It's a shame that I didn't know about the dogpile lock.

