Ring: Advanced cache interface for Python (ring-cache.readthedocs.io)
174 points by youknowone on May 28, 2019 | hide | past | favorite | 45 comments


Every example seems to follow this pattern:

  client = pymemcache.client.Client(('127.0.0.1', 11211))  # create a client

  # save to memcache client, expire in 60 seconds.
  @ring.memcache(client, expire=60)  # lru -> memcache
  def get_url(url):
      return requests.get(url).content

How are you supposed to configure the client at 'runtime' instead of 'compile time' (when the code is executed and not when it's imported)?

Careful placement of imports in order to correctly configure something just introduces delicate pain points. It'll work now, but an absent-minded import somewhere else later can easily lead to hours of debugging.


   @ring.memcache(client, expire=60)   
   def get_url(url):
       return requests.get(url).content
can be written:

    def get_url(url):
        return requests.get(url).content

    get_url = ring.memcache(client, expire=60)(get_url)
Decorators are just syntactic sugar for that pattern.

You are then welcome to instantiate your ring.memcache object and bind it wherever it pleases you.

I would have provided a different API though:

   cache = pymemcache.client.Client(('127.0.0.1', 11211))

   @cache.lru(expire=60) # wrapper of ring.cache(client)
   def get_url(url):
       return requests.get(url).content
And accepted the alternative:

   cache = pymemcache.client.Client(conf_factory)

   def get_url(url):
       return requests.get(url).content

   get_url = cache.wraps.lru(get_url, expire=60)
  
It's better not to expect everyone to know the details of decorators just to use your API, and a factory is a nice hook to have anyway: it says where the code for that dynamic configuration should live, and code as documentation is the best documentation.

Also a patch() context manager would be nice for temporary caching:

   with cache.patch('module.lru', expire=60):
        get_url()
But it's hard to do in a thread-safe way, so compromises would have to be made.
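A rough sketch of what such a patch() helper could look like. Everything here is hypothetical, not part of ring's API: `memoize` stands in for a real cache backend, and `patch_cached` swaps a module (or namespace) attribute for a cached version for the duration of the block. As noted, this version is not thread-safe: any other thread calling the function during the block sees the cached version too.

```python
import contextlib
import functools

def memoize(func):
    # Minimal dict-backed cache, standing in for a real backend.
    store = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]
    return wrapper

@contextlib.contextmanager
def patch_cached(namespace, name):
    # Temporarily replace `namespace.name` with a cached version,
    # restoring the original on exit (even on exceptions).
    original = getattr(namespace, name)
    setattr(namespace, name, memoize(original))
    try:
        yield
    finally:
        setattr(namespace, name, original)
```

Inside `with patch_cached(mymodule, 'get_url'):`, repeated calls hit the cache; after the block, calls go back to the real function.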


Although this is true, it is terrible from a developer UX perspective.

Yes, you can "dynamically"-decorate your functions at run-time using whatever global conditionals.

Yes, you can re-decorate the ring decorators.

But you shouldn't have to.

This design is guilty of the cardinal sin of being un-pythonic.


That's what I said. Read again.


You can use a closure to pass in the configuration.

    def configure_memcache(client_ip, port):
        client = pymemcache.client.Client((client_ip, port))
        @ring.memcache(client, expire=60)
        def get_url(url):
            return requests.get(url).content

        return get_url
Then in your code which imports the above library:

    get_url = configure_memcache('127.0.0.1', 11211)
    result = get_url('https://www.google.com')


I'd rather have a sane API.

    def configure_ring():
        if DEBUG:
            return Ring(backend='debug')
        else:
            return Ring(backend='memcache', ...)

    ring = configure_ring()
    
    @ring.cache(expire=60)
    def get_url(...):
        ...
Tons of other libraries out there that implement this exact pattern.
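A toy, self-contained version of that pattern (none of these names are ring's actual API): the decorator stays the same at module level, and only the cache object's backend changes at runtime.

```python
import functools

class Ring:
    # Toy cache facade: backend chosen at runtime, decorator unchanged.
    def __init__(self, backend='memory'):
        # 'debug' disables caching entirely; 'memory' uses a plain dict.
        self.backend = backend
        self._store = {}

    def cache(self, expire=None):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args):
                if self.backend == 'debug':
                    return func(*args)          # no caching in debug mode
                key = (func.__name__,) + args   # expire ignored in this toy
                if key not in self._store:
                    self._store[key] = func(*args)
                return self._store[key]
            return wrapper
        return decorator

DEBUG = False
ring = Ring(backend='debug' if DEBUG else 'memory')

@ring.cache(expire=60)
def get_double(x):
    return x * 2
```

The decorated functions never mention memcache, redis, or any other implementation detail; swapping backends touches one line.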


agree


This assumes you define get_url().


This is a good point. asyncio backends now partially take an initializer function, because calling await at import time is a kind of nonsense.

I think it also needs to take a client configuration or a client initializer. Any advice from your use case?


I have been thinking about setup and teardown for asyncio apps in Python lately.

The async with block is a nice idea but doesn't deal with the reality that often a resource has multiple consumers. For instance, there might be several components of an application that use a database connection -- I really want to make the connection once and tear it down only after all of the clients of that connection have themselves been torn down.
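A minimal sketch of that idea, with hypothetical names: a reference-counted async resource that is opened on the first acquire and closed only after the last consumer releases it.

```python
import asyncio

class SharedResource:
    # Reference-counted async resource: opened on first acquire,
    # closed only when the last consumer has released it.
    def __init__(self, open_coro, close_coro):
        self._open = open_coro
        self._close = close_coro
        self._count = 0
        self._value = None
        self._lock = asyncio.Lock()

    async def __aenter__(self):
        async with self._lock:
            if self._count == 0:
                self._value = await self._open()
            self._count += 1
            return self._value

    async def __aexit__(self, *exc):
        async with self._lock:
            self._count -= 1
            if self._count == 0:
                await self._close(self._value)
                self._value = None

# Usage: several consumers each write `async with shared as conn: ...`
# and the underlying connection is opened once and closed once.
```

This is the shape of a solution, not a framework; a real version would also have to handle open failures and cancellation during teardown.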

What I'm imagining the answer to be is something a little bit like the Spring Framework but fundamentally centered around asyncio.


I think there are two common situations that a 'compile time' configuration would not support.

- Loading configuration from `main()`, e.g. a configuration passed in via sys.argv and processed by argparse.

- Setting configuration within tests. Unless explicitly told otherwise, I'd expect all tests to be performed against an empty cache. Not to mention, there's no guarantee that I'll have access to a server to use during tests.


dogpile.cache author here.

The way dogpile does this is that your decorator is configured in terms of a cache region object, which you configure with backend and options at runtime.

https://dogpilecache.sqlalchemy.org/en/latest/usage.html#reg...

I got this general architectural concept from my Java days, observing what EHCache did (that's where the word "region" comes from).
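A stdlib-only sketch of the region idea (this is the concept, not dogpile's actual implementation): the decorator is applied at import time, but the backend is attached later, at runtime.

```python
import functools

class CacheRegion:
    # Decorate now, configure the backend later (the "region" pattern).
    def __init__(self):
        self._backend = None   # no backend until configure() is called

    def configure(self, backend):
        # e.g. a dict for an in-memory backend, or a client wrapper.
        self._backend = backend

    def cache_on_arguments(self):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args):
                if self._backend is None:
                    raise RuntimeError("region is not configured yet")
                key = (func.__name__,) + args
                if key not in self._backend:
                    self._backend[key] = func(*args)
                return self._backend[key]
            return wrapper
        return decorator

region = CacheRegion()        # created at module level, import time

@region.cache_on_arguments()  # decoration happens at import time...
def square(x):
    return x * x

region.configure({})          # ...configuration happens at runtime
```

Calling a decorated function before `configure()` fails loudly instead of silently caching to the wrong place, which is exactly the separation the parent comments are asking for.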


Surely it's just:

   client = pymemcache.client.Client(('127.0.0.1', 11211))
   cache_wrapper = ring.memcache if some_condition else ring.whatever

   @cache_wrapper(...)
   def ...


Extremely poor design:

* Not DRY. What if I want to use a cache for production but disable caching in development? And I have 10s or even 100s of functions that rely on the cache? Because the decorators contain implementation/client-specific parameters, I now have to add another entire layer of abstraction over this.

* Implementation is tied to the decorator, e.g. `ring.memcache` -- seriously? Why does it matter?

* What about setting application defaults, such as an encoding scheme, a key prefix/namespace, a default timeout?

I'm sorry but this is over-engineered garbage and good luck to anyone who uses it.


I agree they are missing features, but they are easy goals with a small refactoring, so they will be solved soon. Issue #129 is about application defaults. After that, dry-run is just replacing the default action from 'get_or_update' to 'execute'.


> I'm sorry but this is over-engineered garbage and good luck to anyone who uses it.

I agree and wish people would speak up and share sentiment like this more often.


Is there a Python equivalent to PHP's APCu? APCu, in the PHP world, leverages mmap to provide a multi-process KV store with fast, built-in serialization. So it's simple and very fast for single-server, multi-process caching.


It's not necessary in Python (or many other server frameworks), because Python doesn't typically follow a process-per-request model. You can just stick it in memory available to all of your threads.


I imagine there are python users doing multi-process, aren't there? (since the GIL is limiting for some use cases).


Yeah - if necessary you can use something like uWSGI's caching interface (https://uwsgi-docs.readthedocs.io/en/latest/Caching.html). For most of these sorts of things, though, you'll typically be fine caching the object in question once per process, because the processes are still persistent. If you truly need shared memory between processes (rather than just an in-memory copy of an object used for many requests), there are other options, and it's infrequent that you need something shared between request processes but not shared between other web servers.


They do have the multiprocessing library, and you can use it to do some primitive sort of caching between processes, but it's not ideal. The pattern I've used before is to use something like Redis to do the caching, then use multiprocessing to create a coordinator process that coordinates all the worker processes. Multiprocessing is limited in what sort of things you can pass between processes, but you have just enough primitives to do some coarse-grained coordination.
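As a sketch of the "primitive caching between processes" part: a `multiprocessing.Manager` dict is one of those just-enough primitives. It's a proxy object backed by a server process, so it's far slower than APCu's mmap approach, but it does give every worker a shared view.

```python
import multiprocessing as mp

def worker(shared_cache, key, value):
    # Each worker process writes through the manager proxy.
    shared_cache[key] = value

# 'fork' keeps this snippet guard-free; on Windows you'd use 'spawn'
# plus an `if __name__ == "__main__":` guard instead.
ctx = mp.get_context('fork')
with ctx.Manager() as manager:
    shared_cache = manager.dict()     # dict proxy shared across processes
    procs = [ctx.Process(target=worker, args=(shared_cache, i, i * i))
             for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    result = dict(shared_cache)       # snapshot before the manager shuts down
```

Every proxy access is an IPC round-trip, which is why Redis (or uWSGI's cache) usually wins once the access pattern gets non-trivial.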


Great project. There is only one angle that I feel is missing: multiple requests for the same resource could cause duplicated work, especially if the value generating function is slow.

I wrote a sample solution to that problem, feel free to reach out if you ever consider adding a similar feature, I'd be happy to contribute. (fyi: the current implementation is in Go)

https://github.com/kristoff-it/redis-memolock
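The memolock idea above uses Redis for cross-process locking; within a single process the same dogpile-style deduplication can be sketched with a per-key lock (all names here are hypothetical): the first caller computes, concurrent callers for the same key block and reuse the result.

```python
import threading

class DedupCache:
    # Collapses concurrent requests for the same key:
    # only one caller computes, the others wait for its result.
    def __init__(self):
        self._results = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the two dicts

    def get_or_compute(self, key, compute):
        with self._guard:
            if key in self._results:
                return self._results[key]
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                       # one computing thread per key
            with self._guard:
                if key in self._results:  # another thread finished first
                    return self._results[key]
            value = compute()             # slow work happens unlocked-from-_guard
            with self._guard:
                self._results[key] = value
            return value
```

A production version would also need expiry and error handling (a failed `compute` shouldn't poison the key), which is where the Redis-based memolock earns its complexity.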


Actually it's a common request from users, but it hasn't been solved yet. I will check the project, thanks!


Looks extensive and I'll likely try using the module at some point.

One thing, why not stash all the function methods under a "ring" or "cache" attribute, eg

  @ring.lru()
  def foo():
    ..

  foo.cache.update()
  foo.cache.delete()
  ..
This might be less likely to clash with any existing function attributes (if you're wrapping a 3rd party function say).
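The suggestion is easy to sketch with a toy decorator (the `.cache` namespace and its methods here are the commenter's proposal, not ring's current API): the helpers live on a single attribute instead of being splatted onto the function.

```python
import functools
import types

def cached(func):
    # Toy cache decorator that groups its helpers under a single
    # `.cache` attribute instead of adding update/delete/... directly.
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]

    wrapper.cache = types.SimpleNamespace(
        update=lambda *args: store.__setitem__(args, func(*args)),
        delete=lambda *args: store.pop(args, None),
        clear=store.clear,
    )
    return wrapper
```

With one reserved name (`cache`) the odds of clashing with an attribute on a wrapped third-party function drop considerably.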


Thanks for the great advice. I never thought about this problem.


Like this a lot.

How could one invalidate everything related to a specific client/customer/account?

I wonder how they cascade these invalidations in bigger and more complex systems.


It doesn't have any cascading feature for now. @ring.redis_hash can be helpful for certain cases, but it is not a generic solution.

In the future, there is a plan for indirect invalidation, which will use another key to decide expiration. Though it is not designed for cascading, it will probably work for some of these cases.


The API doesn't seem as fleshed out as dogpile.cache's yet.

Normally you don't want to pass a cache backend instance to decorators at module level.


How does this compare to dogpile?


I reviewed a few cache libraries, but this is a new one I hadn't checked.

Roughly, Ring consists of 6 key features: sub-functions, a universal decorator, data coders, asyncio support, consistent and readable key generation, and abstract-transparent back-end access. I will check dogpile soon, thanks.


Any blogpost or link to your review? Thanks


Because I didn't write one before, I made a new one with the projects I remember: https://github.com/youknowone/ring/tree/feature-table#featur...


no mutex dogpile lock or get_or_create functionality..


No dogpile lock, but get_or_create exists under a different name: get_or_update.


>Cache is a popular concept widely spread on the broad range of computer science but its interface is not well developed yet.

This sentence is grammatically incorrect. Replace "Cache" with "Caching".


Thanks, I will fix it


I needed something like this that allows access to and manual manipulation of the cache, and I ended up forking functools.lru_cache code. This library definitely fits the bill.


> Memcached itself is out of the Python world

Don't know why this bothers me so much... but it's actually from Perl. It was born at LiveJournal, a well-known Perl shop.


I actually read this as outside.


Will it be a correct expression if I fix it to "outside"? https://github.com/youknowone/ring/pull/133/files


Yeah, I commented on the PR as such.


To me, mocking of the caches for testing is super important and missing.

I searched the article, the linked "Why Ring?", and this page of responses for "mock", but no results.

Maybe it's just me!


Thanks. I didn't think of adding them to the why page. For now, the actual projects work like:

  if DEBUG:
      ring_cache = functools.partial(ring.dict, {}, default_action='execute')
  else:
      ring_cache = functools.partial(ring.redis, client)

  @ring_cache(...)
  def ...

Which is not a very good solution at all. I will fix the design and document it properly. Thanks for suggesting the why page and a mock section.


no dogpile lock support?


I want to say "not yet". It's a shame that I didn't know about the dogpile lock.

