
S3AsyncClient and AsyncResponseTransformer: maintaining back-pressure during download (OOM killed a JVM holding 320 x 16MB Netty DirectByteBuffer objects)


I have built a typical download API using the Spring Reactive stack and the AWS Java SDK v2. Basically, there is a controller that calls s3AsyncClient to download the object:

@GetMapping(path="/{filekey}")
Mono<ResponseEntity<Flux<ByteBuffer>>> downloadFile(@PathVariable("filekey") String filekey) {    
    GetObjectRequest request = GetObjectRequest.builder()
      .bucket(s3config.getBucket())
      .key(filekey)
      .build();
    
    return Mono.fromFuture(s3client.getObject(request, AsyncResponseTransformer.toPublisher()))
      .map(response -> {
        checkResult(response.response());
        String filename = getMetadataItem(response.response(),"filename",filekey);            
        return ResponseEntity.ok()
          .header(HttpHeaders.CONTENT_TYPE, response.response().contentType())
          .header(HttpHeaders.CONTENT_LENGTH, Long.toString(response.response().contentLength()))
          .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
          .body(Flux.from(response));
      });
}

The Javadoc for the publisher returned by AsyncResponseTransformer.toPublisher() includes this note:

You are responsible for subscribing to this publisher and managing the associated back-pressure. Therefore, this transformer is only recommended for advanced use cases.
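In Reactive Streams terms, "managing the back-pressure" means controlling how many buffers you request from the SDK's publisher before the previous ones have been written out. The sketch below only illustrates that contract and is not code from the post; writeToClient is a hypothetical placeholder for the slow consumer.

import java.nio.ByteBuffer;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// Requests exactly one ByteBuffer at a time and asks for the next one only
// after the current buffer has been handed to the slow consumer.
class OneBufferAtATimeSubscriber implements Subscriber<ByteBuffer> {
    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription s) {
        this.subscription = s;
        s.request(1);                   // initial demand: a single buffer
    }

    @Override
    public void onNext(ByteBuffer buffer) {
        writeToClient(buffer);          // hypothetical slow write
        subscription.request(1);        // only now ask the SDK for the next chunk
    }

    @Override
    public void onError(Throwable t) {
        // propagate the failure / close the connection
    }

    @Override
    public void onComplete() {
        // flush and close the downstream channel
    }

    private void writeToClient(ByteBuffer buffer) {
        // placeholder for whatever actually drains the data
    }
}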

Netty is configured to use the direct "no cleaner" allocation path, i.e. it allocates DirectByteBuffers instead of heap buffers, and it uses UNSAFE to allocate them.

-Dio.netty.maxDirectMemory is set to 2 or 3GB (I tested several values).

What I am seeing is that from time to time there are OutOfDirectMemoryError failures and the connection is dropped. The client gets a "premature end of content stream" error.

It seems that the S3AsyncClient can outpace the consumers of the data, and the direct buffers overflow no matter how much memory I give Netty. The JVM heap itself stays steady at around 300MB.

I came across this answer to a related Netty question, "OOM killed JVM with 320 x 16MB Netty DirectByteBuffer objects":

You cannot control the amount of memory, short of causing OOM as you have done. Netty pooling does not behave like the Java GC versus the heap, i.e. it does not throttle or increase the frequency of its work in order to stay within specified limits (it throws OOM only under specific circumstances). Netty memory pooling is built to mimic the behaviour of a native allocator, e.g. jemalloc, so its purpose is to retain as much memory as the application needs to work. For this reason, the retained direct memory depends on the allocation pressure that the application code produces, i.e. how many outstanding allocations there are without a release.

I suggest instead embracing its nature: prepare a representative test load on a preprod/test machine and just monitor the Netty direct memory usage of the application you're interested in. I suppose you've configured -Dio.netty.maxDirectMemory=0 for the purpose of exposing the direct memory used via JMX, but Netty can expose its own metrics as well (saving you from setting io.netty.maxDirectMemory); just check that the libraries that use it take care of exposing them through JMX or whatever metrics framework you use. If those applications don't expose it, the API is fairly easy to use, see https://netty.io/4.1/api/io/netty/buffer/PooledByteBufAllocatorMetric.html
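For reference, reading those allocator metrics from code is straightforward. A minimal sketch, assuming the application allocates through Netty's default pooled allocator (PooledByteBufAllocator.DEFAULT), which Reactor Netty typically uses unless configured otherwise:

import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;

public class NettyDirectMemoryProbe {
    public static void main(String[] args) {
        // Metrics of the default pooled allocator; only meaningful if the
        // application actually allocates through PooledByteBufAllocator.DEFAULT.
        PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
        System.out.println("used direct memory: " + metric.usedDirectMemory() + " bytes");
        System.out.println("used heap memory:   " + metric.usedHeapMemory() + " bytes");
        System.out.println("direct arenas:      " + metric.numDirectArenas());
    }
}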

I am using Netty 4.1.89 or 4.1.108 (tried updating), AWS SDK v2 2.23.21, and AWS CRT client 0.29.14 (the latest).

I tried doing Flux.from(response).limitRate(1) with no luck.

My performance test downloads 500MB files in parallel with up to 40 concurrent users. The node has 8GB of memory in total and 1 CPU unit.

I can understand that this is not enough to handle all users, but I was expecting it to apply back-pressure automatically and keep streaming the files, just more slowly, i.e. get the next buffer from S3 -> write it to user 1, get the next buffer from S3 -> write it to user 2, and so on.
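Expressed in Reactor terms, that expected behaviour amounts to keeping the upstream demand at one buffer per in-flight write. A sketch of the idea, not code from the post (writeToUser is a hypothetical non-blocking write returning Mono<Void>):

import java.nio.ByteBuffer;
import org.reactivestreams.Publisher;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

class InterleavedDownload {
    // Pull one buffer, finish writing it to the user, then pull the next one.
    // concatMap with a prefetch of 1 keeps at most one buffer in flight.
    Mono<Void> stream(Publisher<ByteBuffer> s3Body) {
        return Flux.from(s3Body)
                .concatMap(this::writeToUser, 1)
                .then();
    }

    private Mono<Void> writeToUser(ByteBuffer buffer) {
        return Mono.empty();            // placeholder for the real write
    }
}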

However, even with just one slow consumer, I see Netty's reported direct memory consumption climb to 500MB, and if I stop the consumer it drops to 16MB (the default PoolArena cache, I suppose). So it looks like the S3 async client pushes all 500MB into Netty's direct buffers and the client slowly drains them.

Trying to limit the AWS CRT throughput with targetThroughputInGbps(0.1) didn't help either.
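For context, that throttling attempt corresponds roughly to building the CRT-based client as below; this is a reconstruction, the region is a placeholder, and only the 0.1 Gbps value comes from the text above:

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

class CrtClientFactory {
    // Rough reconstruction of the CRT client used for the throughput experiment.
    static S3AsyncClient build() {
        return S3AsyncClient.crtBuilder()
                .region(Region.EU_WEST_1)          // placeholder region, not from the post
                .targetThroughputInGbps(0.1)       // attempted throttle; had no visible effect
                .build();
    }
}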

I have a feeling that S3AsyncClient + CRT + Spring Boot Netty does not handle back-pressure automatically: https://github.com/netty/netty/issues/13751

Since I can't control the download speed on the client side (the connection might be slow or fast), how can I maintain back-pressure so that the direct buffers stay under a certain limit? Is that possible at all?
