Delete Duplicate Documents with Elasticsearch and Ruby

Remember the good old days of delete_by_query in Elasticsearch? Without it, if you want to delete documents selected by a complex query or an aggregation, you are on your own. I needed to delete documents that share the same @timestamp, so I used the following script.

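require 'elasticsearch'
client = Elasticsearch::Client.new

begin
  # find duplicate documents by @timestamp
  result = client.search(
    index: 'my_index*',
    body: {
      aggs: {
        duplicateCount: {
          # only timestamps shared by at least two documents, 100 buckets per pass
          terms: {
            field: '@timestamp',
            min_doc_count: 2,
            size: 100
          },
          aggs: {
            duplicateDocuments: {
              top_hits: {}
            }
          }
        }
      }
    }
  )['aggregations']['duplicateCount']['buckets'].map do |bucket|
    # use the first document of the duplicates
    bucket['duplicateDocuments']['hits']['hits'].first
  end

  # delete one document per duplicated timestamp
  result.each do |doc|
    client.delete(index: doc['_index'], type: doc['_type'], id: doc['_id'])
  end
  # refresh so the next search no longer sees the deleted documents
  client.indices.refresh(index: 'my_index*')
end until result.count <= 0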
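
To make the bucket handling easier to follow without a running cluster, here is a minimal sketch of the step that picks one document per duplicated timestamp. The response hash is hand-written to mimic the shape of a terms plus top_hits aggregation result. The index name, type, IDs, and the helper name first_duplicates are made up for illustration and are not part of the script above.

```ruby
# Hypothetical sample response (index, type, and IDs are invented).
SAMPLE_RESPONSE = {
  'aggregations' => {
    'duplicateCount' => {
      'buckets' => [
        {
          'key_as_string' => '2016-01-01T00:00:00.000Z',
          'doc_count' => 2,
          'duplicateDocuments' => {
            'hits' => {
              'hits' => [
                { '_index' => 'my_index-1', '_type' => 'logs', '_id' => 'a1' },
                { '_index' => 'my_index-1', '_type' => 'logs', '_id' => 'a2' }
              ]
            }
          }
        },
        {
          'key_as_string' => '2016-01-02T00:00:00.000Z',
          'doc_count' => 3,
          'duplicateDocuments' => {
            'hits' => {
              'hits' => [
                { '_index' => 'my_index-2', '_type' => 'logs', '_id' => 'b1' },
                { '_index' => 'my_index-2', '_type' => 'logs', '_id' => 'b2' },
                { '_index' => 'my_index-2', '_type' => 'logs', '_id' => 'b3' }
              ]
            }
          }
        }
      ]
    }
  }
}.freeze

# Hypothetical helper: return the first hit of every bucket,
# i.e. the documents that a single pass would delete.
def first_duplicates(response)
  response['aggregations']['duplicateCount']['buckets'].map do |bucket|
    bucket['duplicateDocuments']['hits']['hits'].first
  end
end

docs = first_duplicates(SAMPLE_RESPONSE)
puts docs.map { |doc| doc['_id'] }.inspect
```

Only one document per bucket is removed in each pass, so a timestamp shared by three documents needs two passes before it drops below min_doc_count. That is why the script repeats the search until no buckets come back.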
