Summarization as Compression in Search Engine Architecture
Serving a large text search index in production can be costly. These costs grow proportionally with the size of the dataset being indexed, and grow again as queries per second increase. Because of the way search works, traditional disk compression isn't a good option, leaving architects in a situation where they must balance cost against latency.
Keeping index sizes as small as possible offers performance, cost, and environmental benefits. Smaller indexes can be searched more quickly by the same hardware and can often be held in memory instead of on disk. An effective compression algorithm for search could drastically reduce the hardware requirements for many search applications.
Enter summarization as compression. The advent of AI/ML and models such as BERT, Titan, and Llama allows for the rapid and affordable summarization of large text datasets, shrinking the size of the index at the cost of some precision. Adding a compression step to a search engine indexing pipeline can dramatically alter the performance of a search application.
How does it work? Each document is submitted to a summarization model, and the summary is stored in the index rather than the original text. Queries become faster but less precise. Consider a text index of Wikipedia: compression as summarization could dramatically reduce the index size while preserving the ability to quickly direct people to articles.
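A minimal sketch of that indexing step, in Python. The `summarize` function here is a placeholder that keeps only the first two sentences so the example is self-contained; in practice it would call a real summarization model. The inverted index is then built over the summaries instead of the full text:

```python
def summarize(text: str, max_sentences: int = 2) -> str:
    """Stand-in for an ML summarization model: keep the first
    few sentences. A real pipeline would call BERT, Titan, Llama,
    or a hosted summarization endpoint here."""
    sentences = text.split(". ")
    return ". ".join(sentences[:max_sentences]).rstrip(".") + "."


def build_compressed_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index built over summaries rather than full text."""
    index: dict[str, set[str]] = {}
    for doc_id, text in documents.items():
        summary = summarize(text)  # the compression step
        for token in summary.lower().split():
            index.setdefault(token.strip(".,"), set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents whose summaries contain every query term."""
    results = None
    for term in query.lower().split():
        matches = index.get(term, set())
        results = matches if results is None else results & matches
    return results or set()
```

Note the trade-off in action: terms that only appear in the parts of a document the summarizer dropped will no longer match, which is exactly the precision loss described above.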
What if I need precise results? Keeping an index of the original text might still be necessary, used alongside the compressed indexes. The original text could be offered as an "expand your search" option. With the compressed indexes acting as a triage step that catches most queries, a low-QPS original-text index becomes feasible.
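The two-tier arrangement can be sketched as a thin routing layer. `compressed_search` and `full_search` are assumed to be the search functions for the summary index and the original-text index respectively; only the routing logic is shown:

```python
from typing import Callable

SearchFn = Callable[[str], set[str]]


def tiered_search(query: str,
                  compressed_search: SearchFn,
                  full_search: SearchFn,
                  expand: bool = False) -> set[str]:
    """Triage queries against the cheap compressed index first; only
    hit the expensive original-text index on a miss, or when the user
    explicitly asks to expand the search."""
    results = compressed_search(query)
    if results and not expand:
        return results
    # Miss or explicit expansion: fall through to the low-QPS
    # original-text index and merge whatever the summary tier found.
    return results | full_search(query)
```

Because most queries return from the first tier, the original-text index only sees the residual traffic, which is what keeps it affordable.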
What about decompression? Retrieving the original text from the database or S3 bucket where it is stored would effectively be decompression.
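In code, "decompression" is nothing more than a keyed lookup against primary storage. The in-memory dict below stands in for that store; against S3 this would be a `get_object` call instead:

```python
# Stand-in for the primary document store (a database or S3 bucket).
ORIGINALS = {
    "doc-1": "The complete, uncompressed article text, exactly as ingested.",
}


def decompress(doc_id: str) -> str:
    """Recover the full original text behind a summary-index hit."""
    return ORIGINALS[doc_id]
```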
This is a very new idea made possible by the low cost of summarization with ML models. I have done some preliminary experiments with it, the results look promising, and I am looking for an opportunity to use the technique in production.