Facebook has its own BLOB storage infrastructure, which is responsible for storing trillions of BLOBs. To the storage infrastructure, these objects are just arbitrary bytes of varying sizes that have to be stored reliably. At the user level, however, a BLOB could be the representation of a picture or video that a user has uploaded to Facebook.
Each picture or video that a user uploads goes through several processing pipelines, which can produce multiple logically equivalent representations of the same entity. For example, a video might be transcoded into multiple encodings with different resolutions or codecs. Although these encodings are different blobs, they represent the same video, and any of them can be served to users based on parameters such as network speed and available codecs.
In that context, these blobs are related. The different blobs that represent the same logical object are called ‘Semantic Replicas’. Because any of the Semantic Replicas can be served to the user, we can improve the availability and read reliability of the logical asset by storing different Semantic Replicas in different failure domains.
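As a rough illustration of the placement idea, the sketch below spreads the Semantic Replicas of one video across failure domains and checks which encodings remain servable after a domain fails. It is only a toy model, not Facebook's implementation: the `Replica` type, the blob IDs, the domain names, and the round-robin policy are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    blob_id: str   # hypothetical blob identifier
    encoding: str  # hypothetical encoding label, e.g. "1080p-h264"


def place_replicas(replicas, failure_domains):
    """Assign each semantic replica to a failure domain, round-robin.

    With at least as many domains as replicas, no two replicas of the
    same logical asset end up in the same domain.
    """
    if not failure_domains:
        raise ValueError("need at least one failure domain")
    return {
        r.blob_id: failure_domains[i % len(failure_domains)]
        for i, r in enumerate(replicas)
    }


def readable_replicas(replicas, placement, failed_domains):
    """Return the replicas whose failure domain is still healthy."""
    return [r for r in replicas if placement[r.blob_id] not in failed_domains]


# Three encodings of the same video (illustrative values only).
replicas = [
    Replica("blob-a", "1080p-h264"),
    Replica("blob-b", "720p-vp9"),
    Replica("blob-c", "360p-h264"),
]
domains = ["domain-1", "domain-2", "domain-3"]
placement = place_replicas(replicas, domains)

# Losing one domain still leaves two encodings of the video servable.
survivors = readable_replicas(replicas, placement, failed_domains={"domain-2"})
print([r.encoding for r in survivors])  # ['1080p-h264', '360p-h264']
```

Because every surviving encoding represents the same video, a read can succeed as long as at least one replica's domain is healthy, which is the availability gain described above.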
Not all videos are equal. Some videos are watched more than others, and the probability of a video being watched drops significantly over time. This information can be used to store videos more efficiently and reduce the storage cost per byte while maintaining the user experience.
Chidambaram and Dan will talk about the end-to-end architecture of the systems being built to achieve optimizations across the Facebook stack that enhance the user experience and lower the storage cost per byte.
Read more in Chidambaram and Dan's blog post, Optimizing Video Storage via Semantic Replication at https://atscaleconference.com/optimiz...