Video containers such as MP4 and MOV store video and audio tracks. Each track is a sequence of frames, and each frame is stored with a time value that determines when, and for how long, it should be shown/played.
If we capture video at 60 frames per second, there is 1/60 of a second between consecutive video frames.
How do we represent precise time values for each video frame?
timescale (tbn): A logical clock, defined by its total number of ticks per second. Typically the FPS (frames per second) or a multiple of it is used as the timescale; for example, 600 is the LCM of 24 FPS, 25 FPS, and 30 FPS. Let's take a logical clock with 600 ticks per second.
timebase: The duration of one tick of the timescale. In the above case, if 600 ticks make up a second, the duration of one tick is 1/600 of a second, i.e., timebase = 1/timescale.
pts (Presentation TimeStamp): The number of timebase units associated with each frame. Varying the pts spacing between frames produces a variable frame rate video.
pts_time: pts (the number of timebase units) × timebase, i.e., the frame's presentation time in seconds.
Video containers store frames against pts_time, along with details on the timebase, timescale, and pts.
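The relationships above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical 30 FPS stream with the common timescale of 600; the variable names are chosen to mirror the terms in this post, not any particular library's API.

```python
from fractions import Fraction

timescale = 600                      # logical clock: ticks per second (tbn)
timebase = Fraction(1, timescale)    # duration of one tick, in seconds
fps = 30                             # assumed constant frame rate
ticks_per_frame = timescale // fps   # 600 / 30 = 20 ticks between frames

# pts for the first few frames, and the presentation time each maps to
for n in range(4):
    pts = n * ticks_per_frame        # in timebase units
    pts_time = pts * timebase        # pts * timebase = seconds
    print(f"frame {n}: pts={pts}, pts_time={float(pts_time):.4f}s")
```

For a variable frame rate video, the pts values would simply not be evenly spaced; the container still stores each frame's pts against the same timebase.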