Audio and video will not replace text, IMO. You can consume audio and video only at the pace that it was originally created (OK, you can speed it up or slow it down a little bit but not by much, without it sounding like either Mickey Mouse or Lurch...)
I know that this topic must come up a lot when discussing trends in communication.
But I hope you do not mind if I explore it a little bit, just to help with my own understanding of the topic. Also, I hope you do not mind my stilted prose.
One fellow told me that audio/video would never replace text because of the degree of separation between the content and the consumer. With text, there are no more than two or three degrees of separation; with audio/video, the degree of separation can be greater by a factor of, say, a billion. I am not sure about the math involved here, but I did get the point: text on a stone is easier to get at than text in a digital format. Not as dynamic, but hey, it worked without electricity and the gear (think about it).
However, what I was thinking was: what if audio and video become like glyphs on a page? I am not really talking about letters strung together to represent a sound that represents an idea that creates a feeling. I mean something more like a pictogram that conveys a whole idea or emotion, which then creates a feeling that leads to a sound.
With audio/video, you can use so many parameters for conveying an idea: height, width, depth, frequency, tone, color, contrast, sharpness. In text, all of this has to be described. With audio/video, none of these parameters needs to be described; they can be used directly to convey the idea or feeling in the blink of an eye.
Well, just to keep it short: could audio/video not become the new text of humanity? I mean, we have moved from pictograms to alphabets to grammatical structure, so why not the next step? Flash yellow, flash blue blue, high tone, flash red, low tone, a kaleidoscope of billions of pictures, and there you go: the entire history of the world beamed into your head.
Well, if the Chinese can take ten strokes, turn them into 120 radicals, and have no more than five transforms for each radical, why not a language based on light and sound?