Because visual representation is separate from the underlying data structure. A string container doesn't have a specific direction, only a relative one, i.e. this character comes before the next and after the previous. By adding bidi control codes, the string indicates where the visual ordering changes within this relative direction system.
You could absolutely design a new string container that assumes left to right at all times and cannot be changed, but then it's on the programmer to ensure that strings are copied or concatenated in the right direction, at the right location, and substring searching becomes a minor headache. How would you concatenate an RTL string to a forced LTR string representation? You would have to work out whether the end of the string is LTR or RTL. If LTR, append directly. If RTL, find the character where the direction changes and insert the string there - much more expensive. Better to just append the string, using bidi codes where required, and let the frontend process the string to make the appropriate direction changes. Yes, you may need to search the string for the bidi code to know which direction you're going at the end of the string, but that's just a simple reverse string search for a single control character, and not a complex variable multi-byte search of inferred character directions by codepoint values.
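As a rough sketch of that "just append and let the frontend sort it out" approach (the helper names `append_rtl` and `last_bidi_control` are made up for illustration, and this only looks at explicit control characters, not the implicit direction of ordinary letters):

```python
# Explicit directional formatting characters defined by Unicode.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # embeddings and their terminator
LRO, RLO = "\u202d", "\u202e"                 # overrides
BIDI_CONTROLS = {LRE, RLE, PDF, LRO, RLO}

def append_rtl(base, rtl_text):
    """Append rtl_text in logical order, wrapped in an RLE ... PDF pair.

    Nothing in the buffer is reversed; the renderer decides the visual order.
    """
    return base + RLE + rtl_text + PDF

def last_bidi_control(text):
    """Reverse search: return (index, char) of the last bidi control, or None."""
    for i in range(len(text) - 1, -1, -1):
        if text[i] in BIDI_CONTROLS:
            return i, text[i]
    return None

combined = append_rtl("abc", "\u05d0\u05d1")  # two Hebrew letters
```

The append is a plain concatenation at the end of the buffer, and finding the most recent direction change is a single backwards scan for a handful of known codepoints.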
I think the issue is in where bidi codes are rendered. They make any text area they're rendered in inherently untrustworthy, so they should be treated as an exception in critical situations. I've seen the reversed exe file name trick used for years, and every time I ask myself why that's even a thing. If the OS used file headers and magic numbers to determine file types instead of the filename, it would be less of an issue.
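To illustrate the magic number idea, here is a minimal sketch that decides the type from the first few bytes and ignores the name entirely. The signature table is deliberately tiny and `sniff_type` is a hypothetical helper, not any OS's actual mechanism:

```python
# A few well-known file signatures (magic numbers) at offset 0.
SIGNATURES = [
    (b"MZ", "Windows executable"),
    (b"\x7fELF", "ELF executable"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
]

def sniff_type(header):
    """Guess a file type from its leading bytes, never from its name."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

# The name contains U+202E (right-to-left override), so a bidi-aware label
# may show it ending in "jpg" even though the real extension is ".exe".
tricky_name = "photo\u202egpj.exe"
tricky_header = b"MZ\x90\x00"  # what an executable's first bytes look like
detected = sniff_type(tricky_header)
```

Whatever the filename appears to say on screen, the content check still reports an executable.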
For source code, I would question the rendering of RTL text in a source code editor, as it's an obvious issue for code safety. Ideally, all source code would be kept to the same origin language - doesn't have to be English, just consistent. Any non-conforming text should ideally be loaded from a resource rather than written inline within the source code, to avoid foreign character contamination and make these issues easier to identify. Further, source code rendering should only render identified safe control codes, and treat unsafe ones as raw binary values to be shown as such - e.g. \r and \n are safe, \b is unsafe, and bidi codes would also be unsafe. You could even go so far as to include them in the syntax highlighting, but that results in a dependency on syntax highlighting to show the semantics of the source code rather than the text alone.
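Something like the following is what I have in mind for the rendering side. It is only a sketch: the allowlist (I've added \t to \r and \n as my own assumption) and the choice to treat every Unicode control (Cc) and format (Cf) character as unsafe are illustrative, and `show_unsafe` is a made-up name:

```python
import unicodedata

# Controls rendered normally; everything else in Cc/Cf is shown as an escape.
SAFE_CONTROLS = {"\n", "\r", "\t"}

def show_unsafe(text):
    """Replace unsafe control/format characters with a visible \\uXXXX escape."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if ch not in SAFE_CONTROLS and category in ("Cc", "Cf"):
            out.append("\\u{:04x}".format(ord(ch)))  # e.g. \u202e for RLO
        else:
            out.append(ch)
    return "".join(out)

line = 'if role == "admin\u202e":\n'
visible = show_unsafe(line)
```

With this, a hidden override inside a string literal shows up as literal text in the editor, without relying on syntax highlighting to reveal it.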