Notes from porting to std::string_view

Why port?

Why do it, when passing a const std::string& copies just one reference, which is cheap anyway? String_view offers some benefits with few downsides. For example, when you change a const std::string& parameter into a string_view, call sites no longer need to create a std::string out of a string literal just to call the function. String_view works with std::strings but is not limited to them, and creating a std::string only so you can call a function is wasteful.
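As a minimal sketch (the function `count_char` is hypothetical, not from the original code), a string_view parameter accepts a literal, a character array and a std::string alike. With a const std::string& parameter, the first two would each build a temporary std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Counts occurrences of `c` without needing an owning string.
// Literals, char arrays and std::strings all bind to the view directly.
std::size_t count_char(std::string_view text, char c) {
    std::size_t n = 0;
    for (char ch : text) {
        if (ch == c) ++n;
    }
    return n;
}
```

For example, `count_char("banana", 'a')` gives 3 without any std::string being constructed.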

Often it's hard to judge how many strings initially came from char* or some other non-std string just by looking at a particular function in isolation. Perhaps the real string originates from somewhere 10 levels up the stack. It would be difficult to substantiate a claim that converting your code into using std::string_view will not matter for your application.

Converting also raises the question of string ownership in sometimes surprising places where it was only implied before. I consider that a good thing.

Let's start this off optimistically with where the conversion is easy.

Where it is easy

String_view can be viewed as a reference to a string that is allocated somewhere else, which does not own the string and cannot change it. Since this is pretty much a description of const std::string&, you can go through almost the whole project and inspect every function that receives a const std::string&. These are the first candidates to change into std::string_view. Be careful with functions returning const std::string&, though. More on that later.

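std::string has a constructor that receives a std::string_view, so wherever a function still needs an owning std::string you can create the copy right at that point, and no further work is immediately needed. That constructor is explicit, however, so string_views are not implicitly convertible to std::strings (assigning a string_view to an existing std::string does work).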
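A small sketch of what is and isn't allowed, checked at compile time (`to_owned` is a hypothetical helper): constructing a std::string from a view works only explicitly, while the opposite direction is implicit.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// std::string can be built from a std::string_view, but only explicitly.
static_assert(std::is_constructible_v<std::string, std::string_view>);
static_assert(!std::is_convertible_v<std::string_view, std::string>);

// The opposite direction is an implicit conversion.
static_assert(std::is_convertible_v<std::string, std::string_view>);

// Makes the copy (and the ownership) visible at the call site.
std::string to_owned(std::string_view sv) {
    return std::string(sv);
}
```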

This compatibility between std::string_views and std::strings let me port to std::string_view in a nice, controlled, piece-by-piece fashion. It doesn't need to be a big, expensive, all-or-nothing project.

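Where c_str() was used on a std::string, the closest counterpart on std::string_view is data(), but it is not completely analogous, and these changes do need some thought. data() returns a pointer to the first viewed character, and nothing guarantees a null terminator at data()[size()]. A view created from a string literal or a whole std::string happens to be terminated, but a view produced by substr() or remove_suffix() is not. Only use data() as a C string when you know the view reaches the end of a null-terminated buffer; otherwise make a std::string copy and call c_str() on that.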
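A sketch of the safe pattern, assuming some C-style API that needs a terminated string (`c_api_length` is hypothetical, with std::strlen standing in for that API): copy the view into a std::string and pass its c_str().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// string_view::data() is NOT guaranteed to be terminated at size(),
// so copy into a std::string first; its c_str() always is.
std::size_t c_api_length(std::string_view sv) {
    const std::string owned(sv);
    return std::strlen(owned.c_str());  // stand-in for any const char* API
}
```

With `std::string_view word = std::string_view("hello world").substr(0, 5);`, `std::strlen(word.data())` returns 11 because it runs to the literal's terminator, while `c_api_length(word)` returns 5.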

Types char* and char[...] are implicitly convertible to string_view without a problem, provided they are null-terminated, since the length is found with a strlen-style scan. If a function was written with char* arguments to collect strings from C-style libraries, the dirent.h header for example, it will keep working. Such a function was probably written with a heavy C influence, and you can keep all of that as long as anything you pass on as const char* is actually null-terminated, which string_view::data() does not guarantee. This might be an opportunity to modernize some of that code, depending on how well the functionality is covered by unit tests. Or alternatively, use the opportunity to back-fill unit tests.

Now for some tricky bits and gotchas.

Returning std::string_views from functions

String_views don't own the string. With std::strings, ownership came for free as a side effect, but now it needs consideration when porting away from them. Storage intentions are sometimes hidden in surprising places. For example, a function may receive a string_view parameter and return it (or a part of it) to the caller, while the caller invokes it with a temporary that goes out of scope. The string content is then destroyed and the string_view you stored is dangling. Before the change to string_view this worked because a copy was created somewhere along the way, or because a const reference bound directly to a temporary extended its lifetime. This changes with the transition to string_view. As long as you don't change the return type from std::string to std::string_view you should be fine, because returning a std::string creates a copy.
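A minimal sketch of the trap, with hypothetical `skip_spaces` and `read_name` functions: the view returned by `skip_spaces` is only valid while its argument lives, so a result computed from a temporary must be copied within the same statement.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns a view into its argument: no copy, but also no ownership.
std::string_view skip_spaces(std::string_view s) {
    const auto pos = s.find_first_not_of(' ');
    return pos == std::string_view::npos ? std::string_view{} : s.substr(pos);
}

// Hypothetical producer that returns a temporary std::string.
std::string read_name() { return "   Alice"; }

std::string stored_name() {
    // std::string_view bad = skip_spaces(read_name());
    // ^ would dangle: the temporary std::string dies at the end of that statement.
    return std::string(skip_spaces(read_name()));  // copied while the temporary is alive
}
```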

Many times, however, it does make sense to convert the return value of a function to string_view. In these cases you shouldn't mindlessly update the call sites as well. You need to make sure the function was never used to retrieve a string with the intention of storing it for longer, because until the conversion the intentions around ownership were unclear. The compiler can be leveraged to spot these places so you can consider them one by one. The "almost always auto" guideline puts you at a disadvantage here. If you followed the AAA rule faithfully, a variable declared with auto silently becomes a string_view, and the compiler reports an error only later, at the point where the value is stored into the target string type, if at all. With an explicit std::string on the left, the error appears immediately at the function call site, which I think is slightly more useful.
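A sketch of the difference, assuming a hypothetical accessor `current_user` whose return type was just changed from std::string to std::string_view: `auto` follows the change silently, while an explicit std::string declaration stops compiling.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical accessor, previously returning std::string.
std::string_view current_user() { return "alice"; }

void caller() {
    auto a = current_user();  // silently deduced as std::string_view now
    static_assert(std::is_same_v<decltype(a), std::string_view>);
    (void)a;

    // std::string b = current_user();  // error: the conversion is explicit
    std::string c(current_user());      // explicit, owning copy
    assert(c == "alice");
}
```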

Passing std::string_views to functions

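If you change to string_view an argument of a virtual function that overrides a base class virtual, it no longer overrides; it declares a new, unrelated function instead. If the base function was not pure virtual, everything still compiles, but calls through a base pointer or reference go to the base implementation. Be careful when porting functions in a class hierarchy to string_view. Use the override specifier to protect yourself from this and similar cases: the compiler then rejects any function marked override that doesn't actually override anything.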
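A sketch with hypothetical `Logger` classes: the commented-out declaration is what a careless port would produce, and `override` turns that silent behavior change into a compile error.

```cpp
#include <cassert>
#include <string>
#include <string_view>

struct Logger {
    virtual ~Logger() = default;
    virtual std::string format(const std::string& msg) const { return msg; }
};

struct PrefixLogger : Logger {
    // Porting only this parameter to std::string_view would declare a new,
    // unrelated function; with `override` the compiler rejects it:
    // std::string format(std::string_view msg) const override;  // error
    std::string format(const std::string& msg) const override {
        return "[app] " + msg;
    }
};
```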

Some function implementations can be changed further once their parameters were changed from string to string_view. For example, consider a function that splits a string by some logic and returns a collection of substrings. If such a function created a bunch of std::string objects, each a copy of a section of the original string, it can be refactored to use string_views instead: create string_views pointing at the starts and ends of sections of the original string and return a collection of those. The memory savings are obvious, as none of the characters need to be copied; the caller just has to keep the original string alive while the views are in use.
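A minimal sketch of such a splitter, assuming a single-character delimiter (the splitting logic in the original code is not shown): every returned view points into the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` on `delim` and returns views into the original buffer.
// No characters are copied, so `text` must outlive the returned views.
std::vector<std::string_view> split(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            parts.push_back(text.substr(start));
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}
```

Empty fields are kept, so `split("a,bb,,c", ',')` yields four views: "a", "bb", "" and "c".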

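There is little point in receiving a const string_view by value, since a string_view cannot modify the characters it refers to anyway. The const only stops the function from reassigning or shrinking its own local copy of the view, for example with remove_prefix() or remove_suffix().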
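A sketch of where a non-const by-value view is handy (`count_words` is hypothetical): the function narrows its own window with remove_prefix(), which a const parameter would forbid, and the caller's view is unaffected.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// `text` is a local copy of the view; shrinking it moves only this
// function's window and never touches the caller's data.
std::size_t count_words(std::string_view text) {
    std::size_t words = 0;
    while (true) {
        const auto first = text.find_first_not_of(' ');
        if (first == std::string_view::npos) break;
        text.remove_prefix(first);  // skip leading spaces
        ++words;
        const auto space = text.find(' ');
        if (space == std::string_view::npos) break;
        text.remove_prefix(space);  // skip the word just counted
    }
    return words;
}
```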

Passing std::string_views through functions

The easiest case is when you can afford to receive string_view as an argument and only do operations on it that don't require creating std::string. Then you can return std::string_view too.

If you have any functions where the exact string type is a template parameter, the code might not have the same effect anymore. If you're lucky, compilation will fail, but when it doesn't, that doesn't mean the conversion was automatically safe. For example, the assignment operator will still work, but it won't create a copy anymore. If you used that copy to modify the string and return the modified one, you are in the lucky case: you can't modify a string_view, so compilation fails. If you used that copy to, for example, save the state of the string at that time, compilation will succeed, but it will produce a bug.
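A sketch of the silent case, using a hypothetical `Snapshot` template: the same assignment copies for std::string but merely aliases for std::string_view, so a later change to the source shows up in the "saved" state.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical generic helper that remembers a previous value.
template <typename S>
struct Snapshot {
    S saved;
    // For std::string this copies; for std::string_view it only aliases.
    void save(const S& current) { saved = current; }
};
```

Saving a std::string `buffer` holding "v1" into both variants and then setting `buffer[1] = '2'` leaves `Snapshot<std::string>` at "v1", while `Snapshot<std::string_view>` now reads "v2".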

When the purpose of a function is to create a std::string out of the received string object, it might seem as if there was no point in converting the function to receive std::string_view. I still found it worthwhile. It offers slightly more versatility, even if only towards its own kind. What I mean is that some of your functions will start returning std::string_view after the conversion, because it made sense for them in isolation. Now imagine the result of such a function is used as input to another. If the called function receives a string_view, you can simply forward the result. If not, you need to create a std::string object just to call the function, because string_view does not implicitly convert to std::string. The later you convert to std::string, the better, it seems. You lose implicit convertibility from non-standard strings this way, but if you are using them you probably have a converter available already.
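A sketch of forwarding, with hypothetical `extension` and `is_image_extension` functions: because both sides use string_view, the result of one is passed straight to the other without allocating.

```cpp
#include <cassert>
#include <string_view>

// Returns the extension as a view into `path` (empty if there is none).
std::string_view extension(std::string_view path) {
    const auto dot = path.rfind('.');
    return dot == std::string_view::npos ? std::string_view{} : path.substr(dot + 1);
}

// Takes a view too, so the result of extension() can be forwarded as is.
bool is_image_extension(std::string_view ext) {
    return ext == "png" || ext == "jpg" || ext == "gif";
}

// Were is_image_extension() to take const std::string&, this call would
// need an explicit std::string(extension(path)) and an allocation.
bool is_image(std::string_view path) {
    return is_image_extension(extension(path));
}
```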

On the flip side, when you already have a std::string object and want to return it, you might as well return it as a std::string. If the caller really needs a std::string, it's a natural fit. If they can make do with a std::string_view, the returned string converts implicitly at negligible cost, assuming the return itself benefits from RVO. Just don't keep that view in a variable, because the returned temporary is destroyed at the end of the statement. If you returned a std::string_view instead, the caller could not directly call functions that receive std::string without an explicit conversion, and you would cause them trouble without offering any benefit. For example, I implemented the getter and setter for a member of type std::string so that the getter returns std::string and the setter receives std::string_view.
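A sketch of that getter/setter split on a hypothetical `Document` class; the setter accepts literals, std::strings and views, and the getter hands back an owning copy.

```cpp
#include <cassert>
#include <string>
#include <string_view>

class Document {
public:
    // Owning copy: callers needing std::string get one directly.
    std::string title() const { return title_; }
    // Accepts literals, std::strings and string_views alike.
    void set_title(std::string_view title) { title_ = title; }

private:
    std::string title_;
};
```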

Incompatibility with non-standard strings

It's easy to pass a std::string_view into functions that receive a Glib::ustring, for example. You can call string_view::data() and pass a const char*, because Glib::ustring happens to have a constructor that accepts a const char*. MFC's CString and Qt's QString do too, as probably do most string implementations, so this is not the real problem in practice. Just remember that data() is only safe here when the view is null-terminated; for a view into the middle of a string, go through a std::string copy first.

The real problem with non-standard strings in many popular frameworks is that some of their functions inevitably return their own string type. In that case you'll get an error converting from Glib::ustring to std::string_view. Before, this often wasn't a problem because, to keep Glib as the example, Glib::ustring provides its own conversion to std::string and you never had to notice it. After converting to std::string_view, you do. Until the libraries themselves catch up, you will need to provide the conversion yourself. That is less trivial than it sounds, because often what you get was meant to be copied, and with string_view no copy happens.

Take the libxml++ library, for example. If you call get_content() on a ContentNode, you get a copy of a Glib::ustring, i.e. a temporary. The natural way to turn a Glib::ustring into a string_view seems to be calling c_str() on it, which returns a const char*, but here that pointer points into the temporary string. Assigning the result of get_content() to a std::string worked before, because the assignment created a copy. Assigning get_content().c_str() to a std::string_view compiles just as happily, but the view dangles as soon as the statement ends, and it fails in the worst possible way: at run-time and only sometimes. Good luck finding that one. Store the result in a std::string or Glib::ustring first and view that instead.

Incompatibility with std facilities

For example, std::fstream has constructors that receive const char* and, since C++11, const std::string&, but none that would accept a std::string_view. The same goes for std::istringstream, which takes a const std::string&. Neither std::string nor std::string_view converts to const char* on its own, and string_view's conversion to std::string is explicit, so you'll need to convert yourself. Calling string_view::data() is only correct if the view is null-terminated; constructing a std::string from the view is the safe choice. Apart from some retyping in a limited number of places, which doesn't bother me as much as it seems to bother many, I can't think of a single downside to this. Still, I can't shake the feeling that this is an overlooked feature. A std::fstream constructor taking a std::string_view would have been backward compatible, so I wonder why it wasn't added.
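A sketch of the explicit copy for std::istringstream (`sum_numbers` is hypothetical); opening a file from a view follows the same pattern, e.g. `std::ifstream in{std::string(path)};`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <string_view>

// std::istringstream has no string_view constructor, so copy explicitly.
// The copy is also null-terminated, unlike string_view::data().
int sum_numbers(std::string_view text) {
    std::istringstream in{std::string(text)};
    int total = 0;
    for (int value; in >> value;) {
        total += value;
    }
    return total;
}
```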

std::stod takes a const std::string& (or const std::wstring&) to convert to double; a string_view won't do. I understand it is just a wrapper around std::strtod, which takes a null-terminated const char*, but such a wrapper could have been made available for string_views too.
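A minimal sketch of such a wrapper (`to_double` is hypothetical), paying for one copy. C++17's std::from_chars in <charconv> parses a character range without copying, but its floating-point overloads arrived later than the integer ones in some standard library implementations, so the copy is the more portable choice.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// std::stod only accepts std::string (or std::wstring), so copy the view.
// Like std::stod, this throws std::invalid_argument or std::out_of_range.
double to_double(std::string_view sv) {
    return std::stod(std::string(sv));
}
```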

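If you use unordered_set<std::string>, its find member function will not accept a string_view even where that would make sense. Before C++20 it insists on a const std::string&, because it just applies the template parameter, and it would be a stretch to write a specialised unordered_set for std::strings just so it could receive a string_view. C++20 adds heterogeneous lookup to the unordered containers, but only when both the hash and the key-equality types declare is_transparent, so the default std::hash<std::string> and std::equal_to<std::string> still won't do. Ordered containers have supported this since C++14 through a transparent comparator such as std::less<>.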
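A sketch of the ordered-container route, which swaps the unordered_set for a std::set because that works in C++17 (`contains_name` is hypothetical): with std::less<> as the comparator, find() accepts a string_view without building a std::string.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <string_view>

// std::less<> is transparent, which enables the find() overload that takes
// any type comparable with the key, such as std::string_view.
bool contains_name(const std::set<std::string, std::less<>>& names,
                   std::string_view name) {
    return names.find(name) != names.end();
}
```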

The member functions of string_view seem to follow a different philosophy than the rest of the std library when it comes to accessing elements. string_view::find_first_of() works with positions, while the std::find_first_of algorithm operates on iterators. That's OK; you can use the algorithm instead of the member. On the other hand, string_view::substr() is very appealing because there is no simple counterpart in the algorithms library, yet it is cumbersome to combine with algorithms: you need to convert the iterator with std::distance, or use a member function such as find_first_of() to get a position in the first place. It all feels clumsy.
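A sketch of the mix (`from_first_digit` is hypothetical): the search uses an iterator-based algorithm, and std::distance converts the result into the position that substr() expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string_view>

// Returns the part of `text` starting at its first digit, or an empty view.
std::string_view from_first_digit(std::string_view text) {
    const auto it = std::find_if(text.begin(), text.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
    });
    if (it == text.end()) return {};
    // Iterator to position, because substr() only takes positions.
    const auto pos = static_cast<std::size_t>(std::distance(text.begin(), it));
    return text.substr(pos);
}
```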

Understandably, string_view does not implement operator+ or anything similar. Wherever you might have written something like std::string object + "literal", you will now need a temporary std::string first, to create a copy and establish ownership. Only then can you modify it. If the whole point of the function was, for example, to append some endings to strings, you might as well not convert it. If the parameter is a std::string (by value, not a reference!) and you call it with a char*, a std::string is constructed anyway. If you make your modifications on that copy instead of creating your own, nothing is really lost.
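A sketch of that by-value approach (`with_suffix` is hypothetical): the copy made at the call site becomes the function's working string, and only the suffix is taken as a view.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// A char* or literal argument constructs `base` at the call site;
// the function then owns and modifies that copy.
std::string with_suffix(std::string base, std::string_view suffix) {
    base += suffix;  // operator+= accepts a string_view since C++17
    return base;     // moved out, not copied
}
```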

Before C++20 there is no constructor that creates a std::string_view from two iterators into a std::string, even though std::string itself has such a constructor. C++20 added one for contiguous iterators.
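For pre-C++20 code, a hypothetical helper can rebuild the view from the string's data() plus an offset. Using &*first instead would be undefined when first is end().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Builds a view over [first, last) of `s` without dereferencing `first`.
std::string_view view_of(const std::string& s,
                         std::string::const_iterator first,
                         std::string::const_iterator last) {
    const auto offset = static_cast<std::size_t>(first - s.begin());
    const auto length = static_cast<std::size_t>(last - first);
    return std::string_view(s.data() + offset, length);
}
```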

Cheatsheet


#include <string>
#include <string_view>
#include <glibmm/ustring.h>

void receives_sv(std::string_view sv) {}
void receives_str(std::string str) {}
std::string_view returns_sv() { return "sv"; } // ok: literals have static storage
std::string returns_str() { return "str"; }

Glib::ustring ustr = "ustr";
std::string str = "str";
std::string_view sv = "sv";
const char* cp = "literal"; // plain char* from a literal is ill-formed since C++11

// between friends
str = sv; // ok
sv = str; // ok
std::string str2(sv); // ok
std::string_view sv2(str); // ok
cp = str; // error
cp = sv; // error
str = cp; // ok
sv = cp; // ok
receives_sv(cp); // ok
receives_str(cp); // ok

// non-standard strings
str = ustr; // ok, conversion provided by Glib::ustring
sv = ustr; // error: could not convert
sv = ustr.data(); // ok, .data() returns const char*
ustr = sv; // error: could not convert
ustr = sv.data(); // compiles; correct only if sv is null-terminated
ustr = str; // ok, conversion provided by Glib::ustring
receives_sv(ustr); // error: could not convert
receives_str(ustr); // ok
ustr = returns_str(); // ok
ustr = returns_sv(); // error: could not convert

// implicit conversions to std::string_view
receives_sv(str); // ok
receives_sv(sv); // ok
receives_sv("literal"); // ok
receives_sv(cp); // ok
receives_sv(ustr); // error: could not convert
receives_sv(ustr.data()); // ok, .data() returns const char*

// implicit conversions to std::string
receives_str(str); // ok
receives_str(sv); // error: could not convert
receives_str(sv.data()); // compiles; correct only if sv is null-terminated
receives_str("literal"); // ok
receives_str(cp); // ok
receives_str(ustr); // ok, conversion provided by Glib::ustring

