The solution in Rust is separate String and &str. &str is a reference to somewhe...
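A minimal sketch of the `String`/`&str` split, using only the standard library. A `String` owns a heap buffer that it can grow. A `&str` borrows a view (pointer plus length) into some buffer, without copying. `first_word` is a hypothetical helper made up for illustration.

```rust
/// Hypothetical helper: returns the first word of `s` as a borrowed slice.
/// No allocation and no copy; the result points into `s`.
fn first_word(s: &str) -> &str {
    // split_whitespace yields &str views into `s`.
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    // `String` owns a heap buffer it can grow.
    let mut owned = String::from("hello world");
    owned.push('!');

    // `&str` borrows part of that buffer: just a pointer plus a length.
    let word = first_word(&owned);
    assert_eq!(word, "hello");
    // The slice starts at the same address as the String's own allocation.
    assert_eq!(word.as_ptr(), owned.as_ptr());
    println!("{word}");
}
```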

IgorPartola · on July 15, 2024

Allocating headers and strings separately blows your CPU cache. Hardly a performant way of doing hot loops.

saagarjha · on July 15, 2024

Compared to calling strlen a bunch, which I’m sure is significantly more performant.

IgorPartola · on July 15, 2024

You never need to call strlen unless you are getting your inputs from a place that doesn’t give you a string length (such as stdin).
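To make the cost difference concrete, here is a small sketch. A NUL-terminated buffer (as from C, or one `argv` element) needs a linear scan to find its length, while a `&str` already carries it. `c_style_len` is a hypothetical stand-in for `strlen`. `CStr::from_bytes_until_nul` is the standard library's scanning constructor (stable since Rust 1.69).

```rust
use std::ffi::CStr;

/// Hypothetical stand-in for C's `strlen`: scans until the first NUL byte.
/// Every call costs O(n) in the length of the string.
fn c_style_len(bytes: &[u8]) -> usize {
    bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len())
}

fn main() {
    // A NUL-terminated buffer, like a C string.
    let raw: &[u8] = b"hello\0";
    // Finding the length requires a scan...
    assert_eq!(c_style_len(raw), 5);
    // ...and CStr::from_bytes_until_nul also scans for the terminator.
    let c = CStr::from_bytes_until_nul(raw).expect("buffer contains a NUL");
    assert_eq!(c.to_bytes().len(), 5);

    // A &str carries its length, so `len()` just reads a field.
    let s: &str = "hello";
    assert_eq!(s.len(), 5);
}
```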

deathanatos · on July 16, 2024

So which is it, then? Does keeping the size separate "blow your CPU cache"¹ or not? You can't argue it does in one case (Rust) but not in yours…

(And note that the representation you're responding to is not really a "header", in the same sense that the trailing null is a "footer". The representation does not require the length be contiguous with the data, but that's what upthread was trying to say in the first place.)

¹(it doesn't…)

eru · on July 16, 2024

So now you are arguing that by default your strings should come with a length? Great!

If you want that, you might as well bake that length into the string type by default (and use a specialised type, perhaps a naked raw pointer into the string) for when you don't want to pass the length.
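One way to read this in Rust terms, as a sketch: `&str` is the length-carrying default. A naked, NUL-terminated pointer for a C interface is the specialised form, and getting one from a `&str` means copying into a `CString` with a terminator appended. `to_c_string` is a hypothetical helper for illustration.

```rust
use std::ffi::{c_char, CString};

/// Hypothetical helper: copies `s` into a new NUL-terminated buffer.
/// Panics if `s` contains an interior NUL byte.
fn to_c_string(s: &str) -> CString {
    CString::new(s).expect("no interior NUL bytes")
}

fn main() {
    let s: &str = "hello";

    // Length-carrying view: pointer and length travel together.
    let (ptr, len) = (s.as_ptr(), s.len());
    assert_eq!(len, 5);
    assert!(!ptr.is_null());

    // The specialised form: a bare pointer with no length attached.
    let c = to_c_string(s);
    assert_eq!(c.as_bytes_with_nul(), b"hello\0");
    let naked: *const c_char = c.as_ptr();
    assert!(!naked.is_null());
}
```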

saagarjha · on July 16, 2024

That's most interfaces…?

kevin_thibedeau · on July 16, 2024

Not argv[].

saagarjha · on July 17, 2024

You still need to call strlen on each element?

tialaramex · on July 16, 2024

To get a correct understanding, if you aren't a Rust person, Rust's String is (literally, though this is opaque) Vec<u8> with checks to ensure it's actually not just arbitrary bytes you're storing but UTF-8 text.
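A short sketch of that relationship using real standard-library conversions. `String::from_utf8` validates a `Vec<u8>` and reuses its buffer instead of copying it, and `into_bytes` hands the `Vec<u8>` back. `bytes_to_text` is a hypothetical wrapper for illustration.

```rust
/// Hypothetical wrapper: converts raw bytes to a String,
/// returning None if they are not valid UTF-8.
fn bytes_to_text(bytes: Vec<u8>) -> Option<String> {
    // String::from_utf8 validates without copying: on success the Vec's
    // buffer becomes the String's buffer.
    String::from_utf8(bytes).ok()
}

fn main() {
    let text = bytes_to_text(vec![b'h', b'i']).expect("valid UTF-8");
    assert_eq!(text, "hi");

    // 0xFF never appears in well-formed UTF-8, so validation fails.
    assert!(bytes_to_text(vec![0xFF]).is_none());

    // Going back is free: into_bytes returns the underlying Vec<u8>.
    let bytes: Vec<u8> = text.into_bytes();
    assert_eq!(bytes, vec![b'h', b'i']);
}
```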

Vec<u8>, unlike String, has a direct equivalent in C++ (well, the Rust design is slightly better from decades of extra experience, but it has the same rough shape): std::vector<std::byte>.

The C++ std::string is this very weird thing that results from having standardized std::string in C++98, then changing their mind after implementation experience. So while it's serviceable, it's pretty poor, and nobody should model what they're doing on it. There have been HN links to articles by people like Raymond Chen on what this type looks like now.

pcwalton · on July 15, 2024

In order to access the string contents in the first place you need the pointer. The length is stored right next to it. So they're both going to be in the same cache line, assuming proper alignment. In the rare case in which they straddle a cache line, you just have to load once and then the length remains in cache for the remainder of the loop. (This is true regardless of where the length lives, in fact; as far as CPU cache is concerned it really makes little difference either way.)
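A small sketch of the layout being described. A `&str` is a two-word fat pointer holding the data pointer and the length. A `String` holds a pointer, capacity, and length, three words on current implementations, though the field order is not guaranteed. Reading `len()` touches only the fat pointer, never the heap bytes.

```rust
use std::mem::size_of;

fn main() {
    // A &str is a "fat pointer": (data pointer, length), two machine words.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // String adds a capacity. It is three words on current implementations;
    // the field order is an implementation detail.
    println!("size_of::<String>() = {}", size_of::<String>());

    let s = String::from("hello");
    let view: &str = &s;
    // The length lives in `view` itself (stack or registers), next to the
    // pointer; reading it does not touch the heap bytes.
    println!("len = {}, data at {:p}", view.len(), view.as_ptr());
}
```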

(This is assuming SROA, scalar replacement of aggregates, didn't break apart the string and put the length in a register, which it often does, making this entire question moot.)

Tuna-Fish · on July 15, 2024

Huh? The headers are either in registers or in stack. The top of stack is always in L1. There is no way in which this is inferior to handing over a pointer to a string and a length separately, other than requiring two additional words of storage in registers/stack.

IgorPartola · on July 15, 2024

How is that? Say you are reading 1000 lines of stdin at once to process them. In which registers are your string and substring headers stored?

Tuna-Fish · on July 15, 2024

If you are reading 1000 lines from stdin at once to separate Strings, you are already going to be accessing memory in 1000 places at the same time, and making it 1001 isn't meaningfully worse for your cache. (Implementation would be Vec<String>, which would lay out the 1000 headers contiguously.)
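A quick sketch checking the layout claim. In a `Vec<String>`, the `String` headers sit back to back in the Vec's single allocation, while each string's bytes live in their own heap allocation. `header_stride` is a hypothetical helper for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: byte distance between the headers of the first two
/// Strings in a slice. Panics if there are fewer than two elements.
fn header_stride(v: &[String]) -> usize {
    let first = &v[0] as *const String as usize;
    let second = &v[1] as *const String as usize;
    second - first
}

fn main() {
    let lines: Vec<String> = (0..1000).map(|i| format!("line {i}")).collect();
    // The headers are contiguous: adjacent ones are exactly one String apart...
    assert_eq!(header_stride(&lines), size_of::<String>());
    // ...while each String's bytes live in a separate heap allocation.
    assert_ne!(lines[0].as_ptr(), lines[1].as_ptr());
}
```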

But I genuinely have a hard time understanding for what kind of workload I would ever do that. If you want to read 1000 lines of stdin, and cannot use an iterator and must work on them all at the same time, I would much rather read them into a single string and then split that into 1000 &strs using the .lines() iterator.
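A minimal sketch of that approach: one buffer, split into `&str` slices with `str::lines`, which also strips `\r\n` endings. Only the `Vec` of (pointer, length) pairs is allocated; no line is copied. `split_lines` is a hypothetical helper, and the sample input stands in for data read from stdin.

```rust
/// Hypothetical helper: splits one buffer into per-line slices.
/// Only the Vec of (ptr, len) pairs is allocated; no line bytes are copied.
fn split_lines(input: &str) -> Vec<&str> {
    input.lines().collect()
}

fn main() {
    // In a real program this buffer might come from reading all of stdin.
    let input = String::from("alpha\nbeta\r\ngamma\n");
    let lines = split_lines(&input);
    assert_eq!(lines, ["alpha", "beta", "gamma"]);

    // Every slice points into the original buffer.
    let start = input.as_ptr() as usize;
    let end = start + input.len();
    for line in &lines {
        let p = line.as_ptr() as usize;
        assert!(p >= start && p < end);
    }
}
```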

dgfitz · on July 15, 2024

I was miffed at "1000 lines from stdin." It's the same problem 1000 times, not 1000 problems at once.

Tuna-Fish · on July 15, 2024

Presumably the idea is, for example, sorting? In which case you do have to read the entire input before you can do anything. But the way I'd do that is to read the entire stdin to a single String, then work with &str pointers to it.
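A sketch of the sort-everything case under these assumptions. `read_all` and `sorted_lines` are hypothetical helpers. Input is read once into a single `String`, and the `&str` slices into it are sorted, so the line bytes are never copied. `main` uses an in-memory byte slice so the example is self-contained; passing `std::io::stdin()` to `read_all` would read real input.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads everything from `reader` into one String.
fn read_all<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    Ok(buf)
}

/// Hypothetical helper: sorts the lines of `input` as &str slices into it.
fn sorted_lines(input: &str) -> Vec<&str> {
    let mut lines: Vec<&str> = input.lines().collect();
    lines.sort_unstable();
    lines
}

fn main() -> io::Result<()> {
    // A byte slice implements Read; io::stdin() would work here too.
    let input = read_all(&b"pear\napple\nfig\n"[..])?;
    for line in sorted_lines(&input) {
        println!("{line}"); // prints apple, fig, pear on separate lines
    }
    Ok(())
}
```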

pezezin · on July 16, 2024

If you really care about performance, you should not allocate within hot loops.
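One common way to follow this when reading line by line, sketched with real `std::io` APIs: reuse a single `String` buffer, calling `clear()` to drop the contents while keeping the capacity. `BufRead::read_line` appends to the buffer and returns 0 at EOF. Once the buffer has grown to fit the longest line, the loop stops allocating. `count_nonempty` is a hypothetical example workload.

```rust
use std::io::{self, BufRead};

/// Hypothetical workload: counts non-empty lines, reusing one String buffer
/// instead of allocating a new String per line.
fn count_nonempty<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut buf = String::new();
    let mut count = 0;
    loop {
        buf.clear(); // drops the contents but keeps the capacity
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF
        }
        if !buf.trim_end().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // &[u8] implements BufRead; io::stdin().lock() would also work here.
    let n = count_nonempty(&b"a\n\nb\nc"[..])?;
    println!("{n}"); // prints 3
    Ok(())
}
```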