Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory as file descriptor #195

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

Memory as file descriptor #195

wants to merge 33 commits into from

Conversation

Wenzel
Copy link
Owner

@Wenzel Wenzel commented Apr 23, 2021

This PR implements the memory as a file descriptor, using Rust standard traits

  • Read
  • Write
  • Seek

A second memory object is implemented to handle "padded" memory reads, useful for Volatility.

Changes

To introduce this memory abstraction, I had to update the API.
The library doesn't return a dyn Introspectable anymore, but a Microvmi struct, which contains the driver and both memory objects (padded and non padded).

On this Microvmi struct, the methods are simply forwarding to the driver implementation.
Regarding the memory objects, I had to use a Rc<Refcell<Box<dyn introspectable>>> to share the driver between all the structs, allowing a mutable borrow checked at runtime.

I tested against master while dumping the memory and the overhead is invisible.

Some unit tests have been implemented for the seek implementation.

I also changed how the drivers should implement reading physical memory.

The new API is:

    fn read_frame(&self, _frame: PageFrame, _buf: &mut [u8]) -> Result<(), IoError> {
        unimplemented!();
    }

Which allows us to remove complexity from the driver implementation to the Read trait implementation (especially for Xen), and share this for all drivers.

@Wenzel Wenzel force-pushed the memory/io_refcell branch from 8639f75 to ba48425 Compare April 28, 2021 11:54
@Wenzel Wenzel force-pushed the memory/io_refcell branch from 4e95742 to 4fad009 Compare April 29, 2021 13:40
@Wenzel Wenzel force-pushed the memory/io_refcell branch 2 times, most recently from 88b48f4 to 59a7ad4 Compare April 29, 2021 13:49
@Wenzel Wenzel force-pushed the memory/io_refcell branch from 59a7ad4 to cd4dc16 Compare April 29, 2021 14:02
@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2021

Codecov Report

Merging #195 (9fef107) into master (acf7b73) will increase coverage by 4.43%.
The diff coverage is 19.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #195      +/-   ##
==========================================
+ Coverage   14.19%   18.62%   +4.43%     
==========================================
  Files           5        7       +2     
  Lines         472      671     +199     
  Branches       88      108      +20     
==========================================
+ Hits           67      125      +58     
- Misses        383      513     +130     
- Partials       22       33      +11     
Flag Coverage Δ
unittests 18.62% <19.82%> (+4.43%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/api.rs 1.73% <0.00%> (+1.73%) ⬆️
src/capi.rs 0.00% <0.00%> (ø)
src/driver/kvm.rs 26.73% <0.00%> (+4.21%) ⬆️
src/errors.rs 0.00% <ø> (ø)
src/lib.rs 100.00% <ø> (+95.65%) ⬆️
src/microvmi.rs 0.00% <0.00%> (ø)
src/memory.rs 37.50% <37.50%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update acf7b73...9fef107. Read the comment docs.

@Wenzel Wenzel marked this pull request as ready for review April 30, 2021 08:20
@Wenzel
Copy link
Owner Author

Wenzel commented Apr 30, 2021

This PR is ready for review.
@rageagainsthepc , if you want to have a quick look.

@Wenzel Wenzel requested a review from rageagainsthepc April 30, 2021 08:26
@rageagainsthepc
Copy link
Collaborator

@Wenzel Will do 👍

@Wenzel Wenzel mentioned this pull request May 6, 2021
@rageagainsthepc
Copy link
Collaborator

rageagainsthepc commented May 8, 2021

Imho the microvmi struct looks a bit overly complicated. I would prefer a design where you (as a library user) initialize the driver like we used to and then pass it to one of the memory abstraction classes:

let drv = driver_init();
let memory = Memory::new(drv);
let padded_memory = Paddedmemory::new(drv);

Not every user needs memory abstractions, but those who do could easily initialize them.
But it's just a thought, haven't looked at the PR in detail yet. Maybe I'm not seeing the bigger picture.

Edit: Right now this implementation would force the library user to write single threaded applications (because Rc does not implement Send). It would be better if the user could choose between Rc or Arc for example. The idea above would allow this behavior with minimal adjustments to the current implementation I think.
Edit2: If you make your memory abstraction classes generic over the trait bound BorrowMut<Box<dyn Introspectable>> that should do the trick.

@Wenzel
Copy link
Owner Author

Wenzel commented May 17, 2021

Hi @rageagainsthepc and thank you for your feedback.

I should have explained a bit more why I decided this implementation and design when opening the PR.
I would have also have prefered a simpler design, where the library would return a trait object (Box<dyn Introspectable), and provide wrapper on top of this, (Memory::new(drv)).

The issues comes from exposing this API in C and Python. because how do you keep the context ?

The C API exposes a void* drv pointer, to be casted into our box<dyn Introspectable>, however, this wouldn't retain the Memory object context as well, since it's a separate object.

That's the main reason why I choose to go this way, keep a single struct Microvmi to be casted as void* into the C world, back and forth, and able to offer the read/write/seek abstraction underneath.

A second reason is that we are able to simplify our read implementations in the drivers, mainly the Xen driver, since the main algorithm to iterate over each frame is implemented in the layer above.

And another reason is to be able to implement batch read operation later on, hidden in the Memory object implementation.
Memflow has support for this since some of its targets have a very high latency, so have to batch you read operations, otherwise the whole read will become slow and uneffective.

Edit: as a bonus, I just noticed that with the struct Microvmi we can keep the DriverType after initialization, and therefore remove get_driver_type() from the driver's implementation

Copy link
Collaborator

@rageagainsthepc rageagainsthepc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would at least improve the tests. As for the other comments: I understand if you are set on your current approach, I'll leave that up to you.

}

impl PageFrame {
pub fn with_paddr(paddr: u64) -> Self {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Imho the name is misleading. with_paddr(paddr: u64) only makes sense if this function would transform an existing PageFrame. Since it simply creates a new one out of thin air I would stick to new(paddr: u64).

}
}

pub fn to_paddr(&self) -> u64 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
pub fn to_paddr(&self) -> u64 {
pub fn paddr(&self) -> u64 {

Probably just a matter of personal taste but it makes more sense to me.

&mut bytes_read_local,
)
.is_ok();
if driver
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was supposed to be a simple C abstraction layer. Augmenting it with code that goes beyond that is bad design imho.

mock_introspectable
.expect_get_max_physical_addr()
.returning(move || Ok(max_addr));
let mut memory = Memory::new(Rc::new(RefCell::new(Box::new(mock_introspectable)))).unwrap();
Copy link
Collaborator

@rageagainsthepc rageagainsthepc May 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would welcome it if we followed the arrange, act, assert pattern as it makes the tests more readable imo. https://automationpanda.com/2020/07/07/arrange-act-assert-a-pattern-for-writing-good-tests/
I don't think that you should add comments like the article does, but simply adding newlines between the three blocks would be helpful.

.returning(move || Ok(max_addr));
let mut memory = Memory::new(Rc::new(RefCell::new(Box::new(mock_introspectable)))).unwrap();
// seek 0 doesn't move position
assert_eq!(0, memory.seek(SeekFrom::Start(0))?);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put multiple unrelated asserts into the same test case, if you want to avoid code duplication use the test-case crate. See https://github.com/Wenzel/libmicrovmi/blob/master/src/driver/kvm.rs#L398

/// # Arguments
///
/// * `domain_name` - The domain name
/// * `driver_type` - The driver type to initialize. None will attempt to initialize every driver avaiable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// * `driver_type` - The driver type to initialize. None will attempt to initialize every driver avaiable
/// * `driver_type` - The driver type to initialize. None will attempt to initialize every driver available

}

/// Retrieve the number of VCPUs.
pub fn get_vcpu_count(&self) -> Result<u16, Box<dyn Error>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You basically implement the Introspectable trait without officially implementing the Introspectable trait. That's nasty. :/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also a lot of code duplication and unnecessary indirection. What's so wrong about making the drv public in order to allow direct access?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants